Streaming Responses

Use Server-Sent Events (SSE) for lower perceived latency and real-time generation UX.

What is SSE?

Server-Sent Events (SSE) is a standard for streaming over HTTP: the server keeps a single response open with the Content-Type text/event-stream and pushes events to the client as they are produced. It is well suited to token-by-token output in chat and coding interfaces.

Enable streaming

Add "stream": true to the request body when calling completion or agentic endpoints.

Chunk format

data: {"delta":"    left, "}

data: {"delta":"right = 0, len(arr) - 1"}

data: [DONE]
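
The chunk format above can be parsed with a few lines of Python. This is a minimal sketch, assuming every data payload other than [DONE] is a JSON object with a "delta" field, as in the sample; the helper name iter_deltas is hypothetical and not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "delta" text of each data line until [DONE] is seen.

    Hypothetical helper: assumes each payload is a JSON object that
    may carry a "delta" key, as in the sample chunks.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload).get("delta", "")


sample = [
    'data: {"delta":"    left, "}',
    "",
    'data: {"delta":"right = 0, len(arr) - 1"}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))
```

Joining the yielded deltas in order reconstructs the generated text, including the leading whitespace carried inside the first chunk.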

Python example

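import json

import requests

# stream=True tells requests not to buffer the whole body before returning.
response = requests.post(
    "https://api.nascentist.ai/v1/complete",
    headers={
        "Authorization": "Bearer nsc_live_YOUR_KEY",
        "Content-Type": "application/json",
    },
    json={
        "prompt": "def quicksort(arr):",
        "stream": True,
    },
    stream=True,
)
response.raise_for_status()

for line in response.iter_lines():
    if not line:
        continue  # blank lines separate events
    text = line.decode("utf-8")
    if not text.startswith("data: "):
        continue
    payload = text[6:]
    if payload == "[DONE]":
        break
    # Each payload is a JSON object; print only its "delta" text.
    chunk = json.loads(payload)
    print(chunk.get("delta", ""), end="", flush=True)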

Node.js example

const response = await fetch("https://api.nascentist.ai/v1/complete", {
  method: "POST",
  headers: {
    Authorization: "Bearer nsc_live_YOUR_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    prompt: "def quicksort(arr):",
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

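let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  // Events are separated by a blank line; keep any trailing partial event in the buffer.
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";

  for (const event of events) {
    if (!event.startsWith("data: ")) continue;
    const payload = event.slice(6);
    if (payload === "[DONE]") {
      finished = true; // a bare break would only exit this inner loop
      break;
    }
    try {
      const chunk = JSON.parse(payload);
      process.stdout.write(chunk.delta ?? "");
    } catch {
      // Skip a malformed payload rather than aborting the stream.
    }
  }
}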

Handling [DONE] and errors

Tip
Always finalize UI state when you receive [DONE]. Guard JSON parsing as well: a single event can be split across multiple network reads, so buffer incoming data until an event is complete before parsing it.
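
One way to make parsing safe against split events is to buffer raw bytes and only hand complete events to the JSON parser. The sketch below is illustrative, not an official client: the class name EventBuffer is hypothetical, and it assumes events are separated by a blank line ("\n\n") and prefixed with "data: ", as in the chunk format shown earlier.

```python
import codecs
from typing import List


class EventBuffer:
    """Accumulate raw bytes and release only complete SSE data payloads.

    Hypothetical helper: assumes "\n\n" separates events and each event
    is a single "data: " line.
    """

    def __init__(self) -> None:
        # An incremental decoder copes with multi-byte UTF-8 characters
        # that are split across two network reads.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def feed(self, chunk: bytes) -> List[str]:
        self._pending += self._decoder.decode(chunk)
        # Everything before the last separator is complete; the tail may
        # be a partial event and stays buffered for the next call.
        *complete, self._pending = self._pending.split("\n\n")
        return [e[6:] for e in complete if e.startswith("data: ")]


buf = EventBuffer()
for raw in (b'data: {"del', b'ta":"hi"}\n\ndata: [DO', b"NE]\n\n"):
    for payload in buf.feed(raw):
        print(payload)
```

Each payload returned by feed is either a complete JSON string or the [DONE] sentinel, so the JSON parser never sees half an event.
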
Warning
If the network disconnects mid-stream, keep the partial content and show a retry action instead of discarding the output.
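
The behavior described above can be sketched as a small collector that never throws away what it has already received. This is a sketch under assumptions: collect_stream and its status strings are hypothetical names, the input is any iterable of already decoded lines, and a dropped connection is assumed to surface as an OSError (the built-in ConnectionError is a subclass). A real client would also catch the exceptions raised by its HTTP library.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Return (text, status); status is "done", "incomplete", or "error".

    Hypothetical helper: partial text is always returned so the UI can
    keep it and offer a retry when status is not "done".
    """
    parts = []
    status = "incomplete"  # the stream ended without a [DONE] sentinel
    try:
        for line in lines:
            if not line.startswith("data: "):
                continue
            payload = line[6:]
            if payload == "[DONE]":
                status = "done"
                break
            try:
                parts.append(json.loads(payload).get("delta", ""))
            except json.JSONDecodeError:
                continue  # skip a malformed payload, keep the rest
    except OSError:
        status = "error"  # connection dropped mid-stream
    return "".join(parts), status


def flaky_lines():
    # Simulated stream that fails after one chunk.
    yield 'data: {"delta":"def quick"}'
    raise ConnectionError("connection reset")


text, status = collect_stream(flaky_lines())
print(repr(text), status)
```

A caller can finalize the UI when the status is "done" and, for any other status, render the returned text alongside a retry action.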