Streaming allows you to receive partial responses from the API as they are generated, rather than waiting for the entire response to complete. This can significantly improve perceived latency and user experience, especially for longer generations.

How Streaming Works

The NusantaraAI API implements streaming using Server-Sent Events (SSE) when the stream parameter is set to true in your request. When a streaming request is made:
  1. Content-Type Header: The API sets the Content-Type header to text/event-stream.
  2. Partial Data Chunks: The server sends data in small chunks. Each chunk is prefixed with data: and contains a JSON object representing a partial response.
  3. End of Stream Signal: The stream is terminated by a data: [DONE] message, indicating that no further data will be sent for that request.
This allows your client application to display or process the generated content incrementally, as it becomes available.
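The parsing loop implied by the steps above can be sketched in a few lines of Python. This is a minimal, illustrative helper (not part of any SDK): it assumes each event arrives as a line in the data: <json> format shown above, skips blank keep-alive lines, and stops at the [DONE] sentinel.

```python
import json

def parse_sse_events(raw_lines):
    """Parse raw SSE lines into JSON payloads, stopping at the [DONE] sentinel."""
    events = []
    for line in raw_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream signal: no further data for this request
        events.append(json.loads(payload))
    return events
```

In a real client you would feed this the lines of the HTTP response body as they arrive, rather than a pre-collected list.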

Usage

To enable streaming for your chat completions or text completions, set the stream parameter to true in your request body.

Request Body Example

{
  "model": "nusantara-base",
  "messages": [
    {
      "role": "user",
      "content": "Explain the concept of AI streaming in detail."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": true
}
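As a sketch, the request body above can be sent with curl; the -N flag disables output buffering so chunks print as they arrive. The /v1/chat/completions path and Bearer Authorization header are assumed here from the OpenAI-compatible base URL used later on this page; substitute your actual API key.

```shell
curl -N https://api.neosantara.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "nusantara-base",
    "messages": [
      {"role": "user", "content": "Explain the concept of AI streaming in detail."}
    ],
    "temperature": 0.7,
    "max_tokens": 1000,
    "stream": true
  }'
```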

Example Streaming Response (Truncated)

data: {"id":"chatcmpl-a1b2c3d4e5f6","object":"chat.completion.chunk","created":1701234567,"model":"nusantara-base","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-a1b2c3d4e5f6","object":"chat.completion.chunk","created":1701234567,"model":"nusantara-base","choices":[{"index":0,"delta":{"content":"AI "},"finish_reason":null}]}

data: {"id":"chatcmpl-a1b2c3d4e5f6","object":"chat.completion.chunk","created":1701234567,"model":"nusantara-base","choices":[{"index":0,"delta":{"content":"streaming "},"finish_reason":null}]}

// ... more data chunks ...

data: {"id":"chatcmpl-a1b2c3d4e5f6","object":"chat.completion.chunk","created":1701234567,"model":"nusantara-base","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
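Reassembling the final message from chunks like those above is a matter of concatenating each delta's content field, ignoring chunks whose delta carries only a role or is empty (as in the first and last chunks shown). A minimal sketch, assuming the chunks have already been decoded into dicts:

```python
def assemble_stream(sse_payloads):
    """Reassemble the assistant message from decoded chat.completion.chunk dicts."""
    parts = []
    for chunk in sse_payloads:
        for choice in chunk.get("choices", []):
            # Role-only and empty deltas (first/last chunks) contribute no text
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```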

Python Example (with OpenAI SDK)

Because the NusantaraAI API is OpenAI-compatible, you can integrate streaming into your Python applications using the OpenAI Python SDK. Ensure you have the openai package installed (pip install openai).

from openai import OpenAI

# Initialize the OpenAI client with your API key
# Replace "YOUR_API_KEY" with your actual NusantaraAI API Key
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.neosantara.xyz/v1"
)

def stream_chat_completion(prompt: str, model: str = "nusantara-base"):
    """
    Makes a streaming chat completion request to the NusantaraAI API.
    """
    print(f"Streaming response for model: {model}")
    print(f"Prompt: {prompt}\n")

    try:
        stream = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": prompt}
            ],
            stream=True,  # Enable streaming
            max_tokens=500,
            temperature=0.7
        )

        full_response_content = ""
        for chunk in stream:
            # Each chunk is a ChatCompletionChunk; the final chunk carries an
            # empty delta, so treat a missing content field as ""
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content or ""
            full_response_content += content
            print(content, end="", flush=True)  # Print incrementally

        print("\n\n--- Stream finished ---")
        return full_response_content

    except Exception as e:
        print(f"\nAn error occurred during streaming: {e}")
        return None

if __name__ == "__main__":
    user_prompt = "Tell me a short story about a mythical creature from Indonesian folklore."
    streamed_text = stream_chat_completion(user_prompt)

    if streamed_text:
        print(f"\nTotal streamed content length: {len(streamed_text)} characters")