ID of the model to use (e.g., garda-beta-mini, nusantara-base). See /v1/models.
A list of messages comprising the conversation so far. Each message specifies the role of its author, one of system, user, assistant, or tool.
The contents of the message.
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
The tool call that this message is responding to (required when the role is tool).
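For example, a conversation that exercises all four roles might look like the sketch below (shown as a Python list). The tool-calling shape (tool_calls, tool_call_id) follows the common OpenAI-compatible convention and is an assumption here, as is the get_weather function.

# A sketch of a messages array covering the four roles; the get_weather tool
# call and the tool_call_id field name are assumptions (OpenAI-compatible
# convention), not confirmed by this page.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "name": "alice", "content": "What is the weather in Jakarta?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\": \"Jakarta\"}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 31}"},
]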
The maximum number of tokens to generate in the chat completion.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available.
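A minimal sketch of consuming the stream in Python with requests; the endpoint URL is a placeholder, and the delta shape and data: [DONE] terminator are assumed from common OpenAI-compatible behavior rather than confirmed by this page.

import json
import requests

API_URL = "https://YOUR_NEOSANTARA_BASE_URL/v1/chat/completions"  # placeholder host
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "nusantara-base",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip blank keep-alive lines
        data = line[len(b"data: "):]
        if data == b"[DONE]":  # assumed stream terminator (OpenAI-compatible convention)
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content") or "", end="", flush=True)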
Up to 4 sequences where the API will stop generating further tokens.
A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
Controls which (if any) tool is called by the model. Can be none, auto, required, or a specific tool object.
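For instance, a single function tool and a tool_choice setting might be declared as in the sketch below; the exact function schema is assumed to follow the OpenAI-compatible convention, and get_weather is a hypothetical function.

# A sketch of one function tool plus a tool_choice value; the OpenAI-style
# function schema and the get_weather function itself are assumptions here.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
tool_choice = "auto"  # or "none", "required", or a specific tool object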
Enables reasoning capabilities for supported models (e.g., nusantara-base, garda-beta-mini). The reasoning effort can be set to low, medium, or high. (Cannot be used with max_tokens)
The maximum number of tokens to reserve for reasoning. (Cannot be used with effort)
Configuration for web search capabilities on supported models. The search depth can be set to basic or advanced.
An object specifying the format that the model must output. Must be one of text, json_object, or json_schema.
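As a rough sketch, these options might appear in a request body as shown below; the reasoning and web_search parameter names and object shapes are assumptions (only the effort, depth, and response_format values come from this page), so verify them against the full parameter reference.

extra_options = {
    "reasoning": {"effort": "medium"},           # assumed field name/shape; values low/medium/high
    "web_search": {"depth": "advanced"},         # assumed field name/shape; values basic/advanced
    "response_format": {"type": "json_object"},  # one of text, json_object, json_schema
}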
A unique identifier representing your end-user, which can help Neosantara AI to monitor and detect abuse.
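Putting the core parameters together, a minimal non-streaming request might look like the following Python sketch; the base URL is a placeholder to be replaced with the actual API host.

import requests

API_URL = "https://YOUR_NEOSANTARA_BASE_URL/v1/chat/completions"  # placeholder host

response = requests.post(
    API_URL,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "nusantara-base",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
        "stop": ["\n\n"],     # up to 4 stop sequences
        "user": "user-1234",  # optional end-user identifier
    },
    timeout=30,
)
response.raise_for_status()
completion = response.json()
print(completion["choices"][0]["message"]["content"])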
Returns
A unique identifier for the chat completion.
The object type, which is always chat.completion.
The Unix timestamp (in seconds) of when the chat completion was created.
The model used for the chat completion.
A list of chat completion choices. Each choice includes its index in the list of choices.
A chat completion message generated by the model, including the role of its author.
The contents of the message.
The tool calls generated by the model, such as function calls.
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point or a provided stop sequence, length if the maximum number of tokens specified in the request was reached, tool_calls if the model called a tool, or content_filter if content was omitted due to a flag from our content filters.
Usage statistics for the completion request, including the number of tokens in the prompt, the number of tokens in the generated completion, and the total number of tokens used in the request (prompt + completion).
Return Examples
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "nusantara-base",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
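Continuing from the request sketch earlier, a client would typically check finish_reason before using the message and read the usage block for accounting. A brief sketch, where handle_tool_calls is a hypothetical helper:

choice = completion["choices"][0]

if choice["finish_reason"] == "tool_calls":
    # The model requested tool calls; run them and send the results back
    # as messages with role "tool".
    handle_tool_calls(choice["message"]["tool_calls"])  # hypothetical helper
elif choice["finish_reason"] == "length":
    print("Output was truncated; consider raising max_tokens.")
elif choice["finish_reason"] == "content_filter":
    print("Some content was omitted by the content filter.")
else:  # "stop"
    print(choice["message"]["content"])

usage = completion["usage"]
print(f'{usage["prompt_tokens"]} prompt + {usage["completion_tokens"]} completion '
      f'= {usage["total_tokens"]} tokens')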