Why Stateful Conversations?
- Reduced Latency & Cost: You don’t need to re-upload thousands of tokens of history for every new turn.
- Simpler Client Logic: No need to manage a complex
messagesarray in your application state. - Context Continuity: Ensures the model “remembers” previous tools calls, reasoning steps, and context automatically.
How it Works
The system tracks conversation threads using Response IDs. Every time you generate a response withstore: true (default), the system saves the input and output. To continue the conversation, you simply pass the ID of the last response you received.
1. Starting a Conversation
To start a new conversation, simply make a request. Ensurestore is set to true (it is by default).
Request
id.
Response
2. Continuing the Conversation
To reply, provide your new input and theprevious_response_id from the last turn. You do not need to resend “My name is Alice”.
Request
Response
Managing Context Window
While the server manages history, the underlying model still has a context window limit.- Automatic Truncation: Neosantara attempts to manage context intelligently.
- Manual Control: You can use the
truncationparameter (“auto” or “disabled”) to control behavior when the history exceeds the model’s limit.
Branching Conversations
You can create “forks” in a conversation by referencing an olderprevious_response_id.
This creates two separate conversation trees branching from the same root.
Stateless Mode (Privacy & ZDR)
If you have strict Zero Data Retention (ZDR) requirements or simply don’t want to store history, setstore: false.
store: false:
- The conversation is not saved to the database.
- You cannot use
previous_response_idto continue this conversation later. - The response
idcannot be referenced.