Fair Usage & System Stability
Overview
Rate limits control the speed at which you can make API requests and process tokens. These are technical constraints designed to prevent system overload and ensure consistent performance for all users.
How Rate Limits Work
RPM: requests per minute
ITPM: input tokens per minute
OTPM: output tokens per minute
Understanding the Three Limit Types
The three limits are enforced independently, so exceeding any one of them triggers a rate limit error. For example:
- You’re on Basic tier (50 RPM, 20K ITPM, 5K OTPM)
- You make 45 requests in one minute (within RPM ✅)
- But each request uses 1,000 input tokens = 45,000 total (exceeds ITPM ❌)
- Result: Rate limit error even though RPM wasn’t exceeded
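The scenario above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `exceeded_limits` and the dictionary layout are illustrative assumptions, not part of the API, and only the Basic tier numbers come from this page.

```python
# Basic tier limits, taken from the tier table on this page.
BASIC_LIMITS = {"rpm": 50, "itpm": 20_000, "otpm": 5_000}

def exceeded_limits(requests, input_tokens, output_tokens, limits):
    """Return the names of the per-minute limits that the given usage exceeds."""
    usage = {"rpm": requests, "itpm": input_tokens, "otpm": output_tokens}
    return [name for name, used in usage.items() if used > limits[name]]

# 45 requests in one minute, each with 1,000 input tokens (output ignored here).
print(exceeded_limits(45, 45 * 1_000, 0, BASIC_LIMITS))  # ['itpm']
```

The request count stays under 50 RPM, but 45,000 input tokens is more than twice the 20,000 ITPM allowance, so only `itpm` is reported.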
Throughput Tiers
Your rate limits are determined by your account tier, which automatically upgrades based on your total lifetime deposits.
| Tier | Min. Deposit | RPM | ITPM | OTPM | Best For |
|---|---|---|---|---|---|
| Free | Rp 0 | 3 | 5,000 | 2,000 | Testing & Learning |
| Basic | Rp 85,000 | 50 | 20,000 | 5,000 | Small Projects |
| Standard | Rp 670,000 | 1,000 | 100,000 | 25,000 | Production Apps |
| Pro | Rp 3,350,000 | 2,000 | 200,000 | 50,000 | High-Volume Apps |
| Enterprise | Rp 6,700,000 | 4,000 | 500,000 | 125,000 | Large Scale Operations |
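To estimate which tier a given lifetime deposit maps to, the table can be encoded as data. This is a sketch only: `tier_for_deposit` is a hypothetical helper, and it assumes that depositing exactly the minimum amount is enough to reach a tier, which this page does not state explicitly.

```python
# (tier name, minimum lifetime deposit in Rp), copied from the table above.
TIERS = [
    ("Free", 0),
    ("Basic", 85_000),
    ("Standard", 670_000),
    ("Pro", 3_350_000),
    ("Enterprise", 6_700_000),
]

def tier_for_deposit(total_deposit_rp):
    """Return the highest tier whose minimum deposit is covered.

    Assumption: thresholds are inclusive (deposit >= minimum).
    """
    current = TIERS[0][0]
    for name, minimum in TIERS:
        if total_deposit_rp >= minimum:
            current = name
    return current

print(tier_for_deposit(700_000))  # Standard
```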
Monitoring Rate Limits
Response Headers
Every API response includes headers showing your current rate limit status. Use these to proactively avoid hitting limits. The headers fall into four groups: request limits, input token limits, output token limits, and account info. The request limit headers are:
| Header | Description |
|---|---|
| requests-limit | Your maximum RPM |
| requests-remaining | Requests left in current window |
| requests-reset | When the limit resets (ISO 8601) |
Complete Header Reference
| Header | Type | Description |
|---|---|---|
| x-neosantara-ratelimit-requests-limit | integer | Maximum requests per minute |
| x-neosantara-ratelimit-requests-remaining | integer | Requests remaining in current window |
| x-neosantara-ratelimit-requests-reset | string | ISO 8601 timestamp when request limit resets |
| x-neosantara-ratelimit-input-tokens-limit | integer | Maximum input tokens per minute |
| x-neosantara-ratelimit-input-tokens-remaining | integer | Input tokens remaining in current window |
| x-neosantara-ratelimit-input-tokens-reset | string | ISO 8601 timestamp when input limit resets |
| x-neosantara-ratelimit-output-tokens-limit | integer | Maximum output tokens per minute |
| x-neosantara-ratelimit-output-tokens-remaining | integer | Output tokens remaining in current window |
| x-neosantara-ratelimit-output-tokens-reset | string | ISO 8601 timestamp when output limit resets |
| x-neosantara-tier | string | Current account tier (Free, Basic, Standard, Pro, Enterprise) |
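The headers in the reference table can be collected into one structure after each response. The sketch below assumes the headers arrive as a plain name-to-string mapping (as most HTTP clients provide). The function names are illustrative, and the sample values in the usage example are made up.

```python
from datetime import datetime

PREFIX = "x-neosantara-ratelimit-"

def parse_reset(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is normalized for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def parse_rate_limit_headers(headers):
    """Group the rate limit headers by limit type (requests, input/output tokens)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    status = {"tier": lowered.get("x-neosantara-tier")}
    for kind in ("requests", "input-tokens", "output-tokens"):
        limit = lowered.get(f"{PREFIX}{kind}-limit")
        remaining = lowered.get(f"{PREFIX}{kind}-remaining")
        reset = lowered.get(f"{PREFIX}{kind}-reset")
        if limit is None or remaining is None or reset is None:
            continue  # header group missing: skip it rather than guess
        status[kind] = {
            "limit": int(limit),
            "remaining": int(remaining),
            "reset": parse_reset(reset),
        }
    return status

# Illustrative values only.
sample = {
    "x-neosantara-ratelimit-requests-limit": "50",
    "x-neosantara-ratelimit-requests-remaining": "49",
    "x-neosantara-ratelimit-requests-reset": "2025-01-01T00:01:00Z",
    "x-neosantara-tier": "Basic",
}
print(parse_rate_limit_headers(sample)["requests"]["remaining"])  # 49
```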
Error Handling
429 Too Many Requests
This error occurs when you exceed any of the three throughput limits (RPM, ITPM, or OTPM).
Error Response Fields
| Field | Type | Description |
|---|---|---|
| error.code | string | The specific limit exceeded: rpm_exceeded, itpm_exceeded, or otpm_exceeded |
| error.details.retry_after | integer | Seconds to wait before retrying |
| error.details.limit | integer | The limit value that was exceeded |
| error.details.remaining | integer | Always 0 when rate limited |
| error.details.reset | string | ISO 8601 timestamp when limit resets |
HTTP Headers
| Header | Description |
|---|---|
| Retry-After | Seconds to wait before making a new request |
| Rate limit headers | Show which limit was exceeded (see Monitoring section) |
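A client can combine the body fields and the Retry-After header to decide how long to wait. The sketch below assumes the dotted field names map to nested JSON objects (for example `error.details.retry_after` becomes `{"error": {"details": {"retry_after": ...}}}`) and that Retry-After carries whole seconds. The function name `retry_delay` and the one-second fallback are illustrative choices.

```python
import json

def retry_delay(body_text, headers):
    """Return (error_code, seconds_to_wait) for a 429 response.

    Prefers the Retry-After header and falls back to error.details.retry_after.
    """
    payload = json.loads(body_text)
    error = payload.get("error", {})
    details = error.get("details", {})
    lowered = {name.lower(): value for name, value in headers.items()}
    header_value = lowered.get("retry-after")
    if header_value is not None:
        seconds = int(header_value)
    else:
        seconds = int(details.get("retry_after", 1))  # 1 s fallback is arbitrary
    return error.get("code"), seconds

# Illustrative body shaped after the field table above.
body = json.dumps({
    "error": {
        "code": "itpm_exceeded",
        "details": {"retry_after": 12, "limit": 20000, "remaining": 0},
    }
})
print(retry_delay(body, {}))  # ('itpm_exceeded', 12)
```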
Best Practices
Implement Exponential Backoff
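One way to implement this is a retry wrapper that doubles its delay after each 429 and never waits less than the server's suggested time. This is a sketch under stated assumptions: `RateLimited` is a hypothetical exception that your own request function would raise on a 429, and the attempt count, base delay, and jitter values are arbitrary defaults rather than recommendations from this page.

```python
import random
import time

class RateLimited(Exception):
    """Hypothetical exception raised by your request function on a 429."""
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after

def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send(); on RateLimited, wait with exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            backoff = min(max_delay, base_delay * 2 ** attempt)  # 1, 2, 4, ...
            # Respect the server hint if it is longer than our own backoff.
            delay = max(exc.retry_after or 0, backoff)
            # Small random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, backoff * 0.1))
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without real waiting.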
Monitor Response Headers
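A simple way to act on the headers is to pause when the remaining budget runs out instead of waiting for a 429. The helper below is an illustrative sketch: it takes the `remaining` count and the parsed reset timestamp from any one limit type, and the `min_remaining` threshold is an assumption you would tune yourself.

```python
from datetime import datetime, timedelta, timezone

def pause_before_next_request(remaining, reset_at, now=None, min_remaining=1):
    """Return seconds to sleep: 0 if budget remains, else time until the reset."""
    if remaining >= min_remaining:
        return 0.0
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())

now = datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
reset_at = now + timedelta(seconds=30)
print(pause_before_next_request(0, reset_at, now=now))  # 30.0
```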
Use Batch API for High Volume
The Batch API:
- Has separate, higher rate limits
- Costs 50% less
- Is better suited to non-urgent processing
Use it when:
- Processing 100+ requests
- Running non-time-sensitive tasks
- Running overnight or background jobs
Distribute Load Across Time
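Spreading requests evenly means sending one request every 60 / RPM seconds rather than in a burst. The sketch below computes send offsets for a batch of requests. `send_offsets` is a hypothetical helper, and it paces on RPM only, so token limits still need separate attention.

```python
def send_offsets(n_requests, rpm):
    """Return the second offsets at which to send each request, evenly spaced."""
    interval = 60.0 / rpm  # e.g. 50 RPM -> one request every 1.2 s
    return [i * interval for i in range(n_requests)]

# On the Basic tier (50 RPM), 100 requests spread out take just under 2 minutes.
offsets = send_offsets(100, 50)
print(round(offsets[-1], 1))  # 118.8
```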
Implement Request Queuing
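A request queue can hold jobs and release them only while the current window has room. The following is a minimal single-threaded sketch using a rolling 60-second window. The class name `RequestQueue` is illustrative, it tracks request count only (not tokens), and the rolling window is an assumption, since this page does not say how the server measures a minute.

```python
import time
from collections import deque

class RequestQueue:
    """FIFO queue that starts at most `rpm` jobs per rolling 60-second window."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock
        self.sleep = sleep
        self.pending = deque()
        self.sent_at = deque()  # start times of jobs within the last 60 s

    def submit(self, job):
        """Queue a zero-argument callable that performs one API request."""
        self.pending.append(job)

    def run(self):
        """Run all queued jobs in order, pausing whenever the window is full."""
        results = []
        while self.pending:
            now = self.clock()
            # Forget start times that have left the 60-second window.
            while self.sent_at and now - self.sent_at[0] >= 60:
                self.sent_at.popleft()
            if len(self.sent_at) >= self.rpm:
                # Window is full: wait until the oldest request expires.
                self.sleep(60 - (now - self.sent_at[0]))
                continue
            self.sent_at.append(now)
            results.append(self.pending.popleft()())
        return results
```

The clock and sleep functions are injectable so the queue can be tested with a simulated clock.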
Cache Responses When Possible
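Caching avoids spending requests and tokens on inputs you have already sent. Below is a minimal in-memory sketch keyed on the model and prompt. `ResponseCache` and the `fetch` callable are hypothetical, and the approach only makes sense where identical inputs should produce a reusable answer.

```python
import hashlib
import json

class ResponseCache:
    """In-memory cache keyed by a hash of (model, prompt)."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable that performs the real API call
        self.store = {}
        self.hits = 0

    def get(self, model, prompt):
        raw_key = json.dumps([model, prompt]).encode("utf-8")
        key = hashlib.sha256(raw_key).hexdigest()
        if key in self.store:
            self.hits += 1  # served locally: no request or tokens consumed
            return self.store[key]
        response = self.fetch(model, prompt)
        self.store[key] = response
        return response
```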
Upgrade Tier When Needed
- Check your current tier in the dashboard
- Calculate needed tier based on your usage patterns
- Top up balance to reach the next threshold
- Tier upgrades are automatic and immediate
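Upgrading from Free to Basic, for example, gives you:
- More than 16× the requests (3 → 50 RPM)
- 4× the input tokens (5,000 → 20,000 ITPM)
- 2.5× the output tokens (2,000 → 5,000 OTPM)
- Access to Batch API (50% savings)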
Rate Limit Calculation Examples
Example 1: Chat Application (nusantara-base)
Configuration:
- Tier: Basic (50 RPM, 20K ITPM, 5K OTPM)
- Average input: 100 tokens/request
- Average output: 50 tokens/request
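With this configuration, the sustainable request rate is the smallest of the three per-limit ceilings. The sketch below works that out from the stated numbers. `sustainable_rpm` is an illustrative helper, and it assumes every request uses exactly the average token counts.

```python
def sustainable_rpm(limits, avg_input_tokens, avg_output_tokens):
    """Return (max requests per minute, limiting factor, per-limit ceilings)."""
    caps = {
        "rpm": limits["rpm"],
        "itpm": limits["itpm"] // avg_input_tokens,   # requests ITPM allows
        "otpm": limits["otpm"] // avg_output_tokens,  # requests OTPM allows
    }
    bottleneck = min(caps, key=caps.get)
    return caps[bottleneck], bottleneck, caps

basic = {"rpm": 50, "itpm": 20_000, "otpm": 5_000}
print(sustainable_rpm(basic, 100, 50))
# (50, 'rpm', {'rpm': 50, 'itpm': 200, 'otpm': 100})
```

At 100 input and 50 output tokens per request, the token limits would allow 200 and 100 requests per minute respectively, so RPM is the binding limit at 50.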
Frequently Asked Questions
What happens if I exceed rate limits?
The API returns a 429 Too Many Requests error with a Retry-After header. Your request is not processed, and no tokens are charged. Wait for the specified time and retry.
Do rate limits reset every minute?
Are Batch API requests counted toward standard rate limits?
Can I request temporary rate limit increases?
Why do I get rate limited when my RPM is not exceeded?
Exceeding any one of the three limits (RPM, ITPM, or OTPM) triggers a 429, so you may have hit a token limit instead. Check the error.code field to see which limit was hit.
Do streaming requests count differently?
How do concurrent requests affect rate limits?
Troubleshooting
Identify Which Limit Was Hit
Check the error.code field in the 429 response:
- rpm_exceeded: Too many requests
- itpm_exceeded: Too many input tokens
- otpm_exceeded: Too many output tokens
Review Your Usage Pattern
- Are requests evenly distributed?
- Are you sending large batches at once?
- What’s the average token count per request?
Implement Rate Limiting Logic
- Track your own request count
- Monitor response headers
- Implement queuing or throttling