# Rate Limit Errors

Errors returned when request or token rate limits are exceeded.
## Error Codes

| Code | HTTP Status | Description |
|---|---|---|
| `rate_limit_exceeded` | 429 | Request rate limit exceeded |
| `token_limit_exceeded` | 429 | Token rate limit exceeded |
| `concurrent_limit_exceeded` | 429 | Too many concurrent requests |
| `daily_limit_exceeded` | 429 | Daily request limit exceeded |
| `monthly_limit_exceeded` | 429 | Monthly request limit exceeded |
## Error Format

```json
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 5 seconds.",
    "retry_after": 5,
    "limit": 1000,
    "current": 1005,
    "reset_at": "2026-02-16T10:05:00Z"
  }
}
```

## Response Headers

Rate limit errors include these headers:
| Header | Description |
|---|---|
| `Retry-After` | Seconds to wait before retrying |
| `X-RateLimit-Limit-Requests` | Request limit per minute |
| `X-RateLimit-Remaining-Requests` | Requests remaining |
| `X-RateLimit-Reset-Requests` | Seconds until request limit resets |
| `X-RateLimit-Limit-Tokens` | Token limit per minute |
| `X-RateLimit-Remaining-Tokens` | Tokens remaining |
| `X-RateLimit-Reset-Tokens` | Seconds until token limit resets |
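These headers can also be read on successful responses to throttle client-side before a limit is ever hit. A minimal sketch (the header names come from the table above; `min_remaining` is an arbitrary example threshold):

```python
def should_throttle(headers: dict, min_remaining: int = 10) -> bool:
    """Return True when the remaining request budget is running low.

    `headers` is a dict of response headers; keys are normalized to
    lowercase before lookup since header casing varies by HTTP client.
    """
    normalized = {k.lower(): v for k, v in headers.items()}
    remaining = normalized.get("x-ratelimit-remaining-requests")
    if remaining is None:
        return False  # header absent: no basis for throttling
    return int(remaining) < min_remaining
```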
## Error Details

### rate_limit_exceeded

Requests per minute (RPM) limit exceeded.
```json
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "You have exceeded your requests per minute limit.",
    "retry_after": 12,
    "limit": 1000,
    "current": 1000,
    "reset_at": "2026-02-16T10:05:00Z",
    "limit_type": "rpm"
  }
}
```

**Resolution:**

- Wait for the `retry_after` period
- Implement exponential backoff
- Consider upgrading your plan for higher limits
### token_limit_exceeded

Tokens per minute (TPM) limit exceeded.

```json
{
  "error": {
    "type": "rate_limit_error",
    "code": "token_limit_exceeded",
    "message": "You have exceeded your tokens per minute limit.",
    "retry_after": 45,
    "limit": 100000,
    "current": 100500,
    "reset_at": "2026-02-16T10:05:00Z",
    "limit_type": "tpm"
  }
}
```

**Resolution:**

- Reduce prompt/response sizes
- Batch smaller requests
- Wait for the reset period
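Staying under a TPM budget requires estimating a request's token count before sending it. A very rough heuristic is about 4 characters per token for English text; for accurate counts, use a real tokenizer such as `tiktoken`. The function below is a crude sketch, not an exact count:

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Crude token estimate: ~4 chars per token, plus a small
    per-message overhead for role and formatting tokens."""
    chars = sum(len(m.get("content", "")) for m in messages)
    return chars // 4 + 4 * len(messages)
```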
### concurrent_limit_exceeded

Too many simultaneous requests.

```json
{
  "error": {
    "type": "rate_limit_error",
    "code": "concurrent_limit_exceeded",
    "message": "Too many concurrent requests. Maximum is 50.",
    "limit": 50,
    "current": 52
  }
}
```

**Resolution:**

- Implement a request queue
- Wait for in-flight requests to complete
- Increase concurrent limit in your plan
### daily_limit_exceeded

Daily request quota exhausted.

```json
{
  "error": {
    "type": "rate_limit_error",
    "code": "daily_limit_exceeded",
    "message": "Daily request limit exceeded.",
    "limit": 10000,
    "current": 10000,
    "reset_at": "2026-02-17T00:00:00Z"
  }
}
```

### monthly_limit_exceeded
Monthly quota exhausted.
```json
{
  "error": {
    "type": "rate_limit_error",
    "code": "monthly_limit_exceeded",
    "message": "Monthly request limit exceeded.",
    "limit": 100000,
    "current": 100000,
    "reset_at": "2026-03-01T00:00:00Z"
  }
}
```

## Handling Rate Limits
### Python with Retry

```python
import openai
import time
from tenacity import retry, stop_after_attempt, wait_exponential

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60)
)
def make_request(messages):
    try:
        return client.chat.completions.create(
            model="gpt-5.2",
            messages=messages
        )
    except openai.RateLimitError as e:
        # Honor the server's Retry-After before re-raising. Note that
        # tenacity's exponential wait is applied on top of this sleep.
        retry_after = int(e.response.headers.get("Retry-After", 5))
        print(f"Rate limited, waiting {retry_after}s...")
        time.sleep(retry_after)
        raise  # re-raise to trigger tenacity's retry
```

### Python with Manual Backoff
```python
import random

def make_request_with_backoff(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except openai.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            retry_after = int(e.response.headers.get("Retry-After", 5))
            # Add jitter so concurrent clients don't retry in lockstep
            wait_time = retry_after + random.uniform(0, 2)
            print(f"Rate limited, waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
```

### JavaScript/TypeScript
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

async function makeRequestWithRetry(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 5
): Promise<OpenAI.ChatCompletion> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'gpt-5.2',
        messages,
      });
    } catch (error) {
      if (error instanceof OpenAI.RateLimitError) {
        if (attempt === maxRetries - 1) throw error;
        const retryAfter = parseInt(
          error.headers?.get('retry-after') || '5'
        );
        console.log(`Rate limited, waiting ${retryAfter}s...`);
        await new Promise(r => setTimeout(r, retryAfter * 1000));
      } else {
        throw error;
      }
    }
  }
  throw new Error('Max retries exceeded');
}
```

### Using GateFlow Queue
Let GateFlow handle rate limits automatically:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",
                "max_wait_ms": 30000
            }
        }
    }
)
```

### Using Fallbacks
Switch to alternative models when rate limited:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "fallback"
            },
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)
```

## Monitoring Rate Limits
### Check Current Usage

```bash
curl https://api.gateflow.ai/v1/management/rate-limits \
  -H "Authorization: Bearer gw_prod_..."
```

### Response
```json
{
  "api_key": {
    "rpm": {"limit": 1000, "remaining": 850, "reset_at": "2026-02-16T10:05:00Z"},
    "tpm": {"limit": 100000, "remaining": 75000, "reset_at": "2026-02-16T10:05:00Z"},
    "daily": {"limit": 10000, "remaining": 8500, "reset_at": "2026-02-17T00:00:00Z"}
  },
  "providers": {
    "openai": {
      "rpm": {"limit": 5000, "remaining": 4200},
      "tpm": {"limit": 600000, "remaining": 450000}
    }
  }
}
```

## Best Practices
- **Implement backoff**: Always use exponential backoff with jitter
- **Respect Retry-After**: Honor the header value
- **Monitor proactively**: Set up alerts before hitting limits
- **Use request queuing**: Enable GateFlow's queue feature
- **Plan capacity**: Right-size your limits for expected traffic
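Proactive monitoring can be built on the `/v1/management/rate-limits` endpoint shown earlier. A sketch that flags any per-key bucket above a utilization threshold (the 80% default and the list-of-names return shape are illustrative choices):

```python
def limits_near_exhaustion(usage: dict, threshold: float = 0.8) -> list[str]:
    """Return names of api_key buckets (rpm, tpm, daily, ...) whose
    utilization meets or exceeds `threshold`, for alerting."""
    alerts = []
    for name, bucket in usage.get("api_key", {}).items():
        limit = bucket["limit"]
        used = limit - bucket["remaining"]
        if limit and used / limit >= threshold:
            alerts.append(name)
    return alerts
```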
## See Also
- Rate Limits Guide - Rate limit management
- Request Queuing - Queue configuration
- Retry Logic - Retry strategies