# Rate Limits

Understanding and managing rate limits across providers.

## Overview
GateFlow aggregates rate limits across all your AI providers and provides unified rate limit management.
## Provider Rate Limits

### Default Limits by Provider
| Provider | RPM | TPM | Notes |
|---|---|---|---|
| OpenAI | 500-10,000 | 30K-1M | Varies by tier |
| Anthropic | 1,000-4,000 | 100K-400K | Varies by tier |
| Google | 360-1,000 | 120K-1M | Per model |
| Mistral | 500-2,000 | 500K | Per model |
| Cohere | 10,000 | 10M | Enterprise |
| ElevenLabs | 100-500 | N/A | Characters/month |
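For batch jobs that cannot consult live rate-limit headers, the low end of each provider's default RPM can be turned into a minimum spacing between requests. A minimal sketch; the floor values are taken from the table above, and the provider keys are our own labels:

```python
# Conservative per-provider RPM floors, taken from the low end of the
# default-limits table above (illustrative, not authoritative).
DEFAULT_RPM_FLOOR = {
    "openai": 500,
    "anthropic": 1000,
    "mistral": 500,
    "cohere": 10000,
}

def min_interval_seconds(provider: str) -> float:
    """Smallest delay between requests that stays under the RPM floor."""
    return 60.0 / DEFAULT_RPM_FLOOR[provider]

print(min_interval_seconds("openai"))  # 0.12 seconds between requests
```

Spacing requests this way is pessimistic (your actual tier may allow far more), but it guarantees a batch job never trips the documented minimums.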
### Checking Your Limits
```bash
curl https://api.gateflow.ai/v1/management/rate-limits \
  -H "Authorization: Bearer gw_prod_..."
```

Response:
```json
{
  "providers": {
    "openai": {
      "rpm": {"limit": 5000, "remaining": 4850, "reset_at": "2026-02-16T12:05:00Z"},
      "tpm": {"limit": 600000, "remaining": 580000, "reset_at": "2026-02-16T12:05:00Z"}
    },
    "anthropic": {
      "rpm": {"limit": 2000, "remaining": 1990, "reset_at": "2026-02-16T12:05:00Z"},
      "tpm": {"limit": 200000, "remaining": 195000, "reset_at": "2026-02-16T12:05:00Z"}
    }
  }
}
```

## GateFlow Rate Limits
### API Key Limits
Set limits per API key:
```bash
curl -X POST https://api.gateflow.ai/v1/management/api-keys \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-key",
    "rate_limits": {
      "rpm": 1000,
      "tpm": 100000,
      "daily_requests": 50000,
      "monthly_cost_usd": 500
    }
  }'
```

### Organization Limits
Set limits at the organization level:
```bash
curl -X PATCH https://api.gateflow.ai/v1/management/organization \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": {
      "rpm": 10000,
      "tpm": 1000000,
      "monthly_budget_usd": 5000
    }
  }'
```

## Rate Limit Headers
Every response includes rate limit information:
| Header | Description |
|---|---|
| `X-RateLimit-Limit-Requests` | Total requests allowed per minute |
| `X-RateLimit-Remaining-Requests` | Requests remaining this minute |
| `X-RateLimit-Reset-Requests` | Seconds until the request limit resets |
| `X-RateLimit-Limit-Tokens` | Total tokens allowed per minute |
| `X-RateLimit-Remaining-Tokens` | Tokens remaining this minute |
| `X-RateLimit-Reset-Tokens` | Seconds until the token limit resets |
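These headers can drive a simple client-side throttle: when remaining requests run low, sleep out the rest of the window. A sketch using the header names from the table above; the `reserve` threshold and fallback values are arbitrary choices, not GateFlow defaults:

```python
def seconds_to_wait(headers, reserve=10):
    """Return how long to pause before the next request.

    Pauses for the reset interval once fewer than `reserve` requests
    remain in the current window. Header values arrive as strings.
    """
    remaining = int(headers.get("X-RateLimit-Remaining-Requests", "0"))
    if remaining < reserve:
        return int(headers.get("X-RateLimit-Reset-Requests", "60"))
    return 0

# Nearly exhausted window: pause for the reset interval.
print(seconds_to_wait({
    "X-RateLimit-Remaining-Requests": "8",
    "X-RateLimit-Reset-Requests": "18",
}))  # 18
```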
## Handling Rate Limits

### Client-Side Handling
```python
import time

import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

def make_request_with_backoff(messages, max_retries=3):
    """Retry on 429s, honoring the Retry-After header when present."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except openai.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            retry_after = int(e.response.headers.get("Retry-After", 60))
            time.sleep(retry_after)
```

### Automatic Handling with GateFlow
Let GateFlow handle rate limits automatically:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",  # or "fallback" or "error"
                "max_wait_ms": 30000
            }
        }
    }
)
```

## Rate Limit Strategies
### 1. Queue Strategy
Requests are queued when rate limited:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",
                "max_wait_ms": 60000,
                "priority": "high"  # high, normal, low
            }
        }
    }
)
```

### 2. Fallback Strategy
Switch to alternative providers:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "fallback"
            },
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)
```

### 3. Error Strategy
Return an error immediately (the default):
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "error"
            }
        }
    }
)
```

## Budget Limits
### Cost-Based Limits

Set spending limits. The `on_limit` field accepts `"block"` or `"alert_only"`:

```bash
curl -X POST https://api.gateflow.ai/v1/management/budgets \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "api_key_id": "key_abc123",
    "limits": {
      "daily_usd": 100,
      "monthly_usd": 2000
    },
    "alerts": {
      "thresholds": [0.5, 0.8, 0.95],
      "webhook_url": "https://your-app.com/budget-alert"
    },
    "on_limit": "block"
  }'
```

### Budget Alerts

When a spend threshold from `alerts.thresholds` is crossed, the configured `webhook_url` receives a payload like:
```json
{
  "type": "budget_alert",
  "api_key_id": "key_abc123",
  "threshold": 0.8,
  "current_spend_usd": 1600,
  "limit_usd": 2000,
  "period": "monthly",
  "timestamp": "2026-02-16T10:30:00Z"
}
```

## Monitoring Rate Limits
### Real-Time Dashboard
View rate limit usage in the GateFlow dashboard:
- Current usage vs limits
- Rate limit events over time
- Provider-specific breakdowns
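The "current usage vs limits" view can also be derived programmatically from the `/v1/management/rate-limits` response shown earlier. A sketch over a payload in that shape (field names mirror the sample response, not a published schema):

```python
# Sample payload in the shape of the rate-limits response shown earlier.
payload = {
    "providers": {
        "openai": {
            "rpm": {"limit": 5000, "remaining": 4850},
            "tpm": {"limit": 600000, "remaining": 580000},
        },
        "anthropic": {
            "rpm": {"limit": 2000, "remaining": 1990},
            "tpm": {"limit": 200000, "remaining": 195000},
        },
    }
}

def utilization(bucket):
    """Fraction of a limit already consumed in the current window."""
    return 1 - bucket["remaining"] / bucket["limit"]

usage = {
    provider: {kind: round(utilization(b), 3) for kind, b in buckets.items()}
    for provider, buckets in payload["providers"].items()
}
print(usage)  # e.g. openai rpm at 3% of its limit
```

Polling this on a schedule and alerting above, say, 80% utilization is a lightweight alternative to watching the dashboard.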
### API Metrics
```bash
curl https://api.gateflow.ai/v1/management/analytics/rate-limits \
  -H "Authorization: Bearer gw_prod_..." \
  -G -d "period=24h"
```

Response:
```json
{
  "period": "24h",
  "rate_limit_events": 45,
  "requests_queued": 30,
  "requests_failed": 5,
  "fallbacks_triggered": 10,
  "by_provider": {
    "openai": {"events": 30, "queued": 20, "failed": 3},
    "anthropic": {"events": 15, "queued": 10, "failed": 2}
  },
  "peak_usage": {
    "timestamp": "2026-02-16T14:30:00Z",
    "rpm_percent": 95
  }
}
```

## Best Practices
- **Set conservative limits**: start lower and increase based on actual usage
- **Use fallbacks**: don't rely on a single provider
- **Monitor proactively**: set up alerts before hitting limits
- **Implement client backoff**: even with GateFlow handling, client-side backoff helps
- **Use request queuing**: enable queuing for batch operations
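The client-backoff practice above is usually implemented as exponential delay with full jitter, so many rate-limited clients don't retry in lockstep. A minimal sketch; the base, cap, and jitter scheme are illustrative choices, not GateFlow requirements:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Delay before retry `attempt` (0-based).

    Uniformly random in [0, min(cap, base * 2**attempt)] — "full jitter".
    """
    return rng() * min(cap, base * (2 ** attempt))

# With jitter pinned to its maximum, delays double per attempt up to the cap:
print([backoff_delay(a, rng=lambda: 1.0) for a in range(4)])  # [1.0, 2.0, 4.0, 8.0]
```

Injecting `rng` makes the schedule deterministic in tests while remaining random in production.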
## Next Steps

- **Request Queuing**: queue configuration
- **Retry Logic**: retry configuration
- **Cost Analytics**: monitor spending