
Rate Limits

Understanding and managing rate limits across providers.

Overview

GateFlow aggregates rate limits across all your AI providers and provides unified rate limit management.

Provider Rate Limits

Default Limits by Provider

| Provider | RPM | TPM | Notes |
|---|---|---|---|
| OpenAI | 500-10,000 | 30K-1M | Varies by tier |
| Anthropic | 1,000-4,000 | 100K-400K | Varies by tier |
| Google | 360-1,000 | 120K-1M | Per model |
| Mistral | 500-2,000 | 500K | Per model |
| Cohere | 10,000 | 10M | Enterprise |
| ElevenLabs | 100-500 | N/A | Characters/month |

Checking Your Limits

```bash
curl https://api.gateflow.ai/v1/management/rate-limits \
  -H "Authorization: Bearer gw_prod_..."
```

Response:

```json
{
  "providers": {
    "openai": {
      "rpm": {"limit": 5000, "remaining": 4850, "reset_at": "2026-02-16T12:05:00Z"},
      "tpm": {"limit": 600000, "remaining": 580000, "reset_at": "2026-02-16T12:05:00Z"}
    },
    "anthropic": {
      "rpm": {"limit": 2000, "remaining": 1990, "reset_at": "2026-02-16T12:05:00Z"},
      "tpm": {"limit": 200000, "remaining": 195000, "reset_at": "2026-02-16T12:05:00Z"}
    }
  }
}
```
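The `remaining`/`limit` ratios in this response can drive simple client-side throttling. A minimal sketch (the `should_throttle` helper and the 10% threshold are illustrative, not part of the GateFlow API):

```python
def should_throttle(limits: dict, provider: str, threshold: float = 0.1) -> bool:
    """Return True when a provider's remaining RPM or TPM budget
    falls below the given fraction of its limit."""
    buckets = limits["providers"][provider]
    for window in ("rpm", "tpm"):
        bucket = buckets[window]
        if bucket["remaining"] / bucket["limit"] < threshold:
            return True
    return False

# Using the sample response above:
limits = {
    "providers": {
        "openai": {
            "rpm": {"limit": 5000, "remaining": 4850},
            "tpm": {"limit": 600000, "remaining": 580000},
        }
    }
}
print(should_throttle(limits, "openai"))  # False: both buckets are above 10%
```

Polling this endpoint before large batch jobs lets you slow down before the provider starts rejecting requests.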

GateFlow Rate Limits

API Key Limits

Set limits per API key:

```bash
curl -X POST https://api.gateflow.ai/v1/management/api-keys \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-key",
    "rate_limits": {
      "rpm": 1000,
      "tpm": 100000,
      "daily_requests": 50000,
      "monthly_cost_usd": 500
    }
  }'
```

Organization Limits

Set limits at the organization level:

```bash
curl -X PATCH https://api.gateflow.ai/v1/management/organization \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": {
      "rpm": 10000,
      "tpm": 1000000,
      "monthly_budget_usd": 5000
    }
  }'
```

Rate Limit Headers

Every response includes rate limit information:

| Header | Description |
|---|---|
| `X-RateLimit-Limit-Requests` | Total requests allowed per minute |
| `X-RateLimit-Remaining-Requests` | Requests remaining this minute |
| `X-RateLimit-Reset-Requests` | Seconds until limit resets |
| `X-RateLimit-Limit-Tokens` | Total tokens allowed per minute |
| `X-RateLimit-Remaining-Tokens` | Tokens remaining this minute |
| `X-RateLimit-Reset-Tokens` | Seconds until token limit resets |
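These header values arrive as strings and are easier to work with as integers. A small parsing sketch (header names come from the table above; the `parse_rate_limit_headers` helper itself is illustrative):

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract GateFlow rate-limit headers into integers.
    Headers absent from the response are returned as None."""
    fields = {
        "request_limit": "X-RateLimit-Limit-Requests",
        "requests_remaining": "X-RateLimit-Remaining-Requests",
        "request_reset_s": "X-RateLimit-Reset-Requests",
        "token_limit": "X-RateLimit-Limit-Tokens",
        "tokens_remaining": "X-RateLimit-Remaining-Tokens",
        "token_reset_s": "X-RateLimit-Reset-Tokens",
    }
    return {key: int(headers[name]) if name in headers else None
            for key, name in fields.items()}

parsed = parse_rate_limit_headers({
    "X-RateLimit-Limit-Requests": "1000",
    "X-RateLimit-Remaining-Requests": "997",
    "X-RateLimit-Reset-Requests": "42",
})
print(parsed["requests_remaining"])  # 997
```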

Handling Rate Limits

Client-Side Handling

```python
import time

import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

def make_request_with_backoff(messages, max_retries=3):
    """Retry on rate limit errors, honoring the Retry-After header."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except openai.RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Wait as long as the server asks, defaulting to 60 seconds.
            retry_after = int(e.response.headers.get("Retry-After", 60))
            time.sleep(retry_after)
```

Automatic Handling with GateFlow

Let GateFlow handle rate limits automatically:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",  # or "fallback" or "error"
                "max_wait_ms": 30000
            }
        }
    }
)
```

Rate Limit Strategies

1. Queue Strategy

Requests are queued when rate limited:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",
                "max_wait_ms": 60000,
                "priority": "high"  # high, normal, low
            }
        }
    }
)
```

2. Fallback Strategy

Switch to alternative providers:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "fallback"
            },
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)
```

3. Error Strategy

Return a rate limit error immediately (the default behavior):

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "error"
            }
        }
    }
)
```

Budget Limits

Cost-Based Limits

Set spending limits. `on_limit` may be `"block"` or `"alert_only"`:

```bash
curl -X POST https://api.gateflow.ai/v1/management/budgets \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "api_key_id": "key_abc123",
    "limits": {
      "daily_usd": 100,
      "monthly_usd": 2000
    },
    "alerts": {
      "thresholds": [0.5, 0.8, 0.95],
      "webhook_url": "https://your-app.com/budget-alert"
    },
    "on_limit": "block"
  }'
```

Budget Alerts

When a spending threshold is crossed, GateFlow sends a payload like this to your configured webhook URL:

```json
{
  "type": "budget_alert",
  "api_key_id": "key_abc123",
  "threshold": 0.8,
  "current_spend_usd": 1600,
  "limit_usd": 2000,
  "period": "monthly",
  "timestamp": "2026-02-16T10:30:00Z"
}
```
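A webhook receiver only needs to inspect a few fields of this payload to decide what to do. A minimal handler sketch (the `handle_budget_alert` function and the specific actions are illustrative; only the payload fields come from the example above):

```python
def handle_budget_alert(alert: dict) -> str:
    """Map a budget_alert payload to an action for our app.
    The 0.95/0.8 action thresholds here are illustrative choices."""
    if alert.get("type") != "budget_alert":
        return "ignore"
    if alert["threshold"] >= 0.95:
        return "pause_non_critical_traffic"
    if alert["threshold"] >= 0.8:
        return "notify_oncall"
    return "log_only"

alert = {
    "type": "budget_alert",
    "api_key_id": "key_abc123",
    "threshold": 0.8,
    "current_spend_usd": 1600,
    "limit_usd": 2000,
}
print(handle_budget_alert(alert))  # notify_oncall
```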

Monitoring Rate Limits

Real-Time Dashboard

View rate limit usage in the GateFlow dashboard:

  • Current usage vs limits
  • Rate limit events over time
  • Provider-specific breakdowns

API Metrics

```bash
curl https://api.gateflow.ai/v1/management/analytics/rate-limits \
  -H "Authorization: Bearer gw_prod_..." \
  -G -d "period=24h"
```

Response:

```json
{
  "period": "24h",
  "rate_limit_events": 45,
  "requests_queued": 30,
  "requests_failed": 5,
  "fallbacks_triggered": 10,
  "by_provider": {
    "openai": {"events": 30, "queued": 20, "failed": 3},
    "anthropic": {"events": 15, "queued": 10, "failed": 2}
  },
  "peak_usage": {
    "timestamp": "2026-02-16T14:30:00Z",
    "rpm_percent": 95
  }
}
```
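One useful way to act on this report is to compute, per provider, what fraction of rate-limit events ended in outright failures rather than queueing or fallback. A sketch over the sample response above (the `failure_rates` helper is illustrative):

```python
def failure_rates(report: dict) -> dict:
    """Fraction of rate-limit events per provider that failed outright."""
    return {
        provider: stats["failed"] / stats["events"]
        for provider, stats in report["by_provider"].items()
        if stats["events"] > 0
    }

report = {
    "by_provider": {
        "openai": {"events": 30, "queued": 20, "failed": 3},
        "anthropic": {"events": 15, "queued": 10, "failed": 2},
    }
}
print(failure_rates(report)["openai"])  # 0.1
```

A rising failure rate for one provider is a signal to lower that key's limits or add fallbacks for it.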

Best Practices

  1. Set conservative limits - Start lower and increase based on actual usage
  2. Use fallbacks - Don't rely on a single provider
  3. Monitor proactively - Set up alerts before you hit limits
  4. Implement client-side backoff - Even with GateFlow's automatic handling, backoff in your client adds resilience
  5. Use request queuing - Enable queuing for batch operations
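Several of these practices come down to constructing the right `extra_body` for each request. A small builder sketch using only the GateFlow parameters documented above (the `gateflow_options` helper itself is illustrative):

```python
def gateflow_options(strategy="queue", max_wait_ms=30000, fallbacks=None):
    """Build the extra_body dict for GateFlow rate-limit handling,
    from the strategy, max_wait_ms, and fallbacks parameters shown above."""
    body = {"gateflow": {"rate_limit_handling": {"strategy": strategy}}}
    if strategy == "queue":
        body["gateflow"]["rate_limit_handling"]["max_wait_ms"] = max_wait_ms
    if fallbacks:
        body["gateflow"]["fallbacks"] = list(fallbacks)
    return body

# Queue batch traffic with a one-minute ceiling:
opts = gateflow_options(strategy="queue", max_wait_ms=60000)
print(opts["gateflow"]["rate_limit_handling"]["strategy"])  # queue
```

The result can be passed directly as `extra_body=opts` in `client.chat.completions.create(...)`.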
