
Rate Limits

Understanding and managing rate limits across providers.

Overview

GateFlow aggregates rate limits across all your AI providers and provides unified rate limit management.

Provider Rate Limits

Default Limits by Provider

| Provider | RPM | TPM | Notes |
|---|---|---|---|
| OpenAI | 500-10,000 | 30K-1M | Varies by tier |
| Anthropic | 1,000-4,000 | 100K-400K | Varies by tier |
| Google | 360-1,000 | 120K-1M | Per model |
| Mistral | 500-2,000 | 500K | Per model |
| Cohere | 10,000 | 10M | Enterprise |
| ElevenLabs | 100-500 | N/A | Characters/month |

Checking Your Limits

```bash
curl https://api.gateflow.ai/v1/management/rate-limits \
  -H "Authorization: Bearer gw_prod_..."
```

Response:

```json
{
  "providers": {
    "openai": {
      "rpm": {"limit": 5000, "remaining": 4850, "reset_at": "2026-02-16T12:05:00Z"},
      "tpm": {"limit": 600000, "remaining": 580000, "reset_at": "2026-02-16T12:05:00Z"}
    },
    "anthropic": {
      "rpm": {"limit": 2000, "remaining": 1990, "reset_at": "2026-02-16T12:05:00Z"},
      "tpm": {"limit": 200000, "remaining": 195000, "reset_at": "2026-02-16T12:05:00Z"}
    }
  }
}
```
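The `remaining`/`limit` ratios in this response can drive simple client-side throttling. A minimal sketch (the `should_throttle` helper and the 10% threshold are illustrative, not part of the GateFlow API):

```python
def should_throttle(limits: dict, provider: str, threshold: float = 0.1) -> bool:
    """Return True when a provider's remaining RPM or TPM budget
    falls below the given fraction of its limit."""
    buckets = limits["providers"][provider]
    for window in ("rpm", "tpm"):
        bucket = buckets[window]
        if bucket["remaining"] / bucket["limit"] < threshold:
            return True
    return False

# Using the sample response above:
limits = {
    "providers": {
        "openai": {
            "rpm": {"limit": 5000, "remaining": 4850},
            "tpm": {"limit": 600000, "remaining": 580000},
        }
    }
}
print(should_throttle(limits, "openai"))  # False: both buckets are above 10%
```

Polling this endpoint before large batch jobs lets you slow down before the provider starts rejecting requests.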

GateFlow Rate Limits

API Key Limits

Set limits per API key:

```bash
curl -X POST https://api.gateflow.ai/v1/management/api-keys \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production-key",
    "rate_limits": {
      "rpm": 1000,
      "tpm": 100000,
      "daily_requests": 50000,
      "monthly_cost_usd": 500
    }
  }'
```

Organization Limits

Set limits at the organization level:

```bash
curl -X PATCH https://api.gateflow.ai/v1/management/organization \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": {
      "rpm": 10000,
      "tpm": 1000000,
      "monthly_budget_usd": 5000
    }
  }'
```

Rate Limit Headers

Every response includes rate limit information:

| Header | Description |
|---|---|
| `X-RateLimit-Limit-Requests` | Total requests allowed per minute |
| `X-RateLimit-Remaining-Requests` | Requests remaining this minute |
| `X-RateLimit-Reset-Requests` | Seconds until limit resets |
| `X-RateLimit-Limit-Tokens` | Total tokens allowed per minute |
| `X-RateLimit-Remaining-Tokens` | Tokens remaining this minute |
| `X-RateLimit-Reset-Tokens` | Seconds until token limit resets |
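These header values arrive as strings and are easier to work with as integers. A small parsing sketch (header names come from the table above; the `parse_rate_limit_headers` helper itself is illustrative):

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract GateFlow rate-limit headers into integers.
    Headers absent from the response are returned as None."""
    fields = {
        "request_limit": "X-RateLimit-Limit-Requests",
        "requests_remaining": "X-RateLimit-Remaining-Requests",
        "request_reset_s": "X-RateLimit-Reset-Requests",
        "token_limit": "X-RateLimit-Limit-Tokens",
        "tokens_remaining": "X-RateLimit-Remaining-Tokens",
        "token_reset_s": "X-RateLimit-Reset-Tokens",
    }
    return {key: int(headers[name]) if name in headers else None
            for key, name in fields.items()}

parsed = parse_rate_limit_headers({
    "X-RateLimit-Limit-Requests": "1000",
    "X-RateLimit-Remaining-Requests": "997",
    "X-RateLimit-Reset-Requests": "42",
})
print(parsed["requests_remaining"])  # 997
```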

Handling Rate Limits

Client-Side Handling

```python
import time

import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

def make_request_with_backoff(messages, max_retries=3):
    """Retry on rate limit errors, honoring the Retry-After header."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except openai.RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Wait as long as the server asks, defaulting to 60 seconds.
            retry_after = int(e.response.headers.get("Retry-After", 60))
            time.sleep(retry_after)
```

Automatic Handling with GateFlow

Let GateFlow handle rate limits automatically:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",  # or "fallback" or "error"
                "max_wait_ms": 30000
            }
        }
    }
)
```

Rate Limit Strategies

1. Queue Strategy

Requests are queued when rate limited:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",
                "max_wait_ms": 60000,
                "priority": "high"  # high, normal, low
            }
        }
    }
)
```

2. Fallback Strategy

Switch to alternative providers:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "fallback"
            },
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)
```

3. Error Strategy

Return a rate limit error immediately (the default behavior):

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "error"
            }
        }
    }
)
```

Budget Limits

Cost-Based Limits

Set spending limits. `on_limit` may be `"block"` or `"alert_only"`:

```bash
curl -X POST https://api.gateflow.ai/v1/management/budgets \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "api_key_id": "key_abc123",
    "limits": {
      "daily_usd": 100,
      "monthly_usd": 2000
    },
    "alerts": {
      "thresholds": [0.5, 0.8, 0.95],
      "webhook_url": "https://your-app.com/budget-alert"
    },
    "on_limit": "block"
  }'
```

Budget Alerts

When a spending threshold is crossed, GateFlow sends a payload like this to your configured webhook URL:

```json
{
  "type": "budget_alert",
  "api_key_id": "key_abc123",
  "threshold": 0.8,
  "current_spend_usd": 1600,
  "limit_usd": 2000,
  "period": "monthly",
  "timestamp": "2026-02-16T10:30:00Z"
}
```
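A webhook receiver only needs to inspect a few fields of this payload to decide what to do. A minimal handler sketch (the `handle_budget_alert` function and the specific actions are illustrative; only the payload fields come from the example above):

```python
def handle_budget_alert(alert: dict) -> str:
    """Map a budget_alert payload to an action for our app.
    The 0.95/0.8 action thresholds here are illustrative choices."""
    if alert.get("type") != "budget_alert":
        return "ignore"
    if alert["threshold"] >= 0.95:
        return "pause_non_critical_traffic"
    if alert["threshold"] >= 0.8:
        return "notify_oncall"
    return "log_only"

alert = {
    "type": "budget_alert",
    "api_key_id": "key_abc123",
    "threshold": 0.8,
    "current_spend_usd": 1600,
    "limit_usd": 2000,
}
print(handle_budget_alert(alert))  # notify_oncall
```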

Monitoring Rate Limits

Real-Time Dashboard

View rate limit usage in the GateFlow dashboard:

  • Current usage vs limits
  • Rate limit events over time
  • Provider-specific breakdowns

API Metrics

```bash
curl https://api.gateflow.ai/v1/management/analytics/rate-limits \
  -H "Authorization: Bearer gw_prod_..." \
  -G -d "period=24h"
```

Response:

```json
{
  "period": "24h",
  "rate_limit_events": 45,
  "requests_queued": 30,
  "requests_failed": 5,
  "fallbacks_triggered": 10,
  "by_provider": {
    "openai": {"events": 30, "queued": 20, "failed": 3},
    "anthropic": {"events": 15, "queued": 10, "failed": 2}
  },
  "peak_usage": {
    "timestamp": "2026-02-16T14:30:00Z",
    "rpm_percent": 95
  }
}
```
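One useful way to act on this report is to compute, per provider, what fraction of rate-limit events ended in outright failures rather than queueing or fallback. A sketch over the sample response above (the `failure_rates` helper is illustrative):

```python
def failure_rates(report: dict) -> dict:
    """Fraction of rate-limit events per provider that failed outright."""
    return {
        provider: stats["failed"] / stats["events"]
        for provider, stats in report["by_provider"].items()
        if stats["events"] > 0
    }

report = {
    "by_provider": {
        "openai": {"events": 30, "queued": 20, "failed": 3},
        "anthropic": {"events": 15, "queued": 10, "failed": 2},
    }
}
print(failure_rates(report)["openai"])  # 0.1
```

A rising failure rate for one provider is a signal to lower that key's limits or add fallbacks for it.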

Best Practices

  1. Set conservative limits - Start lower and increase based on actual usage
  2. Use fallbacks - Don't rely on a single provider
  3. Monitor proactively - Set up alerts before you hit limits
  4. Implement client-side backoff - Even with GateFlow's automatic handling, backoff in your client adds resilience
  5. Use request queuing - Enable queuing for batch operations
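Several of these practices come down to constructing the right `extra_body` for each request. A small builder sketch using only the GateFlow parameters documented above (the `gateflow_options` helper itself is illustrative):

```python
def gateflow_options(strategy="queue", max_wait_ms=30000, fallbacks=None):
    """Build the extra_body dict for GateFlow rate-limit handling,
    from the strategy, max_wait_ms, and fallbacks parameters shown above."""
    body = {"gateflow": {"rate_limit_handling": {"strategy": strategy}}}
    if strategy == "queue":
        body["gateflow"]["rate_limit_handling"]["max_wait_ms"] = max_wait_ms
    if fallbacks:
        body["gateflow"]["fallbacks"] = list(fallbacks)
    return body

# Queue batch traffic with a one-minute ceiling:
opts = gateflow_options(strategy="queue", max_wait_ms=60000)
print(opts["gateflow"]["rate_limit_handling"]["strategy"])  # queue
```

The result can be passed directly as `extra_body=opts` in `client.chat.completions.create(...)`.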
