Rate Limit Errors

Errors returned when request or token rate limits are exceeded.

Error Codes

Code                      | HTTP Status | Description
rate_limit_exceeded       | 429         | Request rate limit exceeded
token_limit_exceeded      | 429         | Token rate limit exceeded
concurrent_limit_exceeded | 429         | Too many concurrent requests
daily_limit_exceeded      | 429         | Daily request limit exceeded
monthly_limit_exceeded    | 429         | Monthly request limit exceeded

Error Format

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 5 seconds.",
    "retry_after": 5,
    "limit": 1000,
    "current": 1005,
    "reset_at": "2026-02-16T10:05:00Z"
  }
}

Response Headers

Rate limit errors include these headers:

Header                         | Description
Retry-After                    | Seconds to wait before retrying
X-RateLimit-Limit-Requests     | Request limit per minute
X-RateLimit-Remaining-Requests | Requests remaining
X-RateLimit-Reset-Requests     | Seconds until limit resets
X-RateLimit-Limit-Tokens       | Token limit per minute
X-RateLimit-Remaining-Tokens   | Tokens remaining
X-RateLimit-Reset-Tokens       | Seconds until token limit resets
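These headers are plain strings on the HTTP response; a small helper can normalize them into numbers for a backoff loop. A sketch (the function name and output shape are illustrative assumptions, not part of any SDK):

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract rate-limit state from response headers, case-insensitively."""
    h = {k.lower(): v for k, v in headers.items()}

    def num(name, cast=int):
        value = h.get(name.lower())
        return cast(value) if value is not None else None

    return {
        "retry_after": num("Retry-After"),
        "requests": {
            "limit": num("X-RateLimit-Limit-Requests"),
            "remaining": num("X-RateLimit-Remaining-Requests"),
            "reset_s": num("X-RateLimit-Reset-Requests", float),
        },
        "tokens": {
            "limit": num("X-RateLimit-Limit-Tokens"),
            "remaining": num("X-RateLimit-Remaining-Tokens"),
            "reset_s": num("X-RateLimit-Reset-Tokens", float),
        },
    }
```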

Error Details

rate_limit_exceeded

Requests per minute (RPM) limit exceeded.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "You have exceeded your requests per minute limit.",
    "retry_after": 12,
    "limit": 1000,
    "current": 1000,
    "reset_at": "2026-02-16T10:05:00Z",
    "limit_type": "rpm"
  }
}

Resolution:

  1. Wait for the retry_after period
  2. Implement exponential backoff
  3. Consider upgrading your plan for higher limits

token_limit_exceeded

Tokens per minute (TPM) limit exceeded.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "token_limit_exceeded",
    "message": "You have exceeded your tokens per minute limit.",
    "retry_after": 45,
    "limit": 100000,
    "current": 100500,
    "reset_at": "2026-02-16T10:05:00Z",
    "limit_type": "tpm"
  }
}

Resolution:

  1. Reduce prompt/response sizes
  2. Batch smaller requests
  3. Wait for the reset period
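Steps 1 and 2 can be enforced proactively with a client-side sliding-window budget that refuses a request before it would breach the TPM limit. A minimal sketch (class name and the injectable clock are illustrative; the server's accounting remains authoritative):

```python
import time
from collections import deque

class TokenBudget:
    """Approximate client-side tokens-per-minute tracker (sliding window)."""

    def __init__(self, tpm_limit: int, window_s: float = 60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs inside the window

    def _prune(self, now: float):
        # Drop usage records that have aged out of the window
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()

    def try_consume(self, tokens: int) -> bool:
        """Record usage if it fits in the current window; otherwise refuse."""
        now = self.clock()
        self._prune(now)
        used = sum(t for _, t in self.events)
        if used + tokens > self.tpm_limit:
            return False
        self.events.append((now, tokens))
        return True
```

A caller that gets `False` back can hold the request (or shrink the prompt) instead of burning a 429.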

concurrent_limit_exceeded

Too many simultaneous requests.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "concurrent_limit_exceeded",
    "message": "Too many concurrent requests. Maximum is 50.",
    "limit": 50,
    "current": 52
  }
}

Resolution:

  1. Implement a request queue
  2. Wait for in-flight requests to complete
  3. Increase concurrent limit in your plan
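A request queue (step 1) can be as simple as a semaphore that caps in-flight calls at your plan's limit. A sketch using `asyncio` (the class name is illustrative):

```python
import asyncio

class ConcurrencyLimiter:
    """Cap the number of in-flight requests with a semaphore."""

    def __init__(self, max_concurrent: int):
        self.semaphore = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_factory):
        """Wait for a free slot, then run the coroutine the factory creates."""
        async with self.semaphore:
            return await coro_factory()
```

Excess calls simply queue on the semaphore, so bursts never exceed the cap.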

daily_limit_exceeded

Daily request quota exhausted.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "daily_limit_exceeded",
    "message": "Daily request limit exceeded.",
    "limit": 10000,
    "current": 10000,
    "reset_at": "2026-02-17T00:00:00Z"
  }
}

monthly_limit_exceeded

Monthly quota exhausted.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "monthly_limit_exceeded",
    "message": "Monthly request limit exceeded.",
    "limit": 100000,
    "current": 100000,
    "reset_at": "2026-03-01T00:00:00Z"
  }
}

Handling Rate Limits

Python with Retry

python
import openai
import random
import time
from tenacity import retry, stop_after_attempt, wait_exponential

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60)
)
def make_request(messages):
    try:
        return client.chat.completions.create(
            model="gpt-5.2",
            messages=messages
        )
    except openai.RateLimitError as e:
        retry_after = e.response.headers.get("Retry-After")
        print(f"Rate limited (Retry-After: {retry_after}s), retrying...")
        raise  # Re-raise so tenacity retries; its wait policy handles the backoff

Python with Manual Backoff

python
import random

def make_request_with_backoff(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except openai.RateLimitError as e:
            if attempt == max_retries - 1:
                raise

            retry_after = int(e.response.headers.get("Retry-After", 5))
            # Add jitter
            wait_time = retry_after + random.uniform(0, 2)
            print(f"Rate limited, waiting {wait_time:.1f}s...")
            time.sleep(wait_time)

JavaScript/TypeScript

typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

async function makeRequestWithRetry(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 5
): Promise<OpenAI.ChatCompletion> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'gpt-5.2',
        messages,
      });
    } catch (error) {
      if (error instanceof OpenAI.RateLimitError) {
        if (attempt === maxRetries - 1) throw error;

        const retryAfter = parseInt(
          error.headers?.get('retry-after') ?? '5',
          10
        );
        console.log(`Rate limited, waiting ${retryAfter}s...`);
        await new Promise(r => setTimeout(r, retryAfter * 1000));
      } else {
        throw error;
      }
    }
  }
  throw new Error('Max retries exceeded');
}

Using GateFlow Queue

Let GateFlow handle rate limits automatically:

python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",
                "max_wait_ms": 30000
            }
        }
    }
)

Using Fallbacks

Switch to alternative models when rate limited:

python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "fallback"
            },
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)

Monitoring Rate Limits

Check Current Usage

bash
curl https://api.gateflow.ai/v1/management/rate-limits \
  -H "Authorization: Bearer gw_prod_..."

Response

json
{
  "api_key": {
    "rpm": {"limit": 1000, "remaining": 850, "reset_at": "2026-02-16T10:05:00Z"},
    "tpm": {"limit": 100000, "remaining": 75000, "reset_at": "2026-02-16T10:05:00Z"},
    "daily": {"limit": 10000, "remaining": 8500, "reset_at": "2026-02-17T00:00:00Z"}
  },
  "providers": {
    "openai": {
      "rpm": {"limit": 5000, "remaining": 4200},
      "tpm": {"limit": 600000, "remaining": 450000}
    }
  }
}
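Step 3 of the best practices below, alerting before you hit a limit, can run against this response. A sketch that flags any per-key window with less than a threshold fraction of capacity remaining (the function name and 20% default are illustrative assumptions):

```python
def usage_alerts(limits: dict, threshold: float = 0.2) -> list[str]:
    """List api_key windows whose remaining capacity is below the threshold fraction."""
    alerts = []
    for window, state in limits.get("api_key", {}).items():
        limit, remaining = state["limit"], state["remaining"]
        if limit and remaining / limit < threshold:
            alerts.append(f"{window}: {remaining}/{limit} remaining")
    return alerts
```

Feed the alerts into whatever paging or logging channel you already use.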

Best Practices

  1. Implement backoff - Always use exponential backoff with jitter
  2. Respect Retry-After - Honor the header value
  3. Monitor proactively - Set up alerts before hitting limits
  4. Use request queuing - Enable GateFlow's queue feature
  5. Plan capacity - Right-size your limits for expected traffic
