Rate Limit Errors

Errors returned when request or token rate limits are exceeded.

Error Codes

Code                      | HTTP Status | Description
rate_limit_exceeded       | 429         | Request rate limit exceeded
token_limit_exceeded      | 429         | Token rate limit exceeded
concurrent_limit_exceeded | 429         | Too many concurrent requests
daily_limit_exceeded      | 429         | Daily request limit exceeded
monthly_limit_exceeded    | 429         | Monthly request limit exceeded

Error Format

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry after 5 seconds.",
    "retry_after": 5,
    "limit": 1000,
    "current": 1005,
    "reset_at": "2026-02-16T10:05:00Z"
  }
}

Response Headers

Rate limit errors include these headers:

Header                         | Description
Retry-After                    | Seconds to wait before retrying
X-RateLimit-Limit-Requests     | Request limit per minute
X-RateLimit-Remaining-Requests | Requests remaining
X-RateLimit-Reset-Requests     | Seconds until limit resets
X-RateLimit-Limit-Tokens       | Token limit per minute
X-RateLimit-Remaining-Tokens   | Tokens remaining
X-RateLimit-Reset-Tokens       | Seconds until token limit resets
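These headers are plain strings on the HTTP response; a small helper can normalize them into numbers for a backoff loop. A sketch (the function name and output shape are illustrative assumptions, not part of any SDK):

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract rate-limit state from response headers, case-insensitively."""
    h = {k.lower(): v for k, v in headers.items()}

    def num(name, cast=int):
        value = h.get(name.lower())
        return cast(value) if value is not None else None

    return {
        "retry_after": num("Retry-After"),
        "requests": {
            "limit": num("X-RateLimit-Limit-Requests"),
            "remaining": num("X-RateLimit-Remaining-Requests"),
            "reset_s": num("X-RateLimit-Reset-Requests", float),
        },
        "tokens": {
            "limit": num("X-RateLimit-Limit-Tokens"),
            "remaining": num("X-RateLimit-Remaining-Tokens"),
            "reset_s": num("X-RateLimit-Reset-Tokens", float),
        },
    }
```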

Error Details

rate_limit_exceeded

Requests per minute (RPM) limit exceeded.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "message": "You have exceeded your requests per minute limit.",
    "retry_after": 12,
    "limit": 1000,
    "current": 1000,
    "reset_at": "2026-02-16T10:05:00Z",
    "limit_type": "rpm"
  }
}

Resolution:

  1. Wait for the retry_after period
  2. Implement exponential backoff
  3. Consider upgrading your plan for higher limits

token_limit_exceeded

Tokens per minute (TPM) limit exceeded.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "token_limit_exceeded",
    "message": "You have exceeded your tokens per minute limit.",
    "retry_after": 45,
    "limit": 100000,
    "current": 100500,
    "reset_at": "2026-02-16T10:05:00Z",
    "limit_type": "tpm"
  }
}

Resolution:

  1. Reduce prompt/response sizes
  2. Batch smaller requests
  3. Wait for the reset period
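Steps 1 and 2 can be enforced proactively with a client-side sliding-window budget that refuses a request before it would breach the TPM limit. A minimal sketch (class name and the injectable clock are illustrative; the server's accounting remains authoritative):

```python
import time
from collections import deque

class TokenBudget:
    """Approximate client-side tokens-per-minute tracker (sliding window)."""

    def __init__(self, tpm_limit: int, window_s: float = 60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs inside the window

    def _prune(self, now: float):
        # Drop usage records that have aged out of the window
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()

    def try_consume(self, tokens: int) -> bool:
        """Record usage if it fits in the current window; otherwise refuse."""
        now = self.clock()
        self._prune(now)
        used = sum(t for _, t in self.events)
        if used + tokens > self.tpm_limit:
            return False
        self.events.append((now, tokens))
        return True
```

A caller that gets `False` back can hold the request (or shrink the prompt) instead of burning a 429.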

concurrent_limit_exceeded

Too many simultaneous requests.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "concurrent_limit_exceeded",
    "message": "Too many concurrent requests. Maximum is 50.",
    "limit": 50,
    "current": 52
  }
}

Resolution:

  1. Implement a request queue
  2. Wait for in-flight requests to complete
  3. Increase concurrent limit in your plan
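A request queue (step 1) can be as simple as a semaphore that caps in-flight calls at your plan's limit. A sketch using `asyncio` (the class name is illustrative):

```python
import asyncio

class ConcurrencyLimiter:
    """Cap the number of in-flight requests with a semaphore."""

    def __init__(self, max_concurrent: int):
        self.semaphore = asyncio.Semaphore(max_concurrent)

    async def run(self, coro_factory):
        """Wait for a free slot, then run the coroutine the factory creates."""
        async with self.semaphore:
            return await coro_factory()
```

Excess calls simply queue on the semaphore, so bursts never exceed the cap.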

daily_limit_exceeded

Daily request quota exhausted.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "daily_limit_exceeded",
    "message": "Daily request limit exceeded.",
    "limit": 10000,
    "current": 10000,
    "reset_at": "2026-02-17T00:00:00Z"
  }
}

monthly_limit_exceeded

Monthly quota exhausted.

json
{
  "error": {
    "type": "rate_limit_error",
    "code": "monthly_limit_exceeded",
    "message": "Monthly request limit exceeded.",
    "limit": 100000,
    "current": 100000,
    "reset_at": "2026-03-01T00:00:00Z"
  }
}

Handling Rate Limits

Python with Retry

python
import openai
import random
import time
from tenacity import retry, stop_after_attempt, wait_exponential

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60)
)
def make_request(messages):
    try:
        return client.chat.completions.create(
            model="gpt-5.2",
            messages=messages
        )
    except openai.RateLimitError as e:
        retry_after = e.response.headers.get("Retry-After")
        print(f"Rate limited (Retry-After: {retry_after}s), retrying...")
        raise  # Re-raise so tenacity retries; its wait policy handles the backoff

Python with Manual Backoff

python
import random

def make_request_with_backoff(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.2",
                messages=messages
            )
        except openai.RateLimitError as e:
            if attempt == max_retries - 1:
                raise

            retry_after = int(e.response.headers.get("Retry-After", 5))
            # Add jitter
            wait_time = retry_after + random.uniform(0, 2)
            print(f"Rate limited, waiting {wait_time:.1f}s...")
            time.sleep(wait_time)

JavaScript/TypeScript

typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

async function makeRequestWithRetry(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 5
): Promise<OpenAI.ChatCompletion> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'gpt-5.2',
        messages,
      });
    } catch (error) {
      if (error instanceof OpenAI.RateLimitError) {
        if (attempt === maxRetries - 1) throw error;

        const retryAfter = parseInt(
          error.headers?.get('retry-after') ?? '5',
          10
        );
        console.log(`Rate limited, waiting ${retryAfter}s...`);
        await new Promise(r => setTimeout(r, retryAfter * 1000));
      } else {
        throw error;
      }
    }
  }
  throw new Error('Max retries exceeded');
}

Using GateFlow Queue

Let GateFlow handle rate limits automatically:

python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "queue",
                "max_wait_ms": 30000
            }
        }
    }
)

Using Fallbacks

Switch to alternative models when rate limited:

python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "rate_limit_handling": {
                "strategy": "fallback"
            },
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)

Monitoring Rate Limits

Check Current Usage

bash
curl https://api.gateflow.ai/v1/management/rate-limits \
  -H "Authorization: Bearer gw_prod_..."

Response

json
{
  "api_key": {
    "rpm": {"limit": 1000, "remaining": 850, "reset_at": "2026-02-16T10:05:00Z"},
    "tpm": {"limit": 100000, "remaining": 75000, "reset_at": "2026-02-16T10:05:00Z"},
    "daily": {"limit": 10000, "remaining": 8500, "reset_at": "2026-02-17T00:00:00Z"}
  },
  "providers": {
    "openai": {
      "rpm": {"limit": 5000, "remaining": 4200},
      "tpm": {"limit": 600000, "remaining": 450000}
    }
  }
}
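Step 3 of the best practices below, alerting before you hit a limit, can run against this response. A sketch that flags any per-key window with less than a threshold fraction of capacity remaining (the function name and 20% default are illustrative assumptions):

```python
def usage_alerts(limits: dict, threshold: float = 0.2) -> list[str]:
    """List api_key windows whose remaining capacity is below the threshold fraction."""
    alerts = []
    for window, state in limits.get("api_key", {}).items():
        limit, remaining = state["limit"], state["remaining"]
        if limit and remaining / limit < threshold:
            alerts.append(f"{window}: {remaining}/{limit} remaining")
    return alerts
```

Feed the alerts into whatever paging or logging channel you already use.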

Best Practices

  1. Implement backoff - Always use exponential backoff with jitter
  2. Respect Retry-After - Honor the header value
  3. Monitor proactively - Set up alerts before hitting limits
  4. Use request queuing - Enable GateFlow's queue feature
  5. Plan capacity - Right-size your limits for expected traffic
