
Request Queuing

Manage request queues for handling traffic spikes and rate limits.

Overview

Request queuing buffers requests when providers are rate limited or under heavy load, then releases them as capacity becomes available instead of failing immediately.
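The queuing model can be pictured as a bounded queue that orders requests by priority and rejects new work when full. A minimal sketch of that behavior (class and constant names are ours, not GateFlow internals):

```python
import heapq
import itertools

# Lower rank = served first; mirrors the gateway's priority levels.
PRIORITY_RANK = {"critical": 0, "high": 1, "normal": 2, "low": 3}

class RequestQueue:
    def __init__(self, max_size):
        self.max_size = max_size
        self._heap = []                # (rank, seq, request)
        self._seq = itertools.count()  # FIFO tie-break within a priority

    def enqueue(self, request, priority="normal"):
        if len(self._heap) >= self.max_size:
            return False               # full: maps to an immediate 503
        heapq.heappush(self._heap, (PRIORITY_RANK[priority], next(self._seq), request))
        return True

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = RequestQueue(max_size=2)
q.enqueue("batch-job", priority="low")
q.enqueue("user-chat", priority="critical")
accepted = q.enqueue("overflow", priority="high")  # queue is already full
```

Note how the critical request jumps ahead of the earlier low-priority one, while the overflow request is rejected rather than queued.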

Queue Configuration

Enable Queuing

bash
curl -X POST https://api.gateflow.ai/v1/management/queue-config \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "max_queue_size": 1000,
    "max_wait_ms": 60000,
    "priority_levels": ["critical", "high", "normal", "low"],
    "default_priority": "normal"
  }'

Per-Request Configuration

python
import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {
                "enabled": True,
                "priority": "high",
                "max_wait_ms": 30000
            }
        }
    }
)

Priority Levels

Priority Queue Order

Priority | Use Case | Max Wait
critical | Real-time user interactions | 5s
high | Interactive features | 15s
normal | Background processing | 60s
low | Batch jobs | 300s
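If requests don't set `max_wait_ms` explicitly, the table's defaults can be applied client-side when building the `gateflow.queue` block. A small helper to that effect (illustrative, not part of any SDK):

```python
# Default max-wait per priority, mirroring the table above (milliseconds).
DEFAULT_MAX_WAIT_MS = {
    "critical": 5_000,
    "high": 15_000,
    "normal": 60_000,
    "low": 300_000,
}

def queue_options(priority="normal", max_wait_ms=None):
    """Build a gateflow.queue block, falling back to the table defaults."""
    if priority not in DEFAULT_MAX_WAIT_MS:
        raise ValueError(f"unknown priority: {priority}")
    return {
        "priority": priority,
        "max_wait_ms": max_wait_ms or DEFAULT_MAX_WAIT_MS[priority],
    }

opts = queue_options("high")
```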

Setting Priority

Per Request:

python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {"priority": "critical"}
        }
    }
)

Per API Key:

bash
curl -X POST https://api.gateflow.ai/v1/management/api-keys \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "realtime-key",
    "default_queue_priority": "high"
  }'

Queue Behavior

When Queue is Full

python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {
                "enabled": True,
                "on_full": "reject"  # or "drop_lowest"
            }
        }
    }
)

Behavior | Description
reject | Return 503 immediately
drop_lowest | Remove the lowest-priority request
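With `on_full: "reject"`, clients see an immediate 503 and typically retry with exponential backoff. A deterministic backoff schedule might look like this (jitter omitted for clarity; names and defaults are illustrative):

```python
def backoff_schedule(attempts, base_ms=250, cap_ms=8_000):
    """Exponential backoff delays for retrying 503 queue rejections."""
    return [min(base_ms * (2 ** i), cap_ms) for i in range(attempts)]

delays = backoff_schedule(6)
# 250, 500, 1000, 2000, 4000, 8000
```

In production you would add random jitter to each delay so that many rejected clients don't retry in lockstep.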

Timeout Handling

python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {
                "max_wait_ms": 30000,
                "on_timeout": "error"  # or "fallback"
            }
        }
    }
)

Queue Status

Check Queue Status

bash
curl https://api.gateflow.ai/v1/management/queue/status \
  -H "Authorization: Bearer gw_prod_..."

Response:

json
{
  "queue_enabled": true,
  "current_size": 45,
  "max_size": 1000,
  "by_priority": {
    "critical": 2,
    "high": 8,
    "normal": 30,
    "low": 5
  },
  "avg_wait_ms": 2500,
  "processing_rate_per_min": 120
}

Queue Position in Response

json
{
  "id": "chatcmpl-abc123",
  "choices": [...],
  "usage": {...},
  "gateflow": {
    "queue": {
      "was_queued": true,
      "queue_wait_ms": 1500,
      "initial_position": 12
    }
  }
}
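The `gateflow` block is a gateway extension, so typed SDK response objects won't surface it directly; reading it from the raw response body is straightforward. A sketch using the payload shape above:

```python
def queue_stats(response: dict):
    """Extract queue metadata from a GateFlow response body, if present."""
    queue = response.get("gateflow", {}).get("queue", {})
    if not queue.get("was_queued"):
        return None
    return {
        "wait_ms": queue.get("queue_wait_ms"),
        "initial_position": queue.get("initial_position"),
    }

body = {
    "id": "chatcmpl-abc123",
    "gateflow": {
        "queue": {"was_queued": True, "queue_wait_ms": 1500, "initial_position": 12}
    },
}
stats = queue_stats(body)
```

This is useful for logging queue wait times alongside model latency when diagnosing slow responses.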

Fair Queuing

Prevent single consumers from monopolizing the queue:

bash
curl -X POST https://api.gateflow.ai/v1/management/queue-config \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "fair_queuing": {
      "enabled": true,
      "max_per_key": 50,
      "max_per_ip": 100
    }
  }'

Batch Queue

For batch processing with lower priority:

python
# Queue multiple requests
responses = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": prompt}],
        extra_body={
            "gateflow": {
                "queue": {
                    "priority": "low",
                    "max_wait_ms": 300000,  # 5 minutes
                    "batch_id": "batch_abc123"
                }
            }
        }
    )
    responses.append(response)

Batch Status

bash
curl https://api.gateflow.ai/v1/management/queue/batch/batch_abc123 \
  -H "Authorization: Bearer gw_prod_..."

Response:

json
{
  "batch_id": "batch_abc123",
  "total_requests": 100,
  "completed": 45,
  "queued": 50,
  "failed": 5,
  "avg_wait_ms": 8500
}
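Batch completion can be tracked by polling the status endpoint until no requests remain queued. A sketch of that loop, with the HTTP fetch injected as a callable so the logic runs offline (status shape taken from the response above):

```python
def wait_for_batch(fetch_status, poll=lambda: None):
    """Poll until every request in the batch has completed or failed.

    fetch_status: callable returning a dict shaped like the batch-status response.
    poll: hook invoked between polls (e.g. lambda: time.sleep(5)).
    """
    while True:
        status = fetch_status()
        if status["queued"] == 0:
            return status
        poll()

# Simulated status snapshots standing in for real HTTP calls.
snapshots = iter([
    {"batch_id": "batch_abc123", "completed": 45, "queued": 50, "failed": 5},
    {"batch_id": "batch_abc123", "completed": 95, "queued": 0, "failed": 5},
])
final = wait_for_batch(lambda: next(snapshots))
```

In real use, `fetch_status` would GET the batch endpoint shown above and `poll` would sleep between requests.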

Queue Webhooks

Get notified of queue events:

bash
curl -X POST https://api.gateflow.ai/v1/management/webhooks \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/queue-webhook",
    "events": ["queue.high_usage", "queue.timeout", "queue.full"]
  }'

Webhook Payload:

json
{
  "event": "queue.high_usage",
  "timestamp": "2026-02-16T10:30:00Z",
  "data": {
    "current_size": 850,
    "max_size": 1000,
    "utilization": 0.85
  }
}
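A webhook receiver only needs to branch on `event` and, for `queue.high_usage`, compare utilization against a threshold. A framework-free sketch of that handler logic (the threshold value is our choice, not a gateway default):

```python
def handle_queue_event(payload: dict, alert_threshold=0.9):
    """Map a queue webhook payload to an action: 'alert', 'warn', or 'ok'."""
    event = payload.get("event")
    if event == "queue.full":
        return "alert"
    if event == "queue.high_usage":
        utilization = payload["data"]["utilization"]
        return "alert" if utilization >= alert_threshold else "warn"
    return "ok"

action = handle_queue_event({
    "event": "queue.high_usage",
    "data": {"current_size": 850, "max_size": 1000, "utilization": 0.85},
})
```

The same branching would sit behind whatever web framework serves `https://your-app.com/queue-webhook`.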

Monitoring

Queue Metrics

bash
curl https://api.gateflow.ai/v1/management/analytics/queue \
  -H "Authorization: Bearer gw_prod_..." \
  -G -d "period=1h"

Response:

json
{
  "period": "1h",
  "requests_queued": 1250,
  "requests_processed": 1200,
  "requests_timed_out": 30,
  "requests_rejected": 20,
  "avg_wait_ms": 3500,
  "p95_wait_ms": 12000,
  "p99_wait_ms": 25000,
  "peak_queue_size": 450
}
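The metrics above reduce to a couple of ratios worth alerting on, such as the share of queued requests that timed out or were rejected. A small helper for that derivation (field names from the response above):

```python
def queue_health(metrics: dict):
    """Derive failure ratios from the queue analytics response."""
    queued = metrics["requests_queued"]
    return {
        "timeout_rate": metrics["requests_timed_out"] / queued,
        "reject_rate": metrics["requests_rejected"] / queued,
    }

health = queue_health({
    "requests_queued": 1250,
    "requests_timed_out": 30,
    "requests_rejected": 20,
})
```

For the sample period above this gives a 2.4% timeout rate and a 1.6% reject rate, which you can trend against whatever thresholds fit your traffic.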

Best Practices

  1. Set appropriate timeouts - Match max_wait_ms to your use case
  2. Use priority wisely - Reserve "critical" for truly real-time needs
  3. Monitor queue depth - Set alerts before queue fills
  4. Enable fair queuing - Prevent monopolization in multi-tenant setups
  5. Combine with fallbacks - Queue + fallbacks provides best reliability
