# Request Queuing
Manage request queues for handling traffic spikes and rate limits.
## Overview
Request queuing buffers requests when providers are rate limited or under heavy load. Rather than failing immediately, a queued request waits (up to its `max_wait_ms`) and is dispatched in priority order as provider capacity frees up.
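Conceptually, the gateway's queue behaves like a bounded priority queue with per-request wait deadlines. The sketch below is an illustrative model of that behavior, not GateFlow's actual implementation:

```python
import heapq
import itertools
import time

# Priority rank: lower number is served first.
PRIORITY = {"critical": 0, "high": 1, "normal": 2, "low": 3}

class BoundedPriorityQueue:
    """Illustrative model: bounded size, priority ordering, wait deadlines."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within a priority level

    def enqueue(self, request, priority="normal", max_wait_ms=60_000):
        if len(self._heap) >= self.max_size:
            # Corresponds to an HTTP 503 when on_full="reject"
            raise RuntimeError("queue full")
        deadline = time.monotonic() + max_wait_ms / 1000
        heapq.heappush(
            self._heap, (PRIORITY[priority], next(self._seq), deadline, request)
        )

    def dequeue(self):
        """Pop the highest-priority request that has not exceeded its deadline."""
        while self._heap:
            _, _, deadline, request = heapq.heappop(self._heap)
            if time.monotonic() <= deadline:
                return request
            # Expired entries are dropped; the caller would see a timeout error
        return None
```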
## Queue Configuration

### Enable Queuing
```bash
curl -X POST https://api.gateflow.ai/v1/management/queue-config \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "max_queue_size": 1000,
    "max_wait_ms": 60000,
    "priority_levels": ["critical", "high", "normal", "low"],
    "default_priority": "normal"
  }'
```

### Per-Request Configuration
```python
import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {
                "enabled": True,
                "priority": "high",
                "max_wait_ms": 30000
            }
        }
    }
)
```

## Priority Levels
### Priority Queue Order

| Priority | Use Case | Max Wait |
|---|---|---|
| `critical` | Real-time user interactions | 5s |
| `high` | Interactive features | 15s |
| `normal` | Background processing | 60s |
| `low` | Batch jobs | 300s |
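When the priority tier isn't fixed ahead of time, it can be derived from how long the caller is willing to wait. A small helper, assuming the per-tier wait budgets in the table above (the helper itself is illustrative, not part of the API):

```python
# Per-tier wait budgets from the priority table, in milliseconds.
TIER_BUDGET_MS = {"critical": 5_000, "high": 15_000, "normal": 60_000, "low": 300_000}

def priority_for_deadline(max_wait_ms: int) -> str:
    """Pick the cheapest tier whose wait budget still fits the caller's deadline."""
    for tier in ("low", "normal", "high", "critical"):
        if TIER_BUDGET_MS[tier] <= max_wait_ms:
            return tier
    return "critical"  # deadline tighter than 5s: use the fastest tier
```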
### Setting Priority

**Per Request:**
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {"priority": "critical"}
        }
    }
)
```

**Per API Key:**
```bash
curl -X POST https://api.gateflow.ai/v1/management/api-keys \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "realtime-key",
    "default_queue_priority": "high"
  }'
```

## Queue Behavior
### When Queue is Full
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {
                "enabled": True,
                "on_full": "reject"  # or "drop_lowest"
            }
        }
    }
)
```

| Behavior | Description |
|---|---|
| `reject` | Return 503 immediately |
| `drop_lowest` | Remove the lowest-priority request to make room |
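With `on_full: "reject"`, a full queue surfaces to the client as an HTTP 503, so callers should be prepared to back off and retry. A minimal sketch with exponential backoff and jitter; `QueueFullError` and the `send` callable are placeholders for however your client detects the 503:

```python
import random
import time

class QueueFullError(Exception):
    """Placeholder for a 503 'queue full' rejection from the gateway."""

def call_with_backoff(send, max_attempts=4, base_delay=0.5):
    """Invoke `send()`, retrying with exponential backoff + jitter on QueueFullError."""
    for attempt in range(max_attempts):
        try:
            return send()
        except QueueFullError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)
```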
### Timeout Handling
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "queue": {
                "max_wait_ms": 30000,
                "on_timeout": "error"  # or "fallback"
            }
        }
    }
)
```

## Queue Status
### Check Queue Status
```bash
curl https://api.gateflow.ai/v1/management/queue/status \
  -H "Authorization: Bearer gw_prod_..."
```

**Response:**
```json
{
  "queue_enabled": true,
  "current_size": 45,
  "max_size": 1000,
  "by_priority": {
    "critical": 2,
    "high": 8,
    "normal": 30,
    "low": 5
  },
  "avg_wait_ms": 2500,
  "processing_rate_per_min": 120
}
```

### Queue Position in Response
```json
{
  "id": "chatcmpl-abc123",
  "choices": [...],
  "usage": {...},
  "gateflow": {
    "queue": {
      "was_queued": true,
      "queue_wait_ms": 1500,
      "initial_position": 12
    }
  }
}
```

## Fair Queuing
Prevent single consumers from monopolizing the queue:
```bash
curl -X POST https://api.gateflow.ai/v1/management/queue-config \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "fair_queuing": {
      "enabled": true,
      "max_per_key": 50,
      "max_per_ip": 100
    }
  }'
```

## Batch Queue
For batch processing with lower priority:
```python
# Queue multiple requests at low priority under a shared batch ID
responses = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": prompt}],
        extra_body={
            "gateflow": {
                "queue": {
                    "priority": "low",
                    "max_wait_ms": 300000,  # 5 minutes
                    "batch_id": "batch_abc123"
                }
            }
        }
    )
    responses.append(response)
```

### Batch Status
```bash
curl https://api.gateflow.ai/v1/management/queue/batch/batch_abc123 \
  -H "Authorization: Bearer gw_prod_..."
```

**Response:**
```json
{
  "batch_id": "batch_abc123",
  "total_requests": 100,
  "completed": 45,
  "queued": 50,
  "failed": 5,
  "avg_wait_ms": 8500
}
```

## Queue Webhooks
Get notified of queue events:
```bash
curl -X POST https://api.gateflow.ai/v1/management/webhooks \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/queue-webhook",
    "events": ["queue.high_usage", "queue.timeout", "queue.full"]
  }'
```

**Webhook Payload:**
```json
{
  "event": "queue.high_usage",
  "timestamp": "2026-02-16T10:30:00Z",
  "data": {
    "current_size": 850,
    "max_size": 1000,
    "utilization": 0.85
  }
}
```

## Monitoring
### Queue Metrics
```bash
curl https://api.gateflow.ai/v1/management/analytics/queue \
  -H "Authorization: Bearer gw_prod_..." \
  -G -d "period=1h"
```

**Response:**
```json
{
  "period": "1h",
  "requests_queued": 1250,
  "requests_processed": 1200,
  "requests_timed_out": 30,
  "requests_rejected": 20,
  "avg_wait_ms": 3500,
  "p95_wait_ms": 12000,
  "p99_wait_ms": 25000,
  "peak_queue_size": 450
}
```

## Best Practices
- **Set appropriate timeouts** - match `max_wait_ms` to your use case
- **Use priority wisely** - reserve `critical` for truly real-time needs
- **Monitor queue depth** - set alerts before the queue fills
- **Enable fair queuing** - prevent monopolization in multi-tenant setups
- **Combine with fallbacks** - queuing plus fallbacks gives the best reliability
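Several of these practices can be combined in a client-side guard that checks the queue status endpoint before submitting background work. A sketch operating on the status payload shown earlier; the 0.8 utilization threshold is an arbitrary example, not a gateway default:

```python
def should_submit_background_work(status: dict, threshold: float = 0.8) -> bool:
    """Shed low-priority work when the queue is close to full.

    `status` is the payload from GET /v1/management/queue/status.
    """
    utilization = status["current_size"] / status["max_size"]
    return status["queue_enabled"] and utilization < threshold
```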
## Next Steps

- **Rate Limits** - understanding rate limits
- **Retry Logic** - retry configuration
- **Multi-Tenant Setup** - tenant isolation