# Retry Logic
Configurable retry strategies for handling transient failures from AI providers.
## Overview
GateFlow automatically retries failed requests using configurable backoff strategies.
## Default Retry Configuration
```json
{
  "retry": {
    "max_attempts": 3,
    "initial_delay_ms": 1000,
    "max_delay_ms": 30000,
    "backoff_multiplier": 2.0,
    "retryable_status_codes": [429, 500, 502, 503, 504]
  }
}
```

## Retry Strategies
### Exponential Backoff (Default)
Doubles delay between each retry:
| Attempt | Delay |
|---|---|
| 1 | 1s |
| 2 | 2s |
| 3 | 4s |
| 4 | 8s |
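The schedule above follows `initial_delay_ms * backoff_multiplier ** (attempt - 1)`, capped at `max_delay_ms`. A minimal sketch of the same calculation (the formula is inferred from the defaults and the table above, not from a published spec):

```python
def backoff_delay_ms(attempt: int,
                     initial_delay_ms: int = 1000,
                     max_delay_ms: int = 30000,
                     backoff_multiplier: float = 2.0) -> float:
    """Delay before retry attempt `attempt` (1-based), capped at max_delay_ms."""
    return min(initial_delay_ms * backoff_multiplier ** (attempt - 1), max_delay_ms)

# With the defaults, attempts 1-4 wait 1s, 2s, 4s, 8s:
print([backoff_delay_ms(n) for n in range(1, 5)])  # [1000.0, 2000.0, 4000.0, 8000.0]
```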
```python
import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "retry": {
                "strategy": "exponential",
                "max_attempts": 5,
                "initial_delay_ms": 500
            }
        }
    }
)
```

### Linear Backoff
Fixed delay between retries:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "retry": {
                "strategy": "linear",
                "max_attempts": 3,
                "delay_ms": 2000
            }
        }
    }
)
```

### Immediate Retry
No delay between attempts; useful for transient load balancer errors:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "retry": {
                "strategy": "immediate",
                "max_attempts": 2
            }
        }
    }
)
```

## Retryable Errors
### Automatically Retried
| Status Code | Error Type | Description |
|---|---|---|
| 429 | Rate Limit | Too many requests |
| 500 | Server Error | Internal provider error |
| 502 | Bad Gateway | Upstream connection error |
| 503 | Service Unavailable | Provider temporarily down |
| 504 | Gateway Timeout | Request timed out |
### Never Retried
| Status Code | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Invalid request format |
| 401 | Unauthorized | Invalid API key |
| 403 | Forbidden | Insufficient permissions |
| 404 | Not Found | Model not found |
| 422 | Validation Error | Invalid parameters |
## Custom Retry Conditions
Configure which errors trigger retries:
```bash
curl -X POST https://api.gateflow.ai/v1/management/retry-policies \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "aggressive-retry",
    "max_attempts": 5,
    "initial_delay_ms": 500,
    "backoff_multiplier": 1.5,
    "retryable_status_codes": [429, 500, 502, 503, 504],
    "retryable_error_types": ["timeout", "connection_error"],
    "non_retryable_error_types": ["context_length_exceeded"]
  }'
```

## Jitter
Add randomization to prevent thundering herd:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "retry": {
                "strategy": "exponential",
                "max_attempts": 3,
                "jitter": True,  # Adds ±25% randomization
                "jitter_factor": 0.25
            }
        }
    }
)
```

## Retry Headers
GateFlow returns retry information in response headers:
| Header | Description |
|---|---|
| X-GateFlow-Retry-Count | Number of retries attempted |
| X-GateFlow-Total-Latency-Ms | Total time including retries |
| X-GateFlow-Provider-Attempts | Providers tried (comma-separated) |
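If you need these values programmatically, a sketch of turning the headers into a structured record (the helper name and sample values are illustrative; with the OpenAI Python SDK the raw headers are accessible via the with_raw_response wrapper):

```python
def retry_info(headers) -> dict:
    """Extract GateFlow retry metadata from a mapping of response headers."""
    return {
        "retries": int(headers.get("X-GateFlow-Retry-Count", 0)),
        "total_latency_ms": int(headers.get("X-GateFlow-Total-Latency-Ms", 0)),
        "providers": [p for p in
                      headers.get("X-GateFlow-Provider-Attempts", "").split(",") if p],
    }

# Example headers as they might appear on a retried request:
info = retry_info({
    "X-GateFlow-Retry-Count": "2",
    "X-GateFlow-Total-Latency-Ms": "4350",
    "X-GateFlow-Provider-Attempts": "openai,anthropic",
})
print(info)  # {'retries': 2, 'total_latency_ms': 4350, 'providers': ['openai', 'anthropic']}
```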
## Combining with Fallbacks
Retries work with model fallbacks:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"],
            "retry": {
                "max_attempts": 2,  # Per provider
                "initial_delay_ms": 500
            }
        }
    }
)
```

## Disabling Retries
For latency-sensitive applications:
```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "gateflow": {
            "retry": {
                "enabled": False
            }
        }
    }
)
```

## Monitoring Retries
View retry metrics in the dashboard or via API:
```bash
curl https://api.gateflow.ai/v1/management/analytics/retries \
  -H "Authorization: Bearer gw_prod_..." \
  -G -d "start_date=2026-02-01" -d "end_date=2026-02-16"
```

Response:
```json
{
  "period": "2026-02-01 to 2026-02-16",
  "total_requests": 50000,
  "requests_with_retries": 1250,
  "retry_rate": 0.025,
  "avg_retries_per_failed": 1.8,
  "by_provider": {
    "openai": {"retry_rate": 0.02},
    "anthropic": {"retry_rate": 0.015},
    "google": {"retry_rate": 0.03}
  },
  "by_error_type": {
    "rate_limit": 800,
    "timeout": 300,
    "server_error": 150
  }
}
```

## Next Steps
- Rate Limits - Understanding rate limiting
- Model Fallbacks - Configure fallback chains
- Request Queuing - Queue management