
Model Fallbacks

Fallbacks ensure your application stays running when a provider has issues. When the primary model fails, GateFlow automatically tries alternatives.

How Fallbacks Work

When a request to the primary model fails with a retryable error, GateFlow replays the same request against each fallback model in priority order until one succeeds or the list is exhausted. The response metadata records which model actually served the request and why any earlier attempts failed.
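The routing loop can be pictured as follows. This is a simplified sketch, not GateFlow's actual implementation; `call_model` is a hypothetical stand-in for a provider call.

```python
class AllModelsFailed(Exception):
    """Raised when the primary and every fallback fail."""


def route_with_fallbacks(request, primary, fallbacks, call_model):
    """Try the primary model, then each fallback in priority order.

    Returns the first successful response plus a record of failed
    attempts, mirroring the `attempts` array in GateFlow's response
    metadata.
    """
    attempts = []
    for model in [primary] + fallbacks:
        try:
            response = call_model(model, request)
            return response, attempts
        except Exception as err:  # in practice, only retryable errors
            attempts.append({"model": model, "status": "failed", "error": str(err)})
    raise AllModelsFailed(attempts)
```

In the real gateway, only retryable failures (see Fallback Triggers below) advance the loop; errors like invalid credentials abort immediately.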

Configuring Fallbacks

Via Dashboard

  1. Go to Settings → Routing → Fallbacks
  2. Select a primary model
  3. Add fallback models in priority order
  4. Click Save

Via API

```bash
curl -X POST https://api.gateflow.ai/v1/management/fallback-chains \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "primary": "gpt-5.2",
    "fallbacks": [
      "claude-sonnet-4-5-20250929",
      "gemini-3-pro"
    ]
  }'
```

Per-Request Fallbacks

Override fallbacks for specific requests:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)
```

Fallback Triggers

Fallbacks activate when:

| Trigger | Description | Example |
|---|---|---|
| Provider Error | 5xx from provider | OpenAI returns 500 |
| Timeout | Request exceeds timeout | No response in 60s |
| Rate Limit | Provider rate limit hit | 429 from Anthropic |
| Model Unavailable | Model temporarily down | Scheduled maintenance |
| Circuit Open | Too many recent failures | Provider marked unhealthy |

Fallbacks do not activate for:

  • Authentication errors (invalid API key)
  • Bad request errors (malformed input)
  • Content policy violations
  • Cost limit exceeded (your limit, not provider's)
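The trigger rules above can be expressed as a small predicate. This is an illustrative sketch of the classification, not GateFlow's internal code; the status codes and reason strings follow the tables above.

```python
# Non-HTTP conditions that are eligible for fallback, per the trigger
# table: timeouts, model unavailability, and an open circuit breaker.
RETRYABLE_REASONS = {"timeout", "model_unavailable", "circuit_open"}


def is_fallback_eligible(status_code=None, reason=None):
    """Return True if a failed attempt should trigger a fallback."""
    if status_code is not None:
        if status_code >= 500:              # provider error (5xx)
            return True
        if status_code == 429:              # provider rate limit
            return True
        if status_code in (400, 401, 403):  # bad request, auth, policy
            return False
    return reason in RETRYABLE_REASONS
```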

Fallback Behavior

Response Metadata

When a fallback is used, the response indicates this:

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "choices": [...],
  "gateflow": {
    "routing": {
      "requested_model": "gpt-5.2",
      "selected_model": "claude-sonnet-4-5-20250929",
      "fallback_used": true,
      "fallback_reason": "provider_error",
      "attempts": [
        {"model": "gpt-5.2", "status": "failed", "error": "503"}
      ]
    }
  }
}
```
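Client code can inspect this metadata to log when a fallback served a request. A minimal sketch, assuming the response body has been parsed into a dict:

```python
def log_fallback(response_json):
    """Return a human-readable note if this response came from a fallback."""
    routing = response_json.get("gateflow", {}).get("routing", {})
    if not routing.get("fallback_used"):
        return None
    return (
        f"fallback: {routing['requested_model']} -> "
        f"{routing['selected_model']} ({routing['fallback_reason']})"
    )
```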

Streaming Fallbacks

For streaming requests, GateFlow buffers briefly before starting the stream, so it can still fall back if the primary model fails before producing its first token. Once the primary starts streaming successfully, the request stays on that model even if it fails mid-stream; fallbacks are not applied partway through a stream.
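The behavior can be pictured with a generator that only commits to a stream once its first chunk arrives. A conceptual sketch, not GateFlow internals; each entry in `streams` stands in for opening a stream against one model in the chain:

```python
def stream_with_fallback(streams):
    """Yield chunks from the first stream that produces output.

    `streams` is a list of zero-argument callables, each returning an
    iterator of chunks. A failure before the first chunk triggers a
    fallback; once a chunk has been emitted, failures propagate.
    """
    for open_stream in streams:
        try:
            it = iter(open_stream())
            first = next(it)        # buffer until the first chunk arrives
        except Exception:
            continue                # fall back: nothing was emitted yet
        yield first
        yield from it               # mid-stream failures are NOT caught
        return
    raise RuntimeError("all streams failed before the first chunk")
```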

Recommended Fallback Chains

General Chat

```json
{
  "primary": "gpt-5.2",
  "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
}
```

Code Generation

```json
{
  "primary": "devstral-2",
  "fallbacks": ["gpt-5.2-codex", "claude-sonnet-4-5-20250929"]
}
```

Cost-Sensitive

```json
{
  "primary": "gpt-5-mini",
  "fallbacks": ["claude-haiku-4-5-20251015", "gemini-2.5-flash"]
}
```

High Quality

```json
{
  "primary": "claude-opus-4-5-20251107",
  "fallbacks": ["gpt-5.2", "gemini-3-pro"]
}
```

Reasoning Tasks

```json
{
  "primary": "o3",
  "fallbacks": ["o4-mini", "claude-opus-4-5-20251107"]
}
```

Embeddings

```json
{
  "primary": "text-embedding-3-large",
  "fallbacks": ["text-embedding-004", "embed-english-v3.0"]
}
```

Cross-Provider Considerations

When falling back across providers, be aware of:

Context Window Differences

| Model | Context Window |
|---|---|
| GPT-5.2 | 256k |
| Claude Opus 4.5 | 200k |
| Gemini 3 Pro | 2M |
| Mistral Large 3 | 128k |

A 150k-token request that falls back from Gemini 3 Pro to Mistral Large 3 will fail, because it exceeds Mistral Large 3's 128k context window.
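You can sanity-check a chain against your typical prompt size before configuring it. A sketch using the windows from the table above; the model identifiers here are illustrative:

```python
# Context windows from the table above, in tokens.
CONTEXT_WINDOWS = {
    "gpt-5.2": 256_000,
    "claude-opus-4-5-20251107": 200_000,
    "gemini-3-pro": 2_000_000,
    "mistral-large-3": 128_000,
}


def undersized_models(prompt_tokens, chain):
    """Return the models in `chain` whose context window is too small."""
    return [m for m in chain if CONTEXT_WINDOWS[m] < prompt_tokens]
```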

Feature Differences

| Feature | GPT-5.2 | Claude 4.5 | Gemini 3 |
|---|---|---|---|
| Function calling | Yes | Yes | Yes |
| Vision | Yes | Yes | Yes |
| JSON mode | Yes | Yes | Yes |
| System prompts | Yes | Yes | Yes |

Output Consistency

Different models may produce different outputs for the same input. For applications requiring consistency:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "fallback_mode": "fail"  # Don't use fallbacks
        }
    }
)
```

Or use fallbacks within the same provider:

```json
{
  "primary": "gpt-5.2",
  "fallbacks": ["gpt-5.1", "gpt-5"]
}
```

Monitoring Fallbacks

Dashboard Metrics

  • Fallback rate (% of requests using fallbacks)
  • Fallback reasons breakdown
  • Model distribution when fallbacks occur
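The fallback rate is simply the share of requests whose routing metadata has `fallback_used` set, so you can also compute it from your own logged responses. A sketch:

```python
def fallback_rate(responses):
    """Fraction of logged responses that were served by a fallback model."""
    if not responses:
        return 0.0
    used = sum(
        1 for r in responses
        if r.get("gateflow", {}).get("routing", {}).get("fallback_used")
    )
    return used / len(responses)
```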

Alerts

Configure alerts for high fallback rates:

```bash
curl -X POST https://api.gateflow.ai/v1/management/alerts \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Fallback Rate",
    "condition": {
      "metric": "fallback_rate",
      "operator": "gt",
      "threshold": 0.1,
      "window_minutes": 15
    },
    "notify": {
      "channels": ["slack", "email"]
    }
  }'
```

Disabling Fallbacks

Globally

```bash
curl -X PATCH https://api.gateflow.ai/v1/management/settings \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "fallbacks_enabled": false
  }'
```

Per Request

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "fallback_mode": "fail"
        }
    }
)
```
