
Model Fallbacks

Fallbacks ensure your application stays running when a provider has issues. When the primary model fails, GateFlow automatically tries alternatives.

How Fallbacks Work

When a request to the primary model fails with a retryable error, GateFlow replays the same request against each fallback model in priority order until one succeeds or the list is exhausted. The response metadata records which model actually served the request and why any earlier attempts failed.
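The routing loop can be pictured as follows. This is a simplified sketch, not GateFlow's actual implementation; `call_model` is a hypothetical stand-in for a provider call.

```python
class AllModelsFailed(Exception):
    """Raised when the primary and every fallback fail."""


def route_with_fallbacks(request, primary, fallbacks, call_model):
    """Try the primary model, then each fallback in priority order.

    Returns the first successful response plus a record of failed
    attempts, mirroring the `attempts` array in GateFlow's response
    metadata.
    """
    attempts = []
    for model in [primary] + fallbacks:
        try:
            response = call_model(model, request)
            return response, attempts
        except Exception as err:  # in practice, only retryable errors
            attempts.append({"model": model, "status": "failed", "error": str(err)})
    raise AllModelsFailed(attempts)
```

In the real gateway, only retryable failures (see Fallback Triggers below) advance the loop; errors like invalid credentials abort immediately.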

Configuring Fallbacks

Via Dashboard

  1. Go to Settings → Routing → Fallbacks
  2. Select a primary model
  3. Add fallback models in priority order
  4. Click Save

Via API

```bash
curl -X POST https://api.gateflow.ai/v1/management/fallback-chains \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "primary": "gpt-5.2",
    "fallbacks": [
      "claude-sonnet-4-5-20250929",
      "gemini-3-pro"
    ]
  }'
```

Per-Request Fallbacks

Override fallbacks for specific requests:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
        }
    }
)
```

Fallback Triggers

Fallbacks activate when:

| Trigger | Description | Example |
|---|---|---|
| Provider Error | 5xx from provider | OpenAI returns 500 |
| Timeout | Request exceeds timeout | No response in 60s |
| Rate Limit | Provider rate limit hit | 429 from Anthropic |
| Model Unavailable | Model temporarily down | Scheduled maintenance |
| Circuit Open | Too many recent failures | Provider marked unhealthy |

Fallbacks do not activate for:

  • Authentication errors (invalid API key)
  • Bad request errors (malformed input)
  • Content policy violations
  • Cost limit exceeded (your limit, not provider's)
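The trigger rules above can be expressed as a small predicate. This is an illustrative sketch of the classification, not GateFlow's internal code; the status codes and reason strings follow the tables above.

```python
# Non-HTTP conditions that are eligible for fallback, per the trigger
# table: timeouts, model unavailability, and an open circuit breaker.
RETRYABLE_REASONS = {"timeout", "model_unavailable", "circuit_open"}


def is_fallback_eligible(status_code=None, reason=None):
    """Return True if a failed attempt should trigger a fallback."""
    if status_code is not None:
        if status_code >= 500:              # provider error (5xx)
            return True
        if status_code == 429:              # provider rate limit
            return True
        if status_code in (400, 401, 403):  # bad request, auth, policy
            return False
    return reason in RETRYABLE_REASONS
```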

Fallback Behavior

Response Metadata

When a fallback is used, the response indicates this:

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "choices": [...],
  "gateflow": {
    "routing": {
      "requested_model": "gpt-5.2",
      "selected_model": "claude-sonnet-4-5-20250929",
      "fallback_used": true,
      "fallback_reason": "provider_error",
      "attempts": [
        {"model": "gpt-5.2", "status": "failed", "error": "503"}
      ]
    }
  }
}
```
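Client code can inspect this metadata to log when a fallback served a request. A minimal sketch, assuming the response body has been parsed into a dict:

```python
def log_fallback(response_json):
    """Return a human-readable note if this response came from a fallback."""
    routing = response_json.get("gateflow", {}).get("routing", {})
    if not routing.get("fallback_used"):
        return None
    return (
        f"fallback: {routing['requested_model']} -> "
        f"{routing['selected_model']} ({routing['fallback_reason']})"
    )
```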

Streaming Fallbacks

For streaming requests, GateFlow buffers briefly before starting the stream, so it can still fall back if the primary model fails before producing its first token. Once the primary starts streaming successfully, the request stays on that model even if it fails mid-stream; fallbacks are not applied partway through a stream.
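The behavior can be pictured with a generator that only commits to a stream once its first chunk arrives. A conceptual sketch, not GateFlow internals; each entry in `streams` stands in for opening a stream against one model in the chain:

```python
def stream_with_fallback(streams):
    """Yield chunks from the first stream that produces output.

    `streams` is a list of zero-argument callables, each returning an
    iterator of chunks. A failure before the first chunk triggers a
    fallback; once a chunk has been emitted, failures propagate.
    """
    for open_stream in streams:
        try:
            it = iter(open_stream())
            first = next(it)        # buffer until the first chunk arrives
        except Exception:
            continue                # fall back: nothing was emitted yet
        yield first
        yield from it               # mid-stream failures are NOT caught
        return
    raise RuntimeError("all streams failed before the first chunk")
```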

Recommended Fallback Chains

General Chat

```json
{
  "primary": "gpt-5.2",
  "fallbacks": ["claude-sonnet-4-5-20250929", "gemini-3-pro"]
}
```

Code Generation

```json
{
  "primary": "devstral-2",
  "fallbacks": ["gpt-5.2-codex", "claude-sonnet-4-5-20250929"]
}
```

Cost-Sensitive

```json
{
  "primary": "gpt-5-mini",
  "fallbacks": ["claude-haiku-4-5-20251015", "gemini-2.5-flash"]
}
```

High Quality

```json
{
  "primary": "claude-opus-4-5-20251107",
  "fallbacks": ["gpt-5.2", "gemini-3-pro"]
}
```

Reasoning Tasks

```json
{
  "primary": "o3",
  "fallbacks": ["o4-mini", "claude-opus-4-5-20251107"]
}
```

Embeddings

```json
{
  "primary": "text-embedding-3-large",
  "fallbacks": ["text-embedding-004", "embed-english-v3.0"]
}
```

Cross-Provider Considerations

When falling back across providers, be aware of:

Context Window Differences

| Model | Context Window |
|---|---|
| GPT-5.2 | 256k |
| Claude Opus 4.5 | 200k |
| Gemini 3 Pro | 2M |
| Mistral Large 3 | 128k |

A 150k-token request that falls back from Gemini 3 Pro to Mistral Large 3 will fail, because it exceeds Mistral Large 3's 128k context window.
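You can sanity-check a chain against your typical prompt size before configuring it. A sketch using the windows from the table above; the model identifiers here are illustrative:

```python
# Context windows from the table above, in tokens.
CONTEXT_WINDOWS = {
    "gpt-5.2": 256_000,
    "claude-opus-4-5-20251107": 200_000,
    "gemini-3-pro": 2_000_000,
    "mistral-large-3": 128_000,
}


def undersized_models(prompt_tokens, chain):
    """Return the models in `chain` whose context window is too small."""
    return [m for m in chain if CONTEXT_WINDOWS[m] < prompt_tokens]
```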

Feature Differences

| Feature | GPT-5.2 | Claude 4.5 | Gemini 3 |
|---|---|---|---|
| Function calling | Yes | Yes | Yes |
| Vision | Yes | Yes | Yes |
| JSON mode | Yes | Yes | Yes |
| System prompts | Yes | Yes | Yes |

Output Consistency

Different models may produce different outputs for the same input. For applications requiring consistency:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "fallback_mode": "fail"  # Don't use fallbacks
        }
    }
)
```

Or use fallbacks within the same provider:

```json
{
  "primary": "gpt-5.2",
  "fallbacks": ["gpt-5.1", "gpt-5"]
}
```

Monitoring Fallbacks

Dashboard Metrics

  • Fallback rate (% of requests using fallbacks)
  • Fallback reasons breakdown
  • Model distribution when fallbacks occur
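The fallback rate is simply the share of requests whose routing metadata has `fallback_used` set, so you can also compute it from your own logged responses. A sketch:

```python
def fallback_rate(responses):
    """Fraction of logged responses that were served by a fallback model."""
    if not responses:
        return 0.0
    used = sum(
        1 for r in responses
        if r.get("gateflow", {}).get("routing", {}).get("fallback_used")
    )
    return used / len(responses)
```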

Alerts

Configure alerts for high fallback rates:

```bash
curl -X POST https://api.gateflow.ai/v1/management/alerts \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Fallback Rate",
    "condition": {
      "metric": "fallback_rate",
      "operator": "gt",
      "threshold": 0.1,
      "window_minutes": 15
    },
    "notify": {
      "channels": ["slack", "email"]
    }
  }'
```

Disabling Fallbacks

Globally

```bash
curl -X PATCH https://api.gateflow.ai/v1/management/settings \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "fallbacks_enabled": false
  }'
```

Per Request

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "fallback_mode": "fail"
        }
    }
)
```
