Automated Fallbacks

Automated fallbacks protect your application when a model is deprecated or a provider fails. GateFlow switches to the configured fallback automatically, with no code changes needed.

How It Works

Enabling Automated Fallbacks

Global Setting

Enable automatic fallback for all deprecated models:

bash
curl -X PATCH https://api.gateflow.ai/v1/management/settings \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "auto_fallback_on_deprecation": true
  }'

Per-Model Configuration

Configure fallback for specific models:

bash
curl -X POST https://api.gateflow.ai/v1/management/fallback-chains \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "primary": "gpt-4-turbo",
    "fallbacks": ["gpt-5.2", "claude-sonnet-4-5-20250929"],
    "auto_activate_on_deprecation": true
  }'

Fallback Behavior

When Fallbacks Activate

Trigger               Behavior
Model deprecated      Route to first fallback
Provider error (5xx)  Try next in chain
Rate limit exceeded   Try next provider
Timeout               Try next in chain
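
The trigger table can be read as a loop over the chain. The following is a minimal sketch of that reading, not GateFlow's implementation: `ProviderError`, `route_with_fallbacks`, `fake_call`, and the `"provider_error"` reason string are hypothetical names introduced for illustration. Only `"model_deprecated"` appears in the documented responses.

```python
from typing import Callable, List, Set, Tuple


class ProviderError(Exception):
    """Hypothetical error for a failed upstream call (5xx, rate limit, timeout)."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


def route_with_fallbacks(
    chain: List[str],
    deprecated: Set[str],
    call_model: Callable[[str], str],
) -> Tuple[str, str, List[str]]:
    """Try each model in order; return (model_used, output, skip_reasons)."""
    skipped: List[str] = []
    for model in chain:
        if model in deprecated:
            # Deprecated models are never called; move to the next entry.
            skipped.append("model_deprecated")
            continue
        try:
            return model, call_model(model), skipped
        except ProviderError as err:
            # Any provider-side failure moves on to the next entry.
            skipped.append(err.reason)
    raise RuntimeError(f"all models in chain failed: {skipped}")


def fake_call(model: str) -> str:
    """Stand-in for a real provider call; gpt-5.2 is made to fail here."""
    if model == "gpt-5.2":
        raise ProviderError("provider_error")
    return f"ok from {model}"


used, output, skipped = route_with_fallbacks(
    ["gpt-4-turbo", "gpt-5.2", "claude-sonnet-4-5-20250929"],
    deprecated={"gpt-4-turbo"},
    call_model=fake_call,
)
print(used, skipped)
```

In this run the deprecated primary is skipped, the first fallback fails, and the request lands on the cross-provider fallback.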

Response Metadata

When a fallback is used, the response reports it in the gateflow.fallback object:

json
{
  "model": "gpt-5.2",
  "choices": [...],
  "gateflow": {
    "fallback": {
      "used": true,
      "reason": "model_deprecated",
      "original_model": "gpt-4-turbo",
      "fallback_model": "gpt-5.2"
    }
  }
}
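
Client code can inspect this metadata to log or react to a fallback. The sketch below works on the response as a plain dictionary. The helper name `fallback_info` is hypothetical, and it assumes that responses served by the primary model either omit the `gateflow.fallback` object or set `used` to false.

```python
from typing import Optional


def fallback_info(response: dict) -> Optional[dict]:
    """Return the fallback metadata if a fallback was used, else None."""
    gateflow = response.get("gateflow") or {}
    fallback = gateflow.get("fallback") or {}
    return fallback if fallback.get("used") else None


# Response shaped like the example above (choices omitted).
response = {
    "model": "gpt-5.2",
    "gateflow": {
        "fallback": {
            "used": True,
            "reason": "model_deprecated",
            "original_model": "gpt-4-turbo",
            "fallback_model": "gpt-5.2",
        }
    },
}

info = fallback_info(response)
if info:
    print(f"{info['original_model']} -> {info['fallback_model']} ({info['reason']})")
```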

Logging

All fallback events are logged. Query them by filtering on the fallback_activated event type:

bash
curl https://api.gateflow.ai/v1/management/logs \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -G -d "event_type=fallback_activated"

Fallback Chains

Creating Chains

bash
curl -X POST https://api.gateflow.ai/v1/management/fallback-chains \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT family fallback",
    "primary": "gpt-4-turbo",
    "fallbacks": [
      {
        "model": "gpt-5.2",
        "priority": 1
      },
      {
        "model": "gpt-5",
        "priority": 2
      },
      {
        "model": "claude-sonnet-4-5-20250929",
        "priority": 3,
        "note": "Cross-provider fallback"
      }
    ]
  }'

Chain Priority

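Fallbacks are tried in priority order. In the chain above, gpt-5.2 (priority 1) is tried first, then gpt-5 (priority 2), then claude-sonnet-4-5-20250929 (priority 3).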
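
As an illustration of priority ordering, the sketch below sorts the fallback entries of the chain created above. It assumes that a lower priority number means "tried earlier", and `ordered_fallbacks` is a hypothetical helper, not part of any GateFlow SDK.

```python
def ordered_fallbacks(chain: dict) -> list:
    """Return fallback model names, lowest priority number first (assumed order)."""
    entries = sorted(chain["fallbacks"], key=lambda entry: entry["priority"])
    return [entry["model"] for entry in entries]


# Same entries as the chain above, deliberately listed out of order.
chain = {
    "name": "GPT family fallback",
    "primary": "gpt-4-turbo",
    "fallbacks": [
        {"model": "claude-sonnet-4-5-20250929", "priority": 3},
        {"model": "gpt-5.2", "priority": 1},
        {"model": "gpt-5", "priority": 2},
    ],
}

print(ordered_fallbacks(chain))
```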

Conditional Fallbacks

Route to different fallbacks based on conditions:

json
{
  "primary": "gpt-4-turbo",
  "fallbacks": [
    {
      "model": "claude-opus-4-5-20251107",
      "condition": {"context_length_gt": 100000}
    },
    {
      "model": "gpt-5.2",
      "condition": {"always": true}
    }
  ]
}
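
One way to read this configuration is "first matching entry wins". The sketch below implements that reading for the two condition keys shown, `context_length_gt` and `always`. The evaluation order, the unit of `context_length`, and the helper name `pick_fallback` are assumptions for illustration.

```python
from typing import Optional


def pick_fallback(fallbacks: list, context_length: int) -> Optional[str]:
    """Return the first fallback whose condition matches, or None."""
    for entry in fallbacks:
        # Assumption: an entry with no condition always matches.
        condition = entry.get("condition", {"always": True})
        if condition.get("always"):
            return entry["model"]
        limit = condition.get("context_length_gt")
        if limit is not None and context_length > limit:
            return entry["model"]
    return None


fallbacks = [
    {"model": "claude-opus-4-5-20251107", "condition": {"context_length_gt": 100000}},
    {"model": "gpt-5.2", "condition": {"always": True}},
]

print(pick_fallback(fallbacks, context_length=150000))
print(pick_fallback(fallbacks, context_length=2000))
```

Under this reading, the `always` entry acts as a catch-all and belongs last in the list.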

Deprecation-Specific Fallbacks

Configure what happens when a specific model is deprecated:

bash
curl -X POST https://api.gateflow.ai/v1/management/deprecation-fallbacks \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4-turbo",
    "on_deprecation": {
      "action": "fallback",
      "fallback_to": "gpt-5.2",
      "notify": true
    }
  }'

Available Actions

Action    Description
fallback  Route to fallback model
fail      Return error (no fallback)
queue     Queue request until migration complete

Monitoring Fallbacks

Dashboard Metrics

The dashboard shows:

  • Fallback activation rate
  • Fallback reasons breakdown
  • Model distribution during fallbacks
  • Cost impact of fallbacks

Alerts

Configure alerts for fallback activity:

bash
curl -X POST https://api.gateflow.ai/v1/management/alerts \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High fallback rate",
    "condition": {
      "metric": "fallback_rate",
      "operator": "gt",
      "threshold": 0.1,
      "window_minutes": 15
    },
    "notify": {
      "channels": ["slack", "email"]
    }
  }'
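
To make the alert condition concrete, the sketch below evaluates it against request counts for one window. It assumes `fallback_rate` means the fraction of requests in the window that were served by a fallback, so a threshold of 0.1 corresponds to 10%. Only the `gt` operator appears in the example above; `lt` and both function names are assumptions.

```python
import operator

# Only "gt" is shown in the alert example; "lt" is an assumed extra.
OPERATORS = {"gt": operator.gt, "lt": operator.lt}


def fallback_rate(fallback_requests: int, total_requests: int) -> float:
    """Fraction of requests in the window that used a fallback."""
    if total_requests == 0:
        return 0.0
    return fallback_requests / total_requests


def alert_fires(condition: dict, fallback_requests: int, total_requests: int) -> bool:
    """Compare the window's fallback rate with the configured threshold."""
    rate = fallback_rate(fallback_requests, total_requests)
    compare = OPERATORS[condition["operator"]]
    return compare(rate, condition["threshold"])


condition = {
    "metric": "fallback_rate",
    "operator": "gt",
    "threshold": 0.1,
    "window_minutes": 15,
}

# 120 of 1000 requests in the 15-minute window used a fallback (12%).
print(alert_fires(condition, fallback_requests=120, total_requests=1000))
```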

Real-Time Status

Check current fallback status:

bash
curl https://api.gateflow.ai/v1/management/fallback-status \
  -H "Authorization: Bearer gw_prod_admin_key"

Response:

json
{
  "active_fallbacks": [
    {
      "original_model": "gpt-4-turbo",
      "fallback_model": "gpt-5.2",
      "reason": "model_deprecated",
      "activated_at": "2026-04-01T00:00:00Z",
      "requests_routed": 15420
    }
  ],
  "fallback_rate_1h": 0.12,
  "fallback_rate_24h": 0.08
}
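
A status response like this is easy to turn into a short report. The sketch below summarizes the active fallbacks and compares the one-hour rate with the 24-hour rate. Treating a higher one-hour rate as "rising" is an interpretation made here, and `summarize_status` is a hypothetical helper.

```python
def summarize_status(status: dict) -> list:
    """Build human-readable lines from a fallback-status response."""
    lines = []
    for item in status["active_fallbacks"]:
        lines.append(
            f"{item['original_model']} -> {item['fallback_model']} "
            f"({item['reason']}), {item['requests_routed']} requests "
            f"since {item['activated_at']}"
        )
    rate_1h = status["fallback_rate_1h"]
    rate_24h = status["fallback_rate_24h"]
    trend = "rising" if rate_1h > rate_24h else "steady or falling"
    lines.append(f"fallback rate: 1h={rate_1h}, 24h={rate_24h} ({trend})")
    return lines


status = {
    "active_fallbacks": [
        {
            "original_model": "gpt-4-turbo",
            "fallback_model": "gpt-5.2",
            "reason": "model_deprecated",
            "activated_at": "2026-04-01T00:00:00Z",
            "requests_routed": 15420,
        }
    ],
    "fallback_rate_1h": 0.12,
    "fallback_rate_24h": 0.08,
}

for line in summarize_status(status):
    print(line)
```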

Testing Fallbacks

Simulate Deprecation

Test how your system handles fallbacks:

bash
curl -X POST https://api.gateflow.ai/v1/management/fallback-chains/test \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "primary": "gpt-4-turbo",
    "simulate": "deprecation",
    "test_requests": 10
  }'

Dry Run

See where a request would be routed without actually sending it:

bash
curl -X POST https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "X-GateFlow-Dry-Run: true" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4-turbo",
    "messages": [...]
  }'

Response:

json
{
  "dry_run": true,
  "would_route_to": "gpt-5.2",
  "reason": "model_deprecated",
  "fallback_chain": ["gpt-4-turbo", "gpt-5.2", "claude-sonnet-4-5-20250929"]
}

Best Practices

1. Always Configure Fallbacks

Every model should have at least one fallback:

json
{
  "fallback_chains": [
    {"primary": "gpt-5.2", "fallbacks": ["claude-sonnet-4-5-20250929"]},
    {"primary": "claude-sonnet-4-5-20250929", "fallbacks": ["gpt-5.2"]},
    {"primary": "gpt-5-mini", "fallbacks": ["claude-haiku-4-5-20251015", "gemini-2.5-flash"]}
  ]
}
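
A quick coverage check can catch models that have no chain. The sketch below compares a configuration shaped like the one above with a list of models the application uses. The `models_in_use` list and the helper name are hypothetical.

```python
def models_without_fallback(config: dict, models_in_use: list) -> list:
    """Return models in use that have no chain with at least one fallback."""
    covered = {
        chain["primary"]
        for chain in config["fallback_chains"]
        if chain.get("fallbacks")
    }
    return sorted(set(models_in_use) - covered)


config = {
    "fallback_chains": [
        {"primary": "gpt-5.2", "fallbacks": ["claude-sonnet-4-5-20250929"]},
        {"primary": "claude-sonnet-4-5-20250929", "fallbacks": ["gpt-5.2"]},
        {"primary": "gpt-5-mini", "fallbacks": ["claude-haiku-4-5-20251015", "gemini-2.5-flash"]},
    ]
}

# Hypothetical list of models the application calls directly.
models_in_use = ["gpt-5.2", "gpt-5-mini", "gemini-2.5-flash"]

print(models_without_fallback(config, models_in_use))
```

Here `gemini-2.5-flash` is reported: it appears as a fallback target but has no chain of its own.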

2. Test Cross-Provider Fallbacks

Ensure fallbacks to different providers work:

python
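from openai import OpenAI

# Assumption: GateFlow is called through the OpenAI Python SDK by pointing
# base_url at the gateway; adjust this to however your client is constructed.
client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_admin_key",
)

# Force the Claude fallback to check that it produces acceptable output
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "force_fallback": "claude-sonnet-4-5-20250929"
        }
    }
)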

3. Monitor Fallback Costs

A fallback model may be priced differently from the primary. Track the cost impact:

bash
curl https://api.gateflow.ai/v1/management/analytics/cost-by-fallback \
  -H "Authorization: Bearer gw_prod_admin_key"

4. Document Fallback Behavior

Let your team know which fallbacks are configured and why.
