
Intelligent Routing

GateFlow's intelligent routing engine automatically selects the best-suited AI model for each request based on task type, cost, latency, quality, or sustainability requirements. The router combines task classification with a comprehensive model capability matrix to make its decisions.

How Routing Works

When you send a request to GateFlow:

  1. Task Classification - Analyzes your prompt to determine task type (coding, reasoning, simple Q&A, vision, etc.)
  2. Candidate Selection - Retrieves models capable of handling that task type
  3. Optimization - Ranks candidates based on your routing mode
  4. Selection - Returns the optimal model with alternatives
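
The four steps above can be sketched as a toy pipeline. The capability matrix, quality scores, and keyword classifier below are illustrative placeholders, not GateFlow's actual internals:

```python
# Toy sketch of the four routing steps with made-up capability data.
CAPABILITIES = {
    "gpt-4o-mini":      {"tasks": {"simple_qa", "summarization"}, "quality": 7, "price": 1},
    "gpt-4o":           {"tasks": {"simple_qa", "coding", "complex_reasoning"}, "quality": 9, "price": 10},
    "codestral-latest": {"tasks": {"coding"}, "quality": 8, "price": 3},
}

def classify(prompt: str) -> str:
    # Step 1: task classification (crude keyword heuristic for illustration)
    if any(kw in prompt.lower() for kw in ("code", "function", "bug")):
        return "coding"
    return "simple_qa"

def route(prompt: str) -> str:
    task = classify(prompt)                                # 1. classify
    candidates = [m for m, c in CAPABILITIES.items()
                  if task in c["tasks"]]                   # 2. candidate selection
    # 3. optimization: rank by quality-to-price ratio ("balanced" mode)
    candidates.sort(key=lambda m: CAPABILITIES[m]["quality"] / CAPABILITIES[m]["price"],
                    reverse=True)
    return candidates[0]                                   # 4. selection

print(route("Write a function to parse JSON"))  # -> codestral-latest (best ratio among coding-capable models)
```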

Routing Modes

GateFlow supports five routing modes, set via the routing_mode parameter or X-Routing-Mode header:

| Mode | Description |
|------|-------------|
| `balanced` | Default. Best quality-to-price ratio. Recommended for most use cases. |
| `cost_optimized` | Selects the cheapest model that meets the quality threshold. |
| `performance` | Selects the highest-quality model regardless of cost. |
| `low_latency` | Optimizes for fastest response time. |
| `sustain_optimized` | Selects the lowest-carbon model that meets the quality threshold. |
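
The header form is equivalent to the body parameter. With the OpenAI Python SDK, headers can be passed per request via `extra_headers`; the sketch below builds the request arguments and leaves the actual call commented out, since it needs a live client and API key:

```python
# Routing mode via header instead of body parameter.
routing_headers = {"X-Routing-Mode": "low_latency"}

request_kwargs = dict(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers=routing_headers,  # equivalent to extra_body={"routing_mode": "low_latency"}
)

# response = client.chat.completions.create(**request_kwargs)
print(request_kwargs["extra_headers"]["X-Routing-Mode"])  # -> low_latency
```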

Balanced Mode (Default)

Finds the sweet spot between cost and quality - recommended for most users:

python
response = client.chat.completions.create(
    model="auto",  # Let GateFlow select
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"routing_mode": "balanced"}
)

Cost-Optimized Mode

Selects the cheapest model that can handle your task (quality score >= 7):

python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this text..."}],
    extra_body={"routing_mode": "cost_optimized"}
)

Performance Mode

Selects the highest quality model regardless of cost:

python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Complex reasoning task..."}],
    extra_body={"routing_mode": "performance"}
)

Sustain Mode

Selects the lowest-carbon model that meets quality requirements:

python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "General question..."}],
    extra_body={"routing_mode": "sustain_optimized"}
)

Direct Mode

Bypass intelligent routing and specify an exact model:

python
response = client.chat.completions.create(
    model="gpt-4o",  # Specify exact model
    messages=[{"role": "user", "content": "Hello"}]
)

Task Classification

The router automatically classifies your prompt into task types:

| Task Type | Description | Optimal Models |
|-----------|-------------|----------------|
| `simple_qa` | Quick factual questions | GPT-4o-mini, Claude Haiku, Gemini Flash |
| `coding` | Code generation, debugging | Devstral-2, Claude Sonnet, GPT-4o |
| `complex_reasoning` | Multi-step logic, analysis | Claude Opus, GPT-4o, o1 |
| `vision` | Image understanding | GPT-4o, Claude Sonnet, Gemini Pro |
| `creative` | Creative writing, storytelling | Claude Opus, GPT-4o |
| `summarization` | Text summarization | GPT-4o-mini, Gemini Flash |
| `embeddings` | Text to vectors | text-embedding-3-small, mistral-embed |

Override automatic classification with the X-GateFlow-Task-Type header:

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "X-GateFlow-Task-Type: coding" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [...]}'

Routing Rules

Create custom routing rules in the dashboard or via API.

Rule Structure

json
{
  "name": "Route coding questions to Codestral",
  "conditions": {
    "any": [
      {"field": "message_content", "contains": "code"},
      {"field": "message_content", "contains": "function"},
      {"field": "message_content", "contains": "programming"}
    ]
  },
  "action": {
    "route_to": "codestral-latest",
    "fallback": ["gpt-4o", "claude-3-5-sonnet"]
  },
  "priority": 100
}

Condition Types

| Condition | Description | Example |
|-----------|-------------|---------|
| `contains` | Message contains text | `{"field": "message_content", "contains": "code"}` |
| `matches` | Regex match | `{"field": "message_content", "matches": "\\bSQL\\b"}` |
| `starts_with` | Message starts with text | `{"field": "system_prompt", "starts_with": "You are a"}` |
| `header_equals` | Custom header value | `{"field": "header.x-task-type", "equals": "summary"}` |
| `token_count_gt` | Token count threshold | `{"field": "token_count", "gt": 1000}` |
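
How these conditions might be evaluated can be sketched with a tiny matcher. The request shape and field-lookup behavior here are assumptions based on the table above, not GateFlow's actual engine:

```python
import re

def get_field(request: dict, field: str):
    # "header.x-task-type" looks up a header; everything else is a top-level key
    if field.startswith("header."):
        return request.get("headers", {}).get(field[len("header."):])
    return request.get(field)

def matches_condition(request: dict, cond: dict) -> bool:
    value = get_field(request, cond["field"])
    if "contains" in cond:
        return cond["contains"] in (value or "")
    if "matches" in cond:
        return re.search(cond["matches"], value or "") is not None
    if "starts_with" in cond:
        return (value or "").startswith(cond["starts_with"])
    if "equals" in cond:
        return value == cond["equals"]
    if "gt" in cond:
        return value is not None and value > cond["gt"]
    return False

request = {
    "message_content": "Write SQL to join two tables",
    "token_count": 1500,
    "headers": {"x-task-type": "summary"},
}
print(matches_condition(request, {"field": "message_content", "matches": r"\bSQL\b"}))  # -> True
print(matches_condition(request, {"field": "token_count", "gt": 1000}))                # -> True
```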

Creating Rules via API

bash
curl -X POST https://api.gateflow.ai/v1/management/routing-rules \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long context to Claude",
    "conditions": {
      "all": [
        {"field": "token_count", "gt": 50000}
      ]
    },
    "action": {
      "route_to": "claude-3-5-sonnet",
      "reason": "Claude has 200k context window"
    },
    "priority": 90
  }'

Rule Priority

Rules are evaluated in priority order (highest first):

  • Priority 100: Code tasks → Codestral
  • Priority 90: Long context → Claude
  • Priority 80: Cost-sensitive → GPT-4o-mini
  • Priority 0: Default → GPT-4o
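
The evaluation order can be sketched as a first-match scan over rules sorted by priority. The rules and matching logic below are simplified placeholders, not GateFlow's real representation:

```python
# Simplified: each rule has a priority, a predicate, and a target model.
rules = [
    {"priority": 100, "match": lambda r: "code" in r["content"], "route_to": "codestral-latest"},
    {"priority": 90,  "match": lambda r: r["token_count"] > 50000, "route_to": "claude-3-5-sonnet"},
    {"priority": 0,   "match": lambda r: True, "route_to": "gpt-4o"},  # default catch-all
]

def select_model(request: dict) -> str:
    # Highest priority first; the first matching rule wins.
    for rule in sorted(rules, key=lambda r: r["priority"], reverse=True):
        if rule["match"](request):
            return rule["route_to"]
    raise LookupError("no rule matched")  # unreachable with a priority-0 default

print(select_model({"content": "explain this code", "token_count": 10}))  # -> codestral-latest
```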

Model Aliases

Create friendly names for models or model groups:

bash
curl -X POST https://api.gateflow.ai/v1/management/model-aliases \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "alias": "fast",
    "target": "gpt-4o-mini",
    "fallbacks": ["claude-3-haiku", "gemini-1.5-flash"]
  }'

Use in requests:

python
response = client.chat.completions.create(
    model="fast",  # Resolves to gpt-4o-mini with fallbacks
    messages=[...]
)

Dynamic Aliases

Aliases can point to different models based on conditions:

json
{
  "alias": "smart",
  "rules": [
    {
      "conditions": {"field": "token_count", "gt": 100000},
      "target": "claude-3-5-sonnet"
    },
    {
      "conditions": {"field": "task_type", "equals": "code"},
      "target": "codestral-latest"
    }
  ],
  "default": "gpt-4o"
}
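
Resolution of a dynamic alias can be sketched as a first-match scan with a fallback to `default`. The alias below mirrors the JSON above; the resolver itself is an illustrative assumption:

```python
alias = {
    "alias": "smart",
    "rules": [
        {"conditions": {"field": "token_count", "gt": 100000}, "target": "claude-3-5-sonnet"},
        {"conditions": {"field": "task_type", "equals": "code"}, "target": "codestral-latest"},
    ],
    "default": "gpt-4o",
}

def resolve(alias: dict, request: dict) -> str:
    # First rule whose condition matches wins; otherwise fall back to the default.
    for rule in alias["rules"]:
        cond = rule["conditions"]
        value = request.get(cond["field"])
        if "gt" in cond and value is not None and value > cond["gt"]:
            return rule["target"]
        if "equals" in cond and value == cond["equals"]:
            return rule["target"]
    return alias["default"]

print(resolve(alias, {"token_count": 150000}))                   # -> claude-3-5-sonnet
print(resolve(alias, {"token_count": 50, "task_type": "code"}))  # -> codestral-latest
print(resolve(alias, {"token_count": 50}))                       # -> gpt-4o
```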

A/B Testing

Test different models with traffic splitting:

bash
curl -X POST https://api.gateflow.ai/v1/management/experiments \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT-4o vs Claude comparison",
    "model_alias": "primary",
    "variants": [
      {"model": "gpt-4o", "weight": 50},
      {"model": "claude-3-5-sonnet", "weight": 50}
    ],
    "metrics": ["latency", "cost", "user_feedback"]
  }'
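
Traffic splitting by weight amounts to a weighted random draw per request. A minimal sketch using the experiment's variants (the selection code is an illustration, not GateFlow's implementation):

```python
import random

variants = [
    {"model": "gpt-4o", "weight": 50},
    {"model": "claude-3-5-sonnet", "weight": 50},
]

def pick_variant(variants: list, rng: random.Random) -> str:
    # random.choices draws proportionally to the weights.
    models = [v["model"] for v in variants]
    weights = [v["weight"] for v in variants]
    return rng.choices(models, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded for reproducibility
draws = [pick_variant(variants, rng) for _ in range(1000)]
print(draws.count("gpt-4o"), draws.count("claude-3-5-sonnet"))  # roughly a 500/500 split
```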

View results:

bash
curl https://api.gateflow.ai/v1/management/experiments/exp_123/results \
  -H "Authorization: Bearer gw_prod_admin_key"

Request Headers

Pass routing hints via headers:

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_key" \
  -H "X-GateFlow-Task-Type: coding" \
  -H "X-GateFlow-Optimization: latency" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [...]
  }'

Available Headers:

| Header | Description |
|--------|-------------|
| `X-GateFlow-Task-Type` | Hint for task classification |
| `X-GateFlow-Optimization` | `cost`, `latency`, or `quality` |
| `X-GateFlow-Priority` | Request priority (1-10) |
| `X-GateFlow-Cache` | `skip` to bypass cache |
| `X-GateFlow-Trace-Id` | Custom trace ID for logging |

Viewing Routing Decisions

Each response includes routing metadata:

json
{
  "id": "chatcmpl-123",
  "model": "gpt-4o",
  "choices": [...],
  "usage": {...},
  "gateflow": {
    "routing": {
      "requested_model": "auto",
      "selected_model": "gpt-4o",
      "reason": "task_classification:general",
      "fallbacks_tried": [],
      "latency_ms": 234
    }
  }
}
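
Since the `gateflow` block is an extra field outside the standard OpenAI response schema, one approach is to read it from the raw response JSON. The sketch below uses the example payload above as a literal dict:

```python
# The example response above, as a parsed payload.
payload = {
    "id": "chatcmpl-123",
    "model": "gpt-4o",
    "gateflow": {
        "routing": {
            "requested_model": "auto",
            "selected_model": "gpt-4o",
            "reason": "task_classification:general",
            "fallbacks_tried": [],
            "latency_ms": 234,
        }
    },
}

routing = payload.get("gateflow", {}).get("routing", {})
if routing.get("fallbacks_tried"):
    print("fallbacks used:", routing["fallbacks_tried"])
print(f"{routing['requested_model']} -> {routing['selected_model']} "
      f"({routing['reason']}, {routing['latency_ms']} ms)")
```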

Monitoring Routing

Dashboard analytics show:

  • Model distribution over time
  • Fallback frequency
  • Routing rule hit rates
  • A/B test results
