# Intelligent Routing

GateFlow's intelligent routing engine automatically selects the best AI model for each request based on task type, cost, latency, quality, or sustainability requirements. The router combines task classification with a comprehensive model capability matrix to make its decisions.
## How Routing Works
When you send a request to GateFlow:
1. **Task Classification** - Analyzes your prompt to determine the task type (coding, reasoning, simple Q&A, vision, etc.)
2. **Candidate Selection** - Retrieves models capable of handling that task type
3. **Optimization** - Ranks candidates according to your routing mode
4. **Selection** - Returns the optimal model, with alternatives as fallbacks
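The four stages above can be sketched roughly as follows. This is an illustrative client-side mock, not GateFlow's actual implementation; the model names, task sets, quality scores, and prices in the catalog are made up for the example:

```python
from dataclasses import dataclass

# Illustrative catalog; GateFlow's real capability matrix is far larger.
@dataclass
class Model:
    name: str
    tasks: set
    quality: int        # 1-10 quality score
    cost_per_1k: float  # USD per 1k tokens

CATALOG = [
    Model("gpt-4o", {"coding", "complex_reasoning", "simple_qa"}, 9, 0.005),
    Model("gpt-4o-mini", {"simple_qa", "summarization"}, 7, 0.00015),
    Model("claude-3-5-sonnet", {"coding", "complex_reasoning"}, 9, 0.003),
]

def classify(prompt: str) -> str:
    # Step 1: task classification (naive keyword stand-in)
    return "coding" if "function" in prompt or "def " in prompt else "simple_qa"

def route(prompt: str, mode: str = "balanced"):
    task = classify(prompt)
    # Step 2: candidate selection
    candidates = [m for m in CATALOG if task in m.tasks]
    # Step 3: optimization - rank by the chosen routing mode
    # (low_latency and sustain_optimized omitted for brevity)
    if mode == "cost_optimized":
        candidates.sort(key=lambda m: m.cost_per_1k)
    elif mode == "performance":
        candidates.sort(key=lambda m: -m.quality)
    else:  # balanced: best quality-to-price ratio
        candidates.sort(key=lambda m: -(m.quality / m.cost_per_1k))
    # Step 4: selection - best model first, the rest as alternatives
    return candidates[0].name, [m.name for m in candidates[1:]]
```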
## Routing Modes
GateFlow supports five routing modes, set via the `routing_mode` parameter or the `X-Routing-Mode` header:

| Mode | Description |
|---|---|
| `balanced` | Default. Best quality-to-price ratio. Recommended for most use cases. |
| `cost_optimized` | Selects the cheapest model that meets the quality threshold. |
| `performance` | Selects the highest-quality model regardless of cost. |
| `low_latency` | Optimizes for the fastest response time. |
| `sustain_optimized` | Selects the lowest-carbon model that meets the quality threshold. |
### Balanced Mode (Default)

Finds the sweet spot between cost and quality - recommended for most users:

```python
response = client.chat.completions.create(
    model="auto",  # Let GateFlow select
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"routing_mode": "balanced"}
)
```

### Cost-Optimized Mode
Selects the cheapest model that can handle your task (quality score >= 7):
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this text..."}],
    extra_body={"routing_mode": "cost_optimized"}
)
```

### Performance Mode
Selects the highest quality model regardless of cost:
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Complex reasoning task..."}],
    extra_body={"routing_mode": "performance"}
)
```

### Sustain Mode
Selects the lowest-carbon model that meets quality requirements:
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "General question..."}],
    extra_body={"routing_mode": "sustain_optimized"}
)
```

### Direct Mode
Bypass intelligent routing and specify exact model:
```python
response = client.chat.completions.create(
    model="gpt-4o",  # Specify exact model
    messages=[{"role": "user", "content": "Hello"}]
)
```

## Task Classification
The router automatically classifies your prompt into task types:
| Task Type | Description | Optimal Models |
|---|---|---|
| `simple_qa` | Quick factual questions | GPT-4o-mini, Claude Haiku, Gemini Flash |
| `coding` | Code generation, debugging | Devstral-2, Claude Sonnet, GPT-4o |
| `complex_reasoning` | Multi-step logic, analysis | Claude Opus, GPT-4o, o1 |
| `vision` | Image understanding | GPT-4o, Claude Sonnet, Gemini Pro |
| `creative` | Creative writing, storytelling | Claude Opus, GPT-4o |
| `summarization` | Text summarization | GPT-4o-mini, Gemini Flash |
| `embeddings` | Text to vectors | text-embedding-3-small, mistral-embed |
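To give an intuition for how classification works, here is a toy keyword-based classifier over the task types in the table. It is illustrative only - GateFlow's production classifier is more sophisticated - but it shows the general shape: vision is detected structurally from multimodal content parts, and text tasks fall through keyword buckets to `simple_qa`:

```python
# Toy keyword buckets; real classification is not keyword-based.
KEYWORDS = {
    "coding": ("code", "function", "debug", "implement"),
    "summarization": ("summarize", "tl;dr", "condense"),
    "creative": ("story", "poem", "screenplay"),
    "complex_reasoning": ("prove", "step by step", "analyze"),
}

def classify_task(messages):
    # Vision is detected structurally: any image part makes it a vision task.
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list) and any(
            part.get("type") == "image_url" for part in content
        ):
            return "vision"
    # Otherwise, scan the concatenated text for task keywords.
    text = " ".join(
        msg["content"] for msg in messages if isinstance(msg.get("content"), str)
    ).lower()
    for task, words in KEYWORDS.items():
        if any(word in text for word in words):
            return task
    return "simple_qa"
```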
Override automatic classification with the `X-GateFlow-Task-Type` header:
```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "X-GateFlow-Task-Type: coding" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [...]}'
```

## Routing Rules
Create custom routing rules in the dashboard or via API.
### Rule Structure
```json
{
  "name": "Route coding questions to Codestral",
  "conditions": {
    "any": [
      {"field": "message_content", "contains": "code"},
      {"field": "message_content", "contains": "function"},
      {"field": "message_content", "contains": "programming"}
    ]
  },
  "action": {
    "route_to": "codestral-latest",
    "fallback": ["gpt-4o", "claude-3-5-sonnet"]
  },
  "priority": 100
}
```

### Condition Types
| Condition | Description | Example |
|---|---|---|
| `contains` | Message contains text | `{"field": "message_content", "contains": "code"}` |
| `matches` | Regex match | `{"field": "message_content", "matches": "\\bSQL\\b"}` |
| `starts_with` | Message starts with prefix | `{"field": "system_prompt", "starts_with": "You are a"}` |
| `header_equals` | Custom header value | `{"field": "header.x-task-type", "equals": "summary"}` |
| `token_count_gt` | Token count threshold | `{"field": "token_count", "gt": 1000}` |
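The evaluation semantics can be made concrete with a small sketch. This is a hypothetical client-side re-implementation of the operators above, not GateFlow's engine; for the sketch, the request is modeled as a flat dict keyed by field name:

```python
import re

def condition_matches(cond, request):
    # Evaluate one condition against a flat request dict.
    value = request.get(cond["field"])
    if value is None:
        return False
    if "contains" in cond:
        return cond["contains"] in value
    if "matches" in cond:
        return re.search(cond["matches"], value) is not None
    if "starts_with" in cond:
        return value.startswith(cond["starts_with"])
    if "equals" in cond:
        return value == cond["equals"]
    if "gt" in cond:
        return value > cond["gt"]
    return False

def rule_matches(rule, request):
    conds = rule["conditions"]
    if "any" in conds:  # at least one condition must hold
        return any(condition_matches(c, request) for c in conds["any"])
    # "all": every condition must hold
    return all(condition_matches(c, request) for c in conds.get("all", []))

def select_rule(rules, request):
    # Highest priority first; the first matching rule wins.
    for rule in sorted(rules, key=lambda r: -r["priority"]):
        if rule_matches(rule, request):
            return rule
    return None
```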
### Creating Rules via API
```bash
curl -X POST https://api.gateflow.ai/v1/management/routing-rules \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long context to Claude",
    "conditions": {
      "all": [
        {"field": "token_count", "gt": 50000}
      ]
    },
    "action": {
      "route_to": "claude-3-5-sonnet",
      "reason": "Claude has 200k context window"
    },
    "priority": 90
  }'
```

### Rule Priority
Rules are evaluated in priority order (highest first):

- Priority 100: Code tasks → Codestral
- Priority 90: Long context → Claude
- Priority 80: Cost-sensitive → GPT-4o-mini
- Priority 0: Default → GPT-4o

## Model Aliases
Create friendly names for models or model groups:
```bash
curl -X POST https://api.gateflow.ai/v1/management/model-aliases \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "alias": "fast",
    "target": "gpt-4o-mini",
    "fallbacks": ["claude-3-haiku", "gemini-1.5-flash"]
  }'
```

Use in requests:
```python
response = client.chat.completions.create(
    model="fast",  # Resolves to gpt-4o-mini with fallbacks
    messages=[...]
)
```

### Dynamic Aliases
Aliases can point to different models based on conditions:
```json
{
  "alias": "smart",
  "rules": [
    {
      "conditions": {"field": "token_count", "gt": 100000},
      "target": "claude-3-5-sonnet"
    },
    {
      "conditions": {"field": "task_type", "equals": "code"},
      "target": "codestral-latest"
    }
  ],
  "default": "gpt-4o"
}
```

## A/B Testing
Test different models with traffic splitting:
```bash
curl -X POST https://api.gateflow.ai/v1/management/experiments \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT-4o vs Claude comparison",
    "model_alias": "primary",
    "variants": [
      {"model": "gpt-4o", "weight": 50},
      {"model": "claude-3-5-sonnet", "weight": 50}
    ],
    "metrics": ["latency", "cost", "user_feedback"]
  }'
```

View results:

```bash
curl https://api.gateflow.ai/v1/management/experiments/exp_123/results \
  -H "Authorization: Bearer gw_prod_admin_key"
```

## Request Headers
Pass routing hints via headers:
```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_key" \
  -H "X-GateFlow-Task-Type: code_generation" \
  -H "X-GateFlow-Optimization: latency" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [...]
  }'
```

Available headers:
| Header | Description |
|---|---|
| `X-GateFlow-Task-Type` | Hint for task classification |
| `X-GateFlow-Optimization` | `cost`, `latency`, or `quality` |
| `X-GateFlow-Priority` | Request priority (1-10) |
| `X-GateFlow-Cache` | `skip` to bypass cache |
| `X-GateFlow-Trace-Id` | Custom trace ID for logging |
## Viewing Routing Decisions
Each response includes routing metadata:
```json
{
  "id": "chatcmpl-123",
  "model": "gpt-4o",
  "choices": [...],
  "usage": {...},
  "gateflow": {
    "routing": {
      "requested_model": "auto",
      "selected_model": "gpt-4o",
      "reason": "task_classification:general",
      "fallbacks_tried": [],
      "latency_ms": 234
    }
  }
}
```

## Monitoring Routing
Dashboard analytics show:
- Model distribution over time
- Fallback frequency
- Routing rule hit rates
- A/B test results
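Model distribution and fallback frequency can also be tallied client-side from the `gateflow.routing` metadata shown above. A minimal sketch over plain response dicts (the sample responses here are hypothetical):

```python
from collections import Counter

# Hypothetical captured responses; only the routing block matters here.
responses = [
    {"gateflow": {"routing": {"selected_model": "gpt-4o",
                              "fallbacks_tried": []}}},
    {"gateflow": {"routing": {"selected_model": "claude-3-5-sonnet",
                              "fallbacks_tried": ["gpt-4o"]}}},
    {"gateflow": {"routing": {"selected_model": "gpt-4o",
                              "fallbacks_tried": []}}},
]

routings = [r["gateflow"]["routing"] for r in responses]

# How often each model was actually selected
model_distribution = Counter(r["selected_model"] for r in routings)

# Fraction of requests where at least one fallback was tried
fallback_rate = sum(1 for r in routings if r["fallbacks_tried"]) / len(routings)
```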
## Next Steps
- Model Fallbacks - Configure fallback chains
- Cost vs Performance - Optimization strategies