# Intelligent Routing

GateFlow's intelligent routing engine automatically selects the best AI model for each request based on task type, cost, latency, quality, or sustainability requirements. The router combines task classification with a comprehensive model capability matrix to make its decisions.
## How Routing Works
When you send a request to GateFlow:
1. **Task Classification** - Analyzes your prompt to determine the task type (coding, reasoning, simple Q&A, vision, etc.)
2. **Candidate Selection** - Retrieves models capable of handling that task type
3. **Optimization** - Ranks candidates according to your routing mode
4. **Selection** - Returns the optimal model, with alternatives as fallbacks
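The four stages above can be sketched roughly as follows. This is an illustrative client-side mock, not GateFlow's actual implementation; the model names, task sets, quality scores, and prices in the catalog are made up for the example:

```python
from dataclasses import dataclass

# Illustrative catalog; GateFlow's real capability matrix is far larger.
@dataclass
class Model:
    name: str
    tasks: set
    quality: int        # 1-10 quality score
    cost_per_1k: float  # USD per 1k tokens

CATALOG = [
    Model("gpt-4o", {"coding", "complex_reasoning", "simple_qa"}, 9, 0.005),
    Model("gpt-4o-mini", {"simple_qa", "summarization"}, 7, 0.00015),
    Model("claude-3-5-sonnet", {"coding", "complex_reasoning"}, 9, 0.003),
]

def classify(prompt: str) -> str:
    # Step 1: task classification (naive keyword stand-in)
    return "coding" if "function" in prompt or "def " in prompt else "simple_qa"

def route(prompt: str, mode: str = "balanced"):
    task = classify(prompt)
    # Step 2: candidate selection
    candidates = [m for m in CATALOG if task in m.tasks]
    # Step 3: optimization - rank by the chosen routing mode
    # (low_latency and sustain_optimized omitted for brevity)
    if mode == "cost_optimized":
        candidates.sort(key=lambda m: m.cost_per_1k)
    elif mode == "performance":
        candidates.sort(key=lambda m: -m.quality)
    else:  # balanced: best quality-to-price ratio
        candidates.sort(key=lambda m: -(m.quality / m.cost_per_1k))
    # Step 4: selection - best model first, the rest as alternatives
    return candidates[0].name, [m.name for m in candidates[1:]]
```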
## Routing Modes
GateFlow supports five routing modes, set via the `routing_mode` parameter or the `X-Routing-Mode` header:

| Mode | Description |
|---|---|
| `balanced` | Default. Best quality-to-price ratio. Recommended for most use cases. |
| `cost_optimized` | Selects the cheapest model that meets the quality threshold. |
| `performance` | Selects the highest-quality model regardless of cost. |
| `low_latency` | Optimizes for the fastest response time. |
| `sustain_optimized` | Selects the lowest-carbon model that meets the quality threshold. |
### Balanced Mode (Default)

Finds the sweet spot between cost and quality - recommended for most users:

```python
response = client.chat.completions.create(
    model="auto",  # Let GateFlow select
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"routing_mode": "balanced"}
)
```

### Cost-Optimized Mode
Selects the cheapest model that can handle your task (quality score >= 7):
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this text..."}],
    extra_body={"routing_mode": "cost_optimized"}
)
```

### Performance Mode
Selects the highest quality model regardless of cost:
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Complex reasoning task..."}],
    extra_body={"routing_mode": "performance"}
)
```

### Sustain Mode
Selects the lowest-carbon model that meets quality requirements:
```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "General question..."}],
    extra_body={"routing_mode": "sustain_optimized"}
)
```

### Direct Mode
Bypass intelligent routing and specify exact model:
```python
response = client.chat.completions.create(
    model="gpt-4o",  # Specify exact model
    messages=[{"role": "user", "content": "Hello"}]
)
```

## Task Classification
The router automatically classifies your prompt into task types:
| Task Type | Description | Optimal Models |
|---|---|---|
| `simple_qa` | Quick factual questions | GPT-4o-mini, Claude Haiku, Gemini Flash |
| `coding` | Code generation, debugging | Devstral-2, Claude Sonnet, GPT-4o |
| `complex_reasoning` | Multi-step logic, analysis | Claude Opus, GPT-4o, o1 |
| `vision` | Image understanding | GPT-4o, Claude Sonnet, Gemini Pro |
| `creative` | Creative writing, storytelling | Claude Opus, GPT-4o |
| `summarization` | Text summarization | GPT-4o-mini, Gemini Flash |
| `embeddings` | Text to vectors | text-embedding-3-small, mistral-embed |
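To give an intuition for how classification works, here is a toy keyword-based classifier over the task types in the table. It is illustrative only - GateFlow's production classifier is more sophisticated - but it shows the general shape: vision is detected structurally from multimodal content parts, and text tasks fall through keyword buckets to `simple_qa`:

```python
# Toy keyword buckets; real classification is not keyword-based.
KEYWORDS = {
    "coding": ("code", "function", "debug", "implement"),
    "summarization": ("summarize", "tl;dr", "condense"),
    "creative": ("story", "poem", "screenplay"),
    "complex_reasoning": ("prove", "step by step", "analyze"),
}

def classify_task(messages):
    # Vision is detected structurally: any image part makes it a vision task.
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list) and any(
            part.get("type") == "image_url" for part in content
        ):
            return "vision"
    # Otherwise, scan the concatenated text for task keywords.
    text = " ".join(
        msg["content"] for msg in messages if isinstance(msg.get("content"), str)
    ).lower()
    for task, words in KEYWORDS.items():
        if any(word in text for word in words):
            return task
    return "simple_qa"
```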
Override automatic classification with the `X-GateFlow-Task-Type` header:
```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "X-GateFlow-Task-Type: coding" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [...]}'
```

## Routing Rules
Create custom routing rules in the dashboard or via API.
### Rule Structure
```json
{
  "name": "Route coding questions to Codestral",
  "conditions": {
    "any": [
      {"field": "message_content", "contains": "code"},
      {"field": "message_content", "contains": "function"},
      {"field": "message_content", "contains": "programming"}
    ]
  },
  "action": {
    "route_to": "codestral-latest",
    "fallback": ["gpt-4o", "claude-3-5-sonnet"]
  },
  "priority": 100
}
```

### Condition Types
| Condition | Description | Example |
|---|---|---|
| `contains` | Message contains text | `{"field": "message_content", "contains": "code"}` |
| `matches` | Regex match | `{"field": "message_content", "matches": "\\bSQL\\b"}` |
| `starts_with` | Message starts with prefix | `{"field": "system_prompt", "starts_with": "You are a"}` |
| `header_equals` | Custom header value | `{"field": "header.x-task-type", "equals": "summary"}` |
| `token_count_gt` | Token count threshold | `{"field": "token_count", "gt": 1000}` |
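The evaluation semantics can be made concrete with a small sketch. This is a hypothetical client-side re-implementation of the operators above, not GateFlow's engine; for the sketch, the request is modeled as a flat dict keyed by field name:

```python
import re

def condition_matches(cond, request):
    # Evaluate one condition against a flat request dict.
    value = request.get(cond["field"])
    if value is None:
        return False
    if "contains" in cond:
        return cond["contains"] in value
    if "matches" in cond:
        return re.search(cond["matches"], value) is not None
    if "starts_with" in cond:
        return value.startswith(cond["starts_with"])
    if "equals" in cond:
        return value == cond["equals"]
    if "gt" in cond:
        return value > cond["gt"]
    return False

def rule_matches(rule, request):
    conds = rule["conditions"]
    if "any" in conds:  # at least one condition must hold
        return any(condition_matches(c, request) for c in conds["any"])
    # "all": every condition must hold
    return all(condition_matches(c, request) for c in conds.get("all", []))

def select_rule(rules, request):
    # Highest priority first; the first matching rule wins.
    for rule in sorted(rules, key=lambda r: -r["priority"]):
        if rule_matches(rule, request):
            return rule
    return None
```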
### Creating Rules via API
```bash
curl -X POST https://api.gateflow.ai/v1/management/routing-rules \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Long context to Claude",
    "conditions": {
      "all": [
        {"field": "token_count", "gt": 50000}
      ]
    },
    "action": {
      "route_to": "claude-3-5-sonnet",
      "reason": "Claude has 200k context window"
    },
    "priority": 90
  }'
```

### Rule Priority
Rules are evaluated in priority order (highest first):

- Priority 100: Code tasks → Codestral
- Priority 90: Long context → Claude
- Priority 80: Cost-sensitive → GPT-4o-mini
- Priority 0: Default → GPT-4o

## Model Aliases
Create friendly names for models or model groups:
```bash
curl -X POST https://api.gateflow.ai/v1/management/model-aliases \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "alias": "fast",
    "target": "gpt-4o-mini",
    "fallbacks": ["claude-3-haiku", "gemini-1.5-flash"]
  }'
```

Use in requests:
```python
response = client.chat.completions.create(
    model="fast",  # Resolves to gpt-4o-mini with fallbacks
    messages=[...]
)
```

### Dynamic Aliases
Aliases can point to different models based on conditions:
```json
{
  "alias": "smart",
  "rules": [
    {
      "conditions": {"field": "token_count", "gt": 100000},
      "target": "claude-3-5-sonnet"
    },
    {
      "conditions": {"field": "task_type", "equals": "code"},
      "target": "codestral-latest"
    }
  ],
  "default": "gpt-4o"
}
```

## A/B Testing
Test different models with traffic splitting:
```bash
curl -X POST https://api.gateflow.ai/v1/management/experiments \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT-4o vs Claude comparison",
    "model_alias": "primary",
    "variants": [
      {"model": "gpt-4o", "weight": 50},
      {"model": "claude-3-5-sonnet", "weight": 50}
    ],
    "metrics": ["latency", "cost", "user_feedback"]
  }'
```

View results:

```bash
curl https://api.gateflow.ai/v1/management/experiments/exp_123/results \
  -H "Authorization: Bearer gw_prod_admin_key"
```

## Request Headers
Pass routing hints via headers:
```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_key" \
  -H "X-GateFlow-Task-Type: code_generation" \
  -H "X-GateFlow-Optimization: latency" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [...]
  }'
```

Available headers:
| Header | Description |
|---|---|
| `X-GateFlow-Task-Type` | Hint for task classification |
| `X-GateFlow-Optimization` | `cost`, `latency`, or `quality` |
| `X-GateFlow-Priority` | Request priority (1-10) |
| `X-GateFlow-Cache` | `skip` to bypass cache |
| `X-GateFlow-Trace-Id` | Custom trace ID for logging |
## Viewing Routing Decisions
Each response includes routing metadata:
```json
{
  "id": "chatcmpl-123",
  "model": "gpt-4o",
  "choices": [...],
  "usage": {...},
  "gateflow": {
    "routing": {
      "requested_model": "auto",
      "selected_model": "gpt-4o",
      "reason": "task_classification:general",
      "fallbacks_tried": [],
      "latency_ms": 234
    }
  }
}
```

## Monitoring Routing
Dashboard analytics show:
- Model distribution over time
- Fallback frequency
- Routing rule hit rates
- A/B test results
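Model distribution and fallback frequency can also be tallied client-side from the `gateflow.routing` metadata shown above. A minimal sketch over plain response dicts (the sample responses here are hypothetical):

```python
from collections import Counter

# Hypothetical captured responses; only the routing block matters here.
responses = [
    {"gateflow": {"routing": {"selected_model": "gpt-4o",
                              "fallbacks_tried": []}}},
    {"gateflow": {"routing": {"selected_model": "claude-3-5-sonnet",
                              "fallbacks_tried": ["gpt-4o"]}}},
    {"gateflow": {"routing": {"selected_model": "gpt-4o",
                              "fallbacks_tried": []}}},
]

routings = [r["gateflow"]["routing"] for r in responses]

# How often each model was actually selected
model_distribution = Counter(r["selected_model"] for r in routings)

# Fraction of requests where at least one fallback was tried
fallback_rate = sum(1 for r in routings if r["fallbacks_tried"]) / len(routings)
```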
## Next Steps
- Model Fallbacks - Configure fallback chains
- Cost vs Performance - Optimization strategies