Routing Feedback Loop
The defining feature of GateFlow Eval is that evaluation scores feed directly into routing decisions: when a model's quality drifts, its share of traffic changes without manual intervention.
How It Works
```text
┌─────────────────────────────────────────────────────────────┐
│                     Closed-Loop System                      │
│                                                             │
│   Production   Eval         Score        Routing            │
│   Traffic ───► Sampling ──► Engine ────► Weights            │
│     ▲                                       │               │
│     │                                       │               │
│     └───────────────────────────────────────┘               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

- Production traffic flows through GateFlow
- Samples are continuously evaluated
- Scores update model quality ratings
- Routing weights adjust based on quality
- Higher-scoring models receive more traffic
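To make the loop concrete, here is a deliberately simplified simulation in plain Python. It is not GateFlow's implementation: it assumes each round's eval score is known exactly (no sampling noise), that routing weights are simply each model's latest score divided by the pool total, and the model names and quality curve are hypothetical.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]


def route_by_score(scores: Scores) -> Scores:
    """Turn per-model scores into traffic shares that sum to 1."""
    total = sum(scores.values())
    return {model: score / total for model, score in scores.items()}


def run_feedback_loop(rounds: int, evaluate: Callable[[int], Scores]) -> List[Scores]:
    """Evaluate, re-weight, repeat; return the traffic shares after each round."""
    history = []
    for t in range(rounds):
        scores = evaluate(t)                     # stand-in for scoring sampled traffic
        history.append(route_by_score(scores))   # weights follow the latest scores
    return history


def hypothetical_quality(t: int) -> Scores:
    # "model-b" starts slightly ahead, then its quality drops from round 5 on.
    return {"model-a": 90.0, "model-b": 95.0 if t < 5 else 60.0}


history = run_feedback_loop(10, hypothetical_quality)
print(f"before drift: model-b gets {history[4]['model-b']:.1%}")  # 51.4%
print(f"after drift:  model-b gets {history[9]['model-b']:.1%}")  # 40.0%
```

A production loop adds the pieces this toy leaves out: noisy samples, smoothing, minimum sample counts, and share limits.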
Configuration
Basic Setup
```python
from gateflow import EvalClient

client = EvalClient(api_key="gf-...")

# Enable eval-driven routing
client.configure_routing_feedback(
    enabled=True,
    suites=["quality-general", "safety-core"],
    models=["gpt-4o", "claude-opus-4-5", "gemini-2.5-pro"]
)
```

Advanced Configuration
```python
client.configure_routing_feedback(
    enabled=True,

    # Which evals influence routing
    suites=["quality-general", "safety-core"],
    suite_weights={
        "quality-general": 0.6,
        "safety-core": 0.4
    },

    # Models in the routing pool
    models=["gpt-4o", "claude-opus-4-5", "gemini-2.5-pro"],

    # Routing behavior
    routing={
        "min_samples": 100,          # Require 100 samples before adjusting
        "window": "24h",             # Rolling window for scores
        "smoothing": "exponential",  # Smooth score changes
        "alpha": 0.3                 # Smoothing factor
    },

    # Constraints
    constraints={
        "min_traffic_share": 0.1,    # No model below 10%
        "max_traffic_share": 0.7,    # No model above 70%
        "safety_floor": 95,          # Block if safety < 95%
        "quality_floor": 70          # Minimum quality score
    }
)
```

Routing Algorithms
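Each algorithm below consumes one score per model. The following sketch shows one plausible way the `suite_weights`, `alpha`, and `min_samples` settings from Advanced Configuration could produce that score: a weighted blend of suite scores, smoothed with an exponential moving average (EMA), and withheld until enough samples have arrived. This is an illustration under assumptions, not GateFlow's code; `blend_suites` and `ScoreTracker` are hypothetical names, and the 24-hour `window` is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def blend_suites(suite_scores: Dict[str, float], suite_weights: Dict[str, float]) -> float:
    """Weighted average of per-suite scores (weights need not sum to 1)."""
    total_weight = sum(suite_weights.values())
    return sum(w * suite_scores[suite] for suite, w in suite_weights.items()) / total_weight


@dataclass
class ScoreTracker:
    """Hypothetical per-model tracker: EMA of blended scores plus a sample-count gate."""
    alpha: float = 0.3       # weight given to the newest observation
    min_samples: int = 100   # observations required before the score is used
    value: Optional[float] = None
    samples: int = 0

    def update(self, observation: float) -> None:
        self.samples += 1
        if self.value is None:
            self.value = observation  # seed the average with the first observation
        else:
            self.value = self.alpha * observation + (1 - self.alpha) * self.value

    def routing_score(self) -> Optional[float]:
        """Smoothed score, or None while there are too few samples to act on."""
        return self.value if self.samples >= self.min_samples else None


weights = {"quality-general": 0.6, "safety-core": 0.4}
tracker = ScoreTracker(alpha=0.3, min_samples=2)
tracker.update(blend_suites({"quality-general": 90, "safety-core": 100}, weights))  # blended 94.0
print(tracker.routing_score())  # None: only one sample so far
tracker.update(blend_suites({"quality-general": 80, "safety-core": 100}, weights))  # blended 88.0
print(tracker.routing_score())  # roughly 92.2 (0.3 * 88 + 0.7 * 94)
```

A small `alpha` such as 0.3 makes one bad batch move the score only part of the way, which keeps routing weights from oscillating.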
Score-Proportional Routing
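Score-proportional routing normalizes `score ** score_power` across the pool. A minimal sketch, assuming that is the whole rule (GateFlow may apply further adjustments such as the share limits); `proportional_shares` is a hypothetical helper, and the scores reproduce the worked example in this section's configuration comments.

```python
from typing import Dict


def proportional_shares(scores: Dict[str, float], score_power: float = 2.0) -> Dict[str, float]:
    """share_i = score_i**p / sum_j score_j**p; a larger p favors the leader more."""
    powered = {model: score ** score_power for model, score in scores.items()}
    total = sum(powered.values())
    return {model: value / total for model, value in powered.items()}


shares = proportional_shares({"gpt-4o": 90, "claude-opus-4-5": 95, "gemini-2.5-pro": 85})
for model, share in shares.items():
    print(f"{model}: {share:.0%}")
# gpt-4o: 33%
# claude-opus-4-5: 37%
# gemini-2.5-pro: 30%
```

With `score_power=1` the same scores give roughly 33% / 35% / 31%, so raising the power widens the gap between the best and worst models.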
Traffic distributed proportionally to scores:
```python
routing = {
    "algorithm": "proportional",
    "config": {
        "score_power": 2  # Exaggerate differences (score^2)
    }
}

# If scores are gpt-4o=90, claude=95, gemini=85:
# gpt-4o: 90^2 / (90^2 + 95^2 + 85^2) = 33%
# claude: 95^2 / ... = 37%
# gemini: 85^2 / ... = 30%
```

Threshold-Based Routing
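Threshold routing places each model in a score tier and hands out traffic per tier rather than per point of score. The configuration in this section does not say how a tier's share is divided when several models land in it, or where an empty tier's share goes. The sketch below assumes models in the same tier split its share equally and that shares are renormalized over the occupied tiers; `threshold_shares` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (min_score, traffic_share) pairs, highest tier first, mirroring this section's config
TIERS: List[Tuple[float, float]] = [(90, 0.7), (80, 0.25), (70, 0.05), (0, 0.0)]


def threshold_shares(scores: Dict[str, float],
                     tiers: List[Tuple[float, float]] = TIERS) -> Dict[str, float]:
    # Put each model in the first (highest) tier whose min_score it meets.
    # Assumes scores are >= 0, so the last tier always matches.
    members: Dict[int, List[str]] = {}
    for model, score in scores.items():
        tier = next(i for i, (min_score, _) in enumerate(tiers) if score >= min_score)
        members.setdefault(tier, []).append(model)

    # Assumption: shares of empty tiers are redistributed by renormalizing.
    occupied_total = sum(tiers[i][1] for i in members)
    shares: Dict[str, float] = {}
    for tier, models in members.items():
        tier_share = tiers[tier][1] / occupied_total if occupied_total else 0.0
        for model in models:
            shares[model] = tier_share / len(models)  # equal split within a tier
    return shares


example = threshold_shares({"gpt-4o": 91.2, "claude-opus-4-5": 94.5, "gemini-2.5-pro": 87.3})
for model, share in example.items():
    print(f"{model}: {share:.1%}")
# gpt-4o: 36.8%
# claude-opus-4-5: 36.8%
# gemini-2.5-pro: 26.3%
```

Unlike proportional routing, a model gains nothing by moving from 91 to 99 here; only crossing a tier boundary changes its traffic.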
Strict cutoffs for quality:
```python
routing = {
    "algorithm": "threshold",
    "config": {
        "tiers": [
            {"min_score": 90, "traffic_share": 0.7},
            {"min_score": 80, "traffic_share": 0.25},
            {"min_score": 70, "traffic_share": 0.05},
            {"min_score": 0, "traffic_share": 0.0}  # Blocked
        ]
    }
}
```

Epsilon-Greedy Exploration
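Epsilon-greedy sends most traffic to the current leader and a fixed fraction to exploration, so lower-ranked models keep producing fresh eval samples. A minimal sketch of the per-request choice follows. It assumes exploration picks uniformly among the *non-leading* models, which makes the leader's long-run share exactly `1 - epsilon` (90% for `epsilon = 0.1`, matching the comment in this section's config); the textbook variant explores over all models, giving the leader slightly more. `choose_model` and `expected_shares` are hypothetical helpers, not GateFlow APIs.

```python
import random
from typing import Dict, Optional


def choose_model(scores: Dict[str, float], epsilon: float = 0.1,
                 rng: Optional[random.Random] = None) -> str:
    """Pick the top scorer with probability 1 - epsilon, otherwise explore another model."""
    rng = rng or random.Random()
    best = max(scores, key=scores.get)
    others = [m for m in scores if m != best]
    if others and rng.random() < epsilon:
        return rng.choice(others)  # exploration: uniform over the non-leaders
    return best                    # exploitation: current highest score


def expected_shares(scores: Dict[str, float], epsilon: float = 0.1) -> Dict[str, float]:
    """Long-run traffic shares implied by choose_model."""
    best = max(scores, key=scores.get)
    others = [m for m in scores if m != best]
    if not others:
        return {best: 1.0}
    return {m: (1 - epsilon) if m == best else epsilon / len(others) for m in scores}


rng = random.Random(42)
scores = {"gpt-4o": 91.2, "claude-opus-4-5": 94.5, "gemini-2.5-pro": 87.3}
picks = [choose_model(scores, 0.1, rng) for _ in range(10_000)]
print(picks.count("claude-opus-4-5") / len(picks))  # close to 0.9
```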
Balance exploitation with exploration:
```python
routing = {
    "algorithm": "epsilon_greedy",
    "config": {
        "epsilon": 0.1,                # 10% random exploration
        "exploitation": "best_score"   # Route 90% to highest scorer
    }
}
```

Thompson Sampling
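Thompson sampling keeps a probability distribution over each model's quality and routes each request to whichever model wins a random draw from those distributions, so uncertain models keep receiving some traffic until the evidence separates them. The classic Beta-Bernoulli form needs binary outcomes; this sketch assumes each evaluated sample is reduced to pass/fail (for example, score at or above a cutoff), which this page does not specify. Starting from the `beta(1, 1)` prior in the configuration, a model with `s` passes and `f` failures has posterior `Beta(1 + s, 1 + f)`. The `thompson_pick` helper, model names, and counts are illustrative, and `min_samples` is not modeled.

```python
import random
from typing import Dict, Optional, Tuple

PRIOR = (1.0, 1.0)  # Beta(1, 1): uniform prior over each model's pass rate


def thompson_pick(outcomes: Dict[str, Tuple[int, int]],
                  rng: Optional[random.Random] = None) -> str:
    """outcomes maps model -> (passes, failures); returns the model to route to."""
    rng = rng or random.Random()
    a0, b0 = PRIOR
    draws = {
        # Draw one plausible pass rate from each model's Beta posterior.
        model: rng.betavariate(a0 + passes, b0 + failures)
        for model, (passes, failures) in outcomes.items()
    }
    return max(draws, key=draws.get)


rng = random.Random(7)
outcomes = {"model-a": (90, 10), "model-b": (8, 2), "model-c": (50, 50)}
picks = [thompson_pick(outcomes, rng) for _ in range(5_000)]
share = {model: picks.count(model) / len(picks) for model in outcomes}
print(share)  # model-a dominates; model-b, with few samples, still wins some draws
```

The design choice here is that exploration is driven by uncertainty rather than a fixed `epsilon`: `model-b` gets traffic because ten samples cannot rule out that it is the best, while `model-c` is confidently worse and is almost never picked.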
Bayesian approach for uncertainty-aware routing:
```python
routing = {
    "algorithm": "thompson_sampling",
    "config": {
        "prior": "beta(1, 1)",  # Uninformative prior
        "min_samples": 30       # Per model before converging
    }
}
```

Viewing Routing State
Dashboard
Navigate to Eval → Routing to see:
- Current traffic distribution
- Score history per model
- Routing decision log
API
```python
# Get current routing state
state = client.get_routing_state()

for model in state.models:
    print(f"{model.name}:")
    print(f"  Score: {model.current_score}")
    print(f"  Traffic: {model.traffic_share * 100:.1f}%")
    print(f"  Samples: {model.sample_count}")
    print(f"  Trend: {model.score_trend}")

# Example output:
# gpt-4o:
#   Score: 91.2
#   Traffic: 35.0%
#   Samples: 1247
#   Trend: stable
# claude-opus-4-5:
#   Score: 94.5
#   Traffic: 45.0%
#   Samples: 1582
#   Trend: improving
# gemini-2.5-pro:
#   Score: 87.3
#   Traffic: 20.0%
#   Samples: 703
#   Trend: declining
```

Routing Events
Event Types
- `routing.weights_updated`: traffic shares changed
- `routing.model_blocked`: a model fell below a threshold
- `routing.model_restored`: a model recovered above the threshold
- `routing.manual_override`: a human intervened
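Whether events arrive by webhook or by polling, a consumer typically dispatches on the event type. A minimal dispatcher sketch follows. It assumes each event can be represented as a dict whose keys mirror the attributes used in the polling example on this page (`type`, `old_weights`, `new_weights`, `reason`); the actual webhook payload format is not documented here, so treat those field names, and the `describe_event` helper, as assumptions.

```python
from typing import Any, Dict, List


def describe_event(event: Dict[str, Any]) -> List[str]:
    """Turn a routing event (assumed dict shape) into human-readable lines."""
    kind = event.get("type")
    if kind == "routing.weights_updated":
        old, new = event.get("old_weights", {}), event.get("new_weights", {})
        lines = []
        for model in sorted(set(old) | set(new)):
            before, after = old.get(model, 0.0), new.get(model, 0.0)
            if abs(after - before) >= 0.005:  # skip changes under half a percentage point
                lines.append(f"{model}: {before:.0%} -> {after:.0%}")
        return lines
    if kind in ("routing.model_blocked", "routing.model_restored"):
        return [f"{kind.rsplit('.', 1)[1]}: {event.get('reason', 'no reason given')}"]
    if kind == "routing.manual_override":
        return [f"manual override: {event.get('reason', 'no reason given')}"]
    return [f"unhandled event type: {kind}"]


print(describe_event({
    "type": "routing.weights_updated",
    "old_weights": {"gpt-4o": 0.35, "claude-opus-4-5": 0.45, "gemini-2.5-pro": 0.20},
    "new_weights": {"gpt-4o": 0.40, "claude-opus-4-5": 0.45, "gemini-2.5-pro": 0.15},
}))
```

Returning lines instead of printing keeps the dispatcher easy to reuse for logging, chat alerts, or tests.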
Subscribing to Events
```python
# Webhook configuration
client.configure_webhooks(
    events=["routing.weights_updated", "routing.model_blocked"],
    url="https://your-app.com/webhooks/gateflow"
)

# Or poll for events
events = client.get_routing_events(
    time_range="24h",
    types=["routing.weights_updated"]
)

for event in events:
    print(f"{event.timestamp}: {event.type}")
    print(f"  Old weights: {event.old_weights}")
    print(f"  New weights: {event.new_weights}")
    print(f"  Reason: {event.reason}")
```

Manual Overrides
Sometimes you need to intervene:
```python
# Temporarily boost a model
client.set_routing_override(
    model="gpt-4o",
    traffic_share=0.5,
    duration="2h",
    reason="Testing new deployment"
)

# Block a model entirely
client.set_routing_override(
    model="gemini-2.5-pro",
    traffic_share=0.0,
    reason="Known issue being investigated"
)

# Remove all overrides
client.clear_routing_overrides()
```

Safety Rails
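The rails in this section, together with the `min_traffic_share` and `max_traffic_share` limits from Advanced Configuration, all end up adjusting the same vector of traffic shares. The sketch below shows one way to apply the hard block and the share limits, under assumptions this page does not confirm: models under the safety floor are dropped, the survivors are renormalized, and the shares are then clamped by finding one scale factor `c` such that `min(max(c * share, lo), hi)` sums to 1. That water-filling style projection keeps unclamped models in their original proportions. `clamp_shares` and `apply_constraints` are hypothetical helpers.

```python
from typing import Dict


def clamp_shares(shares: Dict[str, float], lo: float, hi: float) -> Dict[str, float]:
    """Find c so that min(max(c * share, lo), hi) sums to 1 across the pool.

    Assumes every share is positive; unclamped models keep their relative proportions.
    """
    n = len(shares)
    if n == 0 or not (n * lo <= 1.0 <= n * hi):
        raise ValueError("share limits cannot sum to 1 for this pool size")
    if min(shares.values()) <= 0:
        raise ValueError("this sketch expects strictly positive shares")

    def clamped(c: float) -> Dict[str, float]:
        return {m: min(max(c * s, lo), hi) for m, s in shares.items()}

    low, high = 0.0, hi / min(shares.values())  # at c = high every model sits at hi
    for _ in range(200):  # bisection: the clamped total never decreases as c grows
        mid = (low + high) / 2
        if sum(clamped(mid).values()) < 1.0:
            low = mid
        else:
            high = mid
    return clamped(high)


def apply_constraints(shares: Dict[str, float], safety_scores: Dict[str, float],
                      safety_floor: float = 95, lo: float = 0.1,
                      hi: float = 0.7) -> Dict[str, float]:
    """Hard-block models under the safety floor, renormalize, then enforce share limits."""
    kept = {m: s for m, s in shares.items() if safety_scores[m] >= safety_floor}
    if not kept:
        return {}  # every model blocked; the real behavior here is not specified
    total = sum(kept.values())
    return clamp_shares({m: s / total for m, s in kept.items()}, lo, hi)


shares = {"gpt-4o": 0.35, "claude-opus-4-5": 0.45, "gemini-2.5-pro": 0.20}
safety = {"gpt-4o": 97, "claude-opus-4-5": 99, "gemini-2.5-pro": 93}  # illustrative scores
print(apply_constraints(shares, safety))  # gemini removed; the other two rescaled to sum to 1
```

If only one model survived the safety floor, a 70% cap could not be met, so the sketch raises rather than silently violating either the cap or the requirement that shares sum to 1.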
Hard Blocks
Models scoring below the safety floor are blocked, that is, removed from the routing pool:
```python
constraints = {
    "safety_floor": 95,  # Block if safety-core < 95%
    "block_action": "remove_from_pool",
    "alert": True
}
```

Soft Degradation
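Soft degradation replaces the on/off behavior of a hard block with a ramp. With a linear curve, a model at or above `start_at` keeps its full computed share, a model at or below `min_traffic_at` is cut to some floor, and scores in between are interpolated. This page does not say what that floor is, so the sketch takes it as a parameter (`floor_factor`, hypothetical) and returns a multiplier that would be applied to the model's weight before renormalizing.

```python
def degradation_factor(score: float, start_at: float = 90, min_traffic_at: float = 70,
                       floor_factor: float = 0.0) -> float:
    """Linear ramp: 1.0 at/above start_at, floor_factor at/below min_traffic_at."""
    if score >= start_at:
        return 1.0
    if score <= min_traffic_at:
        return floor_factor
    t = (score - min_traffic_at) / (start_at - min_traffic_at)  # position on the ramp, 0..1
    return floor_factor + (1.0 - floor_factor) * t


for score in (95, 90, 85, 80, 70, 60):
    print(score, degradation_factor(score, floor_factor=0.2))
```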
Gradually reduce traffic as scores drop:
```python
constraints = {
    "soft_degradation": {
        "start_at": 90,        # Start reducing at 90%
        "min_traffic_at": 70,  # Minimum traffic at 70%
        "curve": "linear"
    }
}
```

A/B Testing Integration
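An experiment ends with a yes/no significance decision (`results.significant` in this section's example). How GateFlow computes it is not documented here. As an illustration only, the sketch below runs Welch's two-sample t-test on per-request quality scores and, because eval sample sizes are usually large, converts the t statistic to a two-sided p-value with the normal approximation instead of the exact t distribution. The function names and the 0.05 threshold are assumptions.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def welch_test(control: Sequence[float], treatment: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, approximate two-sided p-value) for treatment vs control.

    Assumes at least two samples per arm and non-zero variance.
    """
    m1, m2 = statistics.fmean(control), statistics.fmean(treatment)
    v1, v2 = statistics.variance(control), statistics.variance(treatment)  # sample variances
    se = math.sqrt(v1 / len(control) + v2 / len(treatment))
    t = (m2 - m1) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail probability of a standard normal
    return t, p


def pick_winner(control: Sequence[float], treatment: Sequence[float],
                alpha: float = 0.05) -> Optional[str]:
    t, p = welch_test(control, treatment)
    if p >= alpha:
        return None  # no significant difference: leave the routing pool unchanged
    return "treatment" if t > 0 else "control"


control = [80 + (i % 5) for i in range(200)]    # illustrative scores, mean 82
treatment = [84 + (i % 5) for i in range(200)]  # illustrative scores, mean 86
print(pick_winner(control, treatment))  # treatment
```

A fixed-horizon test like this assumes the result is checked once, at the end of the `duration`; checking repeatedly and stopping at the first significant result inflates the false-positive rate.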
Run controlled experiments alongside eval-driven routing:
```python
# Create an A/B test
experiment = client.create_experiment(
    name="new-model-test",
    variants={
        "control": {"model": "gpt-4o", "traffic": 0.5},
        "treatment": {"model": "gpt-4o-new", "traffic": 0.5}
    },
    eval_suites=["quality-general"],
    duration="7d",
    success_metric="quality_score"
)

# After the experiment, the winner enters the routing pool
results = client.get_experiment_results(experiment.id)
if results.significant and results.winner == "treatment":
    client.add_to_routing_pool("gpt-4o-new")
```

Next Steps
- Drift Detection - Automatic alerting
- Traffic Sampling - Configure sample rates
- Compliance Reports - Document routing decisions