Routing Feedback Loop

The defining feature of GateFlow Eval is that evaluation scores feed directly back into routing decisions: when a model's quality drifts, traffic shifts away from it automatically.

How It Works

┌─────────────────────────────────────────────────────────────┐
│                    Closed-Loop System                        │
│                                                              │
│  Production    Eval       Score        Routing              │
│  Traffic  ───► Sampling ─► Engine ────► Weights             │
│     ▲                                      │                │
│     │                                      │                │
│     └──────────────────────────────────────┘                │
│                                                              │
└─────────────────────────────────────────────────────────────┘
  1. Production traffic flows through GateFlow
  2. Samples are continuously evaluated
  3. Scores update model quality ratings
  4. Routing weights adjust based on quality
  5. Better models receive more traffic

Configuration

Basic Setup

python
from gateflow import EvalClient

client = EvalClient(api_key="gf-...")

# Enable eval-driven routing
client.configure_routing_feedback(
    enabled=True,
    suites=["quality-general", "safety-core"],
    models=["gpt-4o", "claude-opus-4-5", "gemini-2.5-pro"]
)

Advanced Configuration

python
client.configure_routing_feedback(
    enabled=True,

    # Which evals influence routing
    suites=["quality-general", "safety-core"],
    suite_weights={
        "quality-general": 0.6,
        "safety-core": 0.4
    },

    # Models in the routing pool
    models=["gpt-4o", "claude-opus-4-5", "gemini-2.5-pro"],

    # Routing behavior
    routing={
        "min_samples": 100,          # Require 100 samples before adjusting
        "window": "24h",             # Rolling window for scores
        "smoothing": "exponential",  # Smooth score changes
        "alpha": 0.3                 # Smoothing factor
    },

    # Constraints
    constraints={
        "min_traffic_share": 0.1,    # No model below 10%
        "max_traffic_share": 0.7,    # No model above 70%
        "safety_floor": 95,          # Block if safety < 95%
        "quality_floor": 70          # Minimum quality score
    }
)
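The `"smoothing": "exponential"` setting above damps sudden score swings before they reach the router. A minimal sketch of that update, assuming a standard exponentially weighted moving average with the configured `alpha` (the function name is illustrative; GateFlow computes this server-side):

```python
def smooth_score(previous: float, observed: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of eval scores:
    higher alpha reacts faster to new observations."""
    return alpha * observed + (1 - alpha) * previous

# A sudden drop from 95 to 80 moves the effective score only part way:
score = smooth_score(previous=95.0, observed=80.0)  # 0.3*80 + 0.7*95 = 90.5
```

With `alpha=0.3`, a one-off bad window shifts the effective score by less than a third of the raw change, so routing weights adjust gradually rather than whipsawing.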

Routing Algorithms

Score-Proportional Routing

Traffic distributed proportionally to scores:

python
routing = {
    "algorithm": "proportional",
    "config": {
        "score_power": 2  # Exaggerate differences (score^2)
    }
}

# If scores are gpt-4o=90, claude=95, gemini=85:
# gpt-4o:  90^2 / (90^2 + 95^2 + 85^2) = 33%
# claude:  95^2 / ... = 37%
# gemini:  85^2 / ... = 30%
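The arithmetic above can be reproduced with a short helper (a sketch for illustration, not GateFlow's internal code):

```python
def proportional_weights(scores: dict[str, float], power: int = 2) -> dict[str, float]:
    """Raise each score to score_power, then normalize to traffic shares."""
    powered = {model: score ** power for model, score in scores.items()}
    total = sum(powered.values())
    return {model: p / total for model, p in powered.items()}

weights = proportional_weights({"gpt-4o": 90, "claude": 95, "gemini": 85})
# weights ≈ {"gpt-4o": 0.333, "claude": 0.371, "gemini": 0.297}
```

Raising `score_power` widens the gap between close scores; with `power=1` the same inputs give shares of roughly 33.3%, 35.2%, and 31.5%.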

Threshold-Based Routing

Strict cutoffs for quality:

python
routing = {
    "algorithm": "threshold",
    "config": {
        "tiers": [
            {"min_score": 90, "traffic_share": 0.7},
            {"min_score": 80, "traffic_share": 0.25},
            {"min_score": 70, "traffic_share": 0.05},
            {"min_score": 0, "traffic_share": 0.0}  # Blocked
        ]
    }
}
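Tier lookup works top-down: each model receives the share of the highest tier its score clears. A sketch of that lookup, assuming tiers are ordered by descending `min_score` as in the config above:

```python
TIERS = [
    {"min_score": 90, "traffic_share": 0.7},
    {"min_score": 80, "traffic_share": 0.25},
    {"min_score": 70, "traffic_share": 0.05},
    {"min_score": 0, "traffic_share": 0.0},  # effectively blocked
]

def tier_share(score: float, tiers: list[dict] = TIERS) -> float:
    """Return the traffic share of the first (highest) tier the score clears."""
    for tier in tiers:
        if score >= tier["min_score"]:
            return tier["traffic_share"]
    return 0.0
```

Note that if several models land in the same tier, the raw shares would need to be renormalized so the pool sums to 100%.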

Epsilon-Greedy Exploration

Balance exploitation with exploration:

python
routing = {
    "algorithm": "epsilon_greedy",
    "config": {
        "epsilon": 0.1,  # 10% random exploration
        "exploitation": "best_score"  # Route 90% to highest scorer
    }
}
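Per-request, epsilon-greedy reduces to a coin flip: with probability `epsilon`, pick a model uniformly at random; otherwise route to the current best scorer. A minimal sketch (illustrative, not GateFlow's internals):

```python
import random

def pick_model(scores: dict[str, float], epsilon: float = 0.1, rng=None) -> str:
    """With probability epsilon, explore a random model;
    otherwise exploit the highest current score."""
    rng = rng or random
    if rng.random() < epsilon:
        return rng.choice(list(scores))
    return max(scores, key=scores.get)

# With epsilon=0 this always exploits the best scorer:
best = pick_model({"gpt-4o": 90, "claude": 95}, epsilon=0.0)  # "claude"
```

The exploration share keeps fresh eval samples flowing for every model in the pool, so a currently weak model can still demonstrate recovery.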

Thompson Sampling

Bayesian approach for uncertainty-aware routing:

python
routing = {
    "algorithm": "thompson_sampling",
    "config": {
        "prior": "beta(1, 1)",  # Uninformative prior
        "min_samples": 30       # Per model before converging
    }
}
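One common way to realize Thompson sampling over pass/fail eval outcomes is to keep a Beta posterior per model, draw a sample from each, and route to the highest draw. A sketch under that assumption, using the `beta(1, 1)` prior from the config (the bookkeeping shown here is hypothetical):

```python
import random

def thompson_pick(stats: dict[str, tuple[int, int]], rng=None) -> str:
    """stats maps model -> (passes, failures). Sample each model's
    Beta(1 + passes, 1 + failures) posterior; route to the highest draw."""
    rng = rng or random
    draws = {
        model: rng.betavariate(1 + passes, 1 + failures)
        for model, (passes, failures) in stats.items()
    }
    return max(draws, key=draws.get)

choice = thompson_pick({"gpt-4o": (180, 20), "claude": (195, 5)})
```

Models with few samples have wide posteriors and thus still win some draws, which is why the approach explores naturally until `min_samples` worth of evidence accumulates.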

Viewing Routing State

Dashboard

Navigate to Eval → Routing to see:

  • Current traffic distribution
  • Score history per model
  • Routing decision log

API

python
# Get current routing state
state = client.get_routing_state()

for model in state.models:
    print(f"{model.name}:")
    print(f"  Score: {model.current_score}")
    print(f"  Traffic: {model.traffic_share * 100:.1f}%")
    print(f"  Samples: {model.sample_count}")
    print(f"  Trend: {model.score_trend}")

# Example output:
# gpt-4o:
#   Score: 91.2
#   Traffic: 35.0%
#   Samples: 1247
#   Trend: stable
# claude-opus-4-5:
#   Score: 94.5
#   Traffic: 45.0%
#   Samples: 1582
#   Trend: improving
# gemini-2.5-pro:
#   Score: 87.3
#   Traffic: 20.0%
#   Samples: 703
#   Trend: declining

Routing Events

Event Types

  • routing.weights_updated - Traffic shares changed
  • routing.model_blocked - Model fell below threshold
  • routing.model_restored - Model recovered above threshold
  • routing.manual_override - Human intervention

Subscribing to Events

python
# Webhook configuration
client.configure_webhooks(
    events=["routing.weights_updated", "routing.model_blocked"],
    url="https://your-app.com/webhooks/gateflow"
)

# Or poll for events
events = client.get_routing_events(
    time_range="24h",
    types=["routing.weights_updated"]
)

for event in events:
    print(f"{event.timestamp}: {event.type}")
    print(f"  Old weights: {event.old_weights}")
    print(f"  New weights: {event.new_weights}")
    print(f"  Reason: {event.reason}")

Manual Overrides

Sometimes you need to intervene:

python
# Temporarily boost a model
client.set_routing_override(
    model="gpt-4o",
    traffic_share=0.5,
    duration="2h",
    reason="Testing new deployment"
)

# Block a model entirely
client.set_routing_override(
    model="gemini-2.5-pro",
    traffic_share=0.0,
    reason="Known issue being investigated"
)

# Remove all overrides
client.clear_routing_overrides()

Safety Rails

Hard Blocks

Models scoring below the safety threshold are blocked:

python
constraints = {
    "safety_floor": 95,  # Block if safety-core < 95%
    "block_action": "remove_from_pool",
    "alert": True
}
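The hard-block check itself is a simple filter over the pool. A minimal sketch (the function name is illustrative; GateFlow enforces this internally and fires a `routing.model_blocked` event):

```python
def apply_safety_floor(safety_scores: dict[str, float], floor: float = 95.0) -> list[str]:
    """Return the models to remove from the routing pool:
    any model whose safety-core score is below the floor."""
    return [model for model, score in safety_scores.items() if score < floor]

blocked = apply_safety_floor({"gpt-4o": 97.1, "gemini-2.5-pro": 93.4})
# blocked == ["gemini-2.5-pro"]
```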

Soft Degradation

Gradually reduce traffic as scores drop:

python
constraints = {
    "soft_degradation": {
        "start_at": 90,        # Start reducing traffic below a score of 90
        "min_traffic_at": 70,  # Reach minimum traffic at a score of 70
        "curve": "linear"
    }
}
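With `"curve": "linear"`, the traffic multiplier interpolates between full traffic at `start_at` and the minimum at `min_traffic_at`. A sketch, assuming the minimum multiplier is 0 (in practice the `min_traffic_share` constraint would set the actual floor):

```python
def degradation_factor(score: float, start_at: float = 90.0,
                       min_traffic_at: float = 70.0, floor: float = 0.0) -> float:
    """Traffic multiplier: 1.0 at or above start_at, `floor` at or
    below min_traffic_at, linear in between."""
    if score >= start_at:
        return 1.0
    if score <= min_traffic_at:
        return floor
    return floor + (1.0 - floor) * (score - min_traffic_at) / (start_at - min_traffic_at)

factor = degradation_factor(80.0)  # halfway between 70 and 90 -> 0.5
```

A model sliding from 90 to 70 therefore sheds traffic gradually instead of being cut off at a single threshold.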

A/B Testing Integration

Run controlled experiments alongside eval-driven routing:

python
# Create an A/B test
experiment = client.create_experiment(
    name="new-model-test",
    variants={
        "control": {"model": "gpt-4o", "traffic": 0.5},
        "treatment": {"model": "gpt-4o-new", "traffic": 0.5}
    },
    eval_suites=["quality-general"],
    duration="7d",
    success_metric="quality_score"
)

# After experiment, winner enters routing pool
results = client.get_experiment_results(experiment.id)
if results.significant and results.winner == "treatment":
    client.add_to_routing_pool("gpt-4o-new")
