Routing Feedback Loop

The defining feature of GateFlow Eval is that evaluation scores feed directly back into routing decisions: when a model's quality drifts, traffic shifts away from it automatically.

How It Works

┌─────────────────────────────────────────────────────────────┐
│                    Closed-Loop System                        │
│                                                              │
│  Production    Eval       Score        Routing              │
│  Traffic  ───► Sampling ─► Engine ────► Weights             │
│     ▲                                      │                │
│     │                                      │                │
│     └──────────────────────────────────────┘                │
│                                                              │
└─────────────────────────────────────────────────────────────┘
  1. Production traffic flows through GateFlow
  2. Samples are continuously evaluated
  3. Scores update model quality ratings
  4. Routing weights adjust based on quality
  5. Better models receive more traffic

Configuration

Basic Setup

python
from gateflow import EvalClient

client = EvalClient(api_key="gf-...")

# Enable eval-driven routing
client.configure_routing_feedback(
    enabled=True,
    suites=["quality-general", "safety-core"],
    models=["gpt-4o", "claude-opus-4-5", "gemini-2.5-pro"]
)

Advanced Configuration

python
client.configure_routing_feedback(
    enabled=True,

    # Which evals influence routing
    suites=["quality-general", "safety-core"],
    suite_weights={
        "quality-general": 0.6,
        "safety-core": 0.4
    },

    # Models in the routing pool
    models=["gpt-4o", "claude-opus-4-5", "gemini-2.5-pro"],

    # Routing behavior
    routing={
        "min_samples": 100,          # Require 100 samples before adjusting
        "window": "24h",             # Rolling window for scores
        "smoothing": "exponential",  # Smooth score changes
        "alpha": 0.3                 # Smoothing factor
    },

    # Constraints
    constraints={
        "min_traffic_share": 0.1,    # No model below 10%
        "max_traffic_share": 0.7,    # No model above 70%
        "safety_floor": 95,          # Block if safety < 95%
        "quality_floor": 70          # Minimum quality score
    }
)
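The `"smoothing": "exponential"` setting above damps sudden score swings before they reach the router. A minimal sketch of that update, assuming a standard exponentially weighted moving average with the configured `alpha` (the function name is illustrative; GateFlow computes this server-side):

```python
def smooth_score(previous: float, observed: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of eval scores:
    higher alpha reacts faster to new observations."""
    return alpha * observed + (1 - alpha) * previous

# A sudden drop from 95 to 80 moves the effective score only part way:
score = smooth_score(previous=95.0, observed=80.0)  # 0.3*80 + 0.7*95 = 90.5
```

With `alpha=0.3`, a one-off bad window shifts the effective score by less than a third of the raw change, so routing weights adjust gradually rather than whipsawing.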

Routing Algorithms

Score-Proportional Routing

Traffic distributed proportionally to scores:

python
routing = {
    "algorithm": "proportional",
    "config": {
        "score_power": 2  # Exaggerate differences (score^2)
    }
}

# If scores are gpt-4o=90, claude=95, gemini=85:
# gpt-4o:  90^2 / (90^2 + 95^2 + 85^2) = 33%
# claude:  95^2 / ... = 37%
# gemini:  85^2 / ... = 30%
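The arithmetic above can be reproduced with a short helper (a sketch for illustration, not GateFlow's internal code):

```python
def proportional_weights(scores: dict[str, float], power: int = 2) -> dict[str, float]:
    """Raise each score to score_power, then normalize to traffic shares."""
    powered = {model: score ** power for model, score in scores.items()}
    total = sum(powered.values())
    return {model: p / total for model, p in powered.items()}

weights = proportional_weights({"gpt-4o": 90, "claude": 95, "gemini": 85})
# weights ≈ {"gpt-4o": 0.333, "claude": 0.371, "gemini": 0.297}
```

Raising `score_power` widens the gap between close scores; with `power=1` the same inputs give shares of roughly 33.3%, 35.2%, and 31.5%.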

Threshold-Based Routing

Strict cutoffs for quality:

python
routing = {
    "algorithm": "threshold",
    "config": {
        "tiers": [
            {"min_score": 90, "traffic_share": 0.7},
            {"min_score": 80, "traffic_share": 0.25},
            {"min_score": 70, "traffic_share": 0.05},
            {"min_score": 0, "traffic_share": 0.0}  # Blocked
        ]
    }
}
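Tier lookup works top-down: each model receives the share of the highest tier its score clears. A sketch of that lookup, assuming tiers are ordered by descending `min_score` as in the config above:

```python
TIERS = [
    {"min_score": 90, "traffic_share": 0.7},
    {"min_score": 80, "traffic_share": 0.25},
    {"min_score": 70, "traffic_share": 0.05},
    {"min_score": 0, "traffic_share": 0.0},  # effectively blocked
]

def tier_share(score: float, tiers: list[dict] = TIERS) -> float:
    """Return the traffic share of the first (highest) tier the score clears."""
    for tier in tiers:
        if score >= tier["min_score"]:
            return tier["traffic_share"]
    return 0.0
```

Note that if several models land in the same tier, the raw shares would need to be renormalized so the pool sums to 100%.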

Epsilon-Greedy Exploration

Balance exploitation with exploration:

python
routing = {
    "algorithm": "epsilon_greedy",
    "config": {
        "epsilon": 0.1,  # 10% random exploration
        "exploitation": "best_score"  # Route 90% to highest scorer
    }
}
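Per-request, epsilon-greedy reduces to a coin flip: with probability `epsilon`, pick a model uniformly at random; otherwise route to the current best scorer. A minimal sketch (illustrative, not GateFlow's internals):

```python
import random

def pick_model(scores: dict[str, float], epsilon: float = 0.1, rng=None) -> str:
    """With probability epsilon, explore a random model;
    otherwise exploit the highest current score."""
    rng = rng or random
    if rng.random() < epsilon:
        return rng.choice(list(scores))
    return max(scores, key=scores.get)

# With epsilon=0 this always exploits the best scorer:
best = pick_model({"gpt-4o": 90, "claude": 95}, epsilon=0.0)  # "claude"
```

The exploration share keeps fresh eval samples flowing for every model in the pool, so a currently weak model can still demonstrate recovery.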

Thompson Sampling

Bayesian approach for uncertainty-aware routing:

python
routing = {
    "algorithm": "thompson_sampling",
    "config": {
        "prior": "beta(1, 1)",  # Uninformative prior
        "min_samples": 30       # Per model before converging
    }
}
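One common way to realize Thompson sampling over pass/fail eval outcomes is to keep a Beta posterior per model, draw a sample from each, and route to the highest draw. A sketch under that assumption, using the `beta(1, 1)` prior from the config (the bookkeeping shown here is hypothetical):

```python
import random

def thompson_pick(stats: dict[str, tuple[int, int]], rng=None) -> str:
    """stats maps model -> (passes, failures). Sample each model's
    Beta(1 + passes, 1 + failures) posterior; route to the highest draw."""
    rng = rng or random
    draws = {
        model: rng.betavariate(1 + passes, 1 + failures)
        for model, (passes, failures) in stats.items()
    }
    return max(draws, key=draws.get)

choice = thompson_pick({"gpt-4o": (180, 20), "claude": (195, 5)})
```

Models with few samples have wide posteriors and thus still win some draws, which is why the approach explores naturally until `min_samples` worth of evidence accumulates.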

Viewing Routing State

Dashboard

Navigate to Eval → Routing to see:

  • Current traffic distribution
  • Score history per model
  • Routing decision log

API

python
# Get current routing state
state = client.get_routing_state()

for model in state.models:
    print(f"{model.name}:")
    print(f"  Score: {model.current_score}")
    print(f"  Traffic: {model.traffic_share * 100:.1f}%")
    print(f"  Samples: {model.sample_count}")
    print(f"  Trend: {model.score_trend}")

# Example output:
# gpt-4o:
#   Score: 91.2
#   Traffic: 35.0%
#   Samples: 1247
#   Trend: stable
# claude-opus-4-5:
#   Score: 94.5
#   Traffic: 45.0%
#   Samples: 1582
#   Trend: improving
# gemini-2.5-pro:
#   Score: 87.3
#   Traffic: 20.0%
#   Samples: 703
#   Trend: declining

Routing Events

Event Types

  • routing.weights_updated - Traffic shares changed
  • routing.model_blocked - Model fell below threshold
  • routing.model_restored - Model recovered above threshold
  • routing.manual_override - Human intervention

Subscribing to Events

python
# Webhook configuration
client.configure_webhooks(
    events=["routing.weights_updated", "routing.model_blocked"],
    url="https://your-app.com/webhooks/gateflow"
)

# Or poll for events
events = client.get_routing_events(
    time_range="24h",
    types=["routing.weights_updated"]
)

for event in events:
    print(f"{event.timestamp}: {event.type}")
    print(f"  Old weights: {event.old_weights}")
    print(f"  New weights: {event.new_weights}")
    print(f"  Reason: {event.reason}")

Manual Overrides

Sometimes you need to intervene:

python
# Temporarily boost a model
client.set_routing_override(
    model="gpt-4o",
    traffic_share=0.5,
    duration="2h",
    reason="Testing new deployment"
)

# Block a model entirely
client.set_routing_override(
    model="gemini-2.5-pro",
    traffic_share=0.0,
    reason="Known issue being investigated"
)

# Remove all overrides
client.clear_routing_overrides()

Safety Rails

Hard Blocks

Models scoring below the safety threshold are blocked:

python
constraints = {
    "safety_floor": 95,  # Block if safety-core < 95%
    "block_action": "remove_from_pool",
    "alert": True
}
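The hard-block check itself is a simple filter over the pool. A minimal sketch (the function name is illustrative; GateFlow enforces this internally and fires a `routing.model_blocked` event):

```python
def apply_safety_floor(safety_scores: dict[str, float], floor: float = 95.0) -> list[str]:
    """Return the models to remove from the routing pool:
    any model whose safety-core score is below the floor."""
    return [model for model, score in safety_scores.items() if score < floor]

blocked = apply_safety_floor({"gpt-4o": 97.1, "gemini-2.5-pro": 93.4})
# blocked == ["gemini-2.5-pro"]
```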

Soft Degradation

Gradually reduce traffic as scores drop:

python
constraints = {
    "soft_degradation": {
        "start_at": 90,        # Start reducing traffic below a score of 90
        "min_traffic_at": 70,  # Reach minimum traffic at a score of 70
        "curve": "linear"
    }
}
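With `"curve": "linear"`, the traffic multiplier interpolates between full traffic at `start_at` and the minimum at `min_traffic_at`. A sketch, assuming the minimum multiplier is 0 (in practice the `min_traffic_share` constraint would set the actual floor):

```python
def degradation_factor(score: float, start_at: float = 90.0,
                       min_traffic_at: float = 70.0, floor: float = 0.0) -> float:
    """Traffic multiplier: 1.0 at or above start_at, `floor` at or
    below min_traffic_at, linear in between."""
    if score >= start_at:
        return 1.0
    if score <= min_traffic_at:
        return floor
    return floor + (1.0 - floor) * (score - min_traffic_at) / (start_at - min_traffic_at)

factor = degradation_factor(80.0)  # halfway between 70 and 90 -> 0.5
```

A model sliding from 90 to 70 therefore sheds traffic gradually instead of being cut off at a single threshold.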

A/B Testing Integration

Run controlled experiments alongside eval-driven routing:

python
# Create an A/B test
experiment = client.create_experiment(
    name="new-model-test",
    variants={
        "control": {"model": "gpt-4o", "traffic": 0.5},
        "treatment": {"model": "gpt-4o-new", "traffic": 0.5}
    },
    eval_suites=["quality-general"],
    duration="7d",
    success_metric="quality_score"
)

# After experiment, winner enters routing pool
results = client.get_experiment_results(experiment.id)
if results.significant and results.winner == "treatment":
    client.add_to_routing_pool("gpt-4o-new")
