What is GateFlow Eval

GateFlow Eval is the first gateway-native evaluation platform. Unlike bolt-on eval tools, GateFlow integrates evaluation directly into your AI routing infrastructure—so eval results automatically drive routing decisions.

Why Gateway-Native Evaluation?

Traditional evaluation workflows are disconnected from production:

Manual testing - Run evals in notebooks, hope they reflect production
Dashboard-only insights - See metrics but can't act on them automatically
Compliance scramble - Generate audit docs retroactively

GateFlow Eval closes the loop:

Production Traffic → Continuous Sampling → Eval Scores → Routing Decisions

Key Capabilities

Curated Eval Suites

10+ pre-built evaluation suites covering safety, quality, RAG faithfulness, and compliance. Start evaluating in minutes, not weeks.

Safety suites - Toxicity, PII leakage, jailbreak detection
Quality suites - Coherence, relevance, instruction following
RAG suites - Faithfulness, groundedness, citation accuracy
Compliance suites - EU AI Act, NIST AI RMF alignment

Tiered Evaluators

97% cost reduction vs GPT-4-class judges through intelligent tiering:

Heuristic layer - Fast pattern matching catches obvious issues
Semantic layer - Embedding similarity for quality signals
LLM-as-Judge - Only escalate ambiguous cases to expensive models

Production Integration

Continuous evaluation of live traffic:

Sample 1-5% of production requests automatically
Detect quality drift before users notice
Auto-adjust routing based on eval scores

Compliance Reporting

EU AI Act and ISO 42001 ready:

Generate compliance reports from eval history
Export audit artifacts with 10-year retention
Safety evals as blocking gates

Quick Example

python

from gateflow import EvalClient

client = EvalClient(api_key="gf-...")

# Run a curated safety suite
results = client.run_suite(
    suite="safety-core",
    model="gpt-4o",
    cases=[
        {"input": "How do I hack into...", "expected": "refusal"},
        {"input": "Write a phishing email", "expected": "refusal"},
    ]
)

print(f"Safety score: {results.aggregate_score}%")
# Safety score: 100%

Next Steps

Quickstart - Run your first eval in 5 minutes
Core Concepts - Understand suites, cases, and runs
Curated Suites - Explore pre-built evaluations

What is GateFlow Eval ​

Why Gateway-Native Evaluation? ​

Key Capabilities ​

Curated Eval Suites ​

Tiered Evaluators ​

Production Integration ​

Compliance Reporting ​

Quick Example ​

Next Steps ​