# Quickstart

Run your first evaluation in 5 minutes.
## Prerequisites
- GateFlow API key (get one here)
- Python 3.8+ or Node.js 18+
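Not sure which runtime you have? A quick check (this assumes the standard `python3`/`node` executables are on your `PATH`):

```bash
# Print available runtime versions; either one satisfies the prerequisite
python3 --version 2>/dev/null || echo "Python not found"
node --version 2>/dev/null || echo "Node.js not found"
```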
## Installation

```bash
pip install gateflow
```

```bash
npm install @gateflow/sdk
```

## Run Your First Eval
### Option 1: Use a Curated Suite

The fastest way to start is to run a pre-built evaluation suite:

```python
from gateflow import EvalClient
client = EvalClient(api_key="gf-...")
# Run the safety-core suite against your model
results = client.run_suite(
    suite="safety-core",
    model="gpt-4o"
)
print(f"Overall score: {results.aggregate_score}%")
print(f"Cases passed: {results.passed}/{results.total}")
```

```javascript
import { EvalClient } from '@gateflow/sdk';
const client = new EvalClient({ apiKey: 'gf-...' });
const results = await client.runSuite({
  suite: 'safety-core',
  model: 'gpt-4o'
});
console.log(`Overall score: ${results.aggregateScore}%`);
console.log(`Cases passed: ${results.passed}/${results.total}`);
```

### Option 2: Evaluate Custom Cases
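Two evaluator types appear in this guide: `exact_match` does a literal string comparison, while `llm_judge` asks a grader model to score the output against your criteria. As a rough, hypothetical sketch of the exact-match idea (not GateFlow's actual implementation, which may normalize differently):

```python
def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the model output equals the expected answer,
    ignoring surrounding whitespace and letter case; otherwise 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

print(exact_match("Paris", " paris "))  # 1.0
print(exact_match("Lyon", "Paris"))     # 0.0
```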
Test your own inputs and expected outputs:
```python
from gateflow import EvalClient
client = EvalClient(api_key="gf-...")
# Define your test cases
cases = [
    {
        "input": "What's the capital of France?",
        "expected": "Paris",
        "evaluator": "exact_match"
    },
    {
        "input": "Summarize this article...",
        "expected_criteria": ["concise", "accurate", "neutral"],
        "evaluator": "llm_judge"
    }
]
results = client.evaluate(
    model="gpt-4o",
    cases=cases
)

for result in results:
    print(f"{result.case_id}: {result.score} - {result.reasoning}")
```

```javascript
import { EvalClient } from '@gateflow/sdk';
const client = new EvalClient({ apiKey: 'gf-...' });
const cases = [
  {
    input: "What's the capital of France?",
    expected: "Paris",
    evaluator: "exact_match"
  },
  {
    input: "Summarize this article...",
    expectedCriteria: ["concise", "accurate", "neutral"],
    evaluator: "llm_judge"
  }
];
const results = await client.evaluate({
  model: 'gpt-4o',
  cases
});

results.forEach(r => console.log(`${r.caseId}: ${r.score} - ${r.reasoning}`));
```

## Enable Production Sampling
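For intuition about what the 2.5% rate used in this section means in practice: roughly 1 in 40 live requests gets pulled aside for evaluation. A quick simulation (illustration only; the actual sampling happens on GateFlow's side):

```python
import random

random.seed(42)  # deterministic for the example
requests = 100_000
# Each request is independently selected with probability 0.025
sampled = sum(1 for _ in range(requests) if random.random() < 0.025)
print(f"{sampled} of {requests:,} requests sampled (~{sampled / requests:.2%})")
```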
Add automatic evaluation of live traffic:
```python
from openai import OpenAI
# Just use GateFlow as your base URL - sampling is automatic
client = OpenAI(
    api_key="gf-...",
    base_url="https://api.gateflow.ai/v1"
)
# Normal inference - 2.5% of requests are automatically sampled for eval
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

Configure the sampling rate in your dashboard or via the API:
```python
from gateflow import EvalClient

client = EvalClient(api_key="gf-...")

client.configure_sampling(
    rate=0.025,  # 2.5% of traffic
    suites=["safety-core", "quality-general"]
)
```

## View Results
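As a rough mental model for the numbers reported here: in the simplest case, an aggregate score is just the share of cases that pass. (Hypothetical sketch; GateFlow's real aggregation may weight cases or suites differently.)

```python
# Per-case scores from a hypothetical run
case_scores = {"case_1": 1.0, "case_2": 0.0, "case_3": 1.0, "case_4": 1.0}

passed = sum(1 for score in case_scores.values() if score >= 1.0)
aggregate = 100 * passed / len(case_scores)

print(f"Overall score: {aggregate:.0f}%")            # Overall score: 75%
print(f"Cases passed: {passed}/{len(case_scores)}")  # Cases passed: 3/4
```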
Results are available in the dashboard or via API:
```python
# Get recent eval runs
runs = client.list_runs(limit=10)
for run in runs:
    print(f"{run.suite}: {run.score}% ({run.timestamp})")

# Get detailed results for a specific run
details = client.get_run(run_id="run_abc123")
for case in details.cases:
    print(f"  {case.input[:50]}... → {case.score}")
```

## Next Steps
- Core Concepts - Understand the evaluation model
- Curated Suites - Explore pre-built evaluations
- LLM-as-Judge - Configure LLM evaluators
- Traffic Sampling - Set up production evaluation