# Heuristic Evaluators
Fast, rule-based evaluation methods that don't require LLM calls. Use heuristics for clear-cut checks and as the first layer in tiered evaluation.
## Why Heuristics?
| Method | Speed | Cost | Best For |
|---|---|---|---|
| Heuristic | <1ms | Free | Format checks, exact matches |
| LLM Judge | ~2s | $$ | Nuanced quality assessment |
Heuristics are ideal for:
- Binary pass/fail checks
- Format validation
- Constraint verification
- Pre-filtering before expensive LLM evaluation
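The pre-filtering pattern can be sketched in plain Python. Everything below is illustrative: `llm_judge` is a hypothetical stand-in for an expensive model call, not part of the gateflow API:

```python
import json

def json_valid(response: str) -> bool:
    """Cheap first-tier heuristic: is the response parseable JSON?"""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False

def llm_judge(response: str) -> float:
    """Hypothetical stand-in for an expensive LLM-judge call."""
    return 0.9

def tiered_eval(responses):
    """Score responses, only paying for the judge when the heuristic passes."""
    results = []
    for response in responses:
        if not json_valid(response):
            # Fail fast: malformed output never triggers an LLM call
            results.append({"response": response, "score": 0.0, "tier": "heuristic"})
        else:
            results.append({"response": response, "score": llm_judge(response), "tier": "llm"})
    return results
```

Only well-formed responses ever reach the expensive tier, which is what makes heuristics effective as a first layer.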
## Available Heuristics
### Text Matching

```python
from gateflow import EvalClient

client = EvalClient(api_key="gf-...")

# Exact match
results = client.evaluate(
    cases=[{"input": "What is 2+2?", "response": "4", "expected": "4"}],
    evaluator="exact_match"
)

# Case-insensitive match
results = client.evaluate(
    cases=[{"response": "PARIS", "expected": "paris"}],
    evaluator="exact_match",
    config={"case_sensitive": False}
)

# Contains check
results = client.evaluate(
    cases=[{"response": "The capital is Paris", "expected": "Paris"}],
    evaluator="contains"
)

# Regex match
results = client.evaluate(
    cases=[{"response": "Order #12345", "pattern": r"Order #\d{5}"}],
    evaluator="regex_match"
)
```

### Format Validation
```python
# JSON validity
results = client.evaluate(
    cases=[{"response": '{"name": "test", "value": 123}'}],
    evaluator="json_valid"
)

# JSON schema compliance
results = client.evaluate(
    cases=[{"response": '{"name": "test"}'}],
    evaluator="json_schema",
    config={
        "schema": {
            "type": "object",
            "required": ["name", "value"],
            "properties": {
                "name": {"type": "string"},
                "value": {"type": "number"}
            }
        }
    }
)

# Markdown structure
results = client.evaluate(
    cases=[{"response": "# Title\n\nParagraph..."}],
    evaluator="markdown_valid"
)
```

### Length Constraints
```python
# Character count
results = client.evaluate(
    cases=[{"response": "Short answer"}],
    evaluator="length_check",
    config={"min_chars": 10, "max_chars": 100}
)

# Word count
results = client.evaluate(
    cases=[{"response": "This is a six word sentence."}],
    evaluator="word_count",
    config={"min_words": 3, "max_words": 10}
)

# Sentence count
results = client.evaluate(
    cases=[{"response": "First. Second. Third."}],
    evaluator="sentence_count",
    config={"exact": 3}
)
```

### Content Checks
```python
# PII detection
results = client.evaluate(
    cases=[{"response": "Contact john@email.com for help"}],
    evaluator="pii_check",
    config={"types": ["email", "phone", "ssn"]}
)

# Profanity filter
results = client.evaluate(
    cases=[{"response": "This is a clean response"}],
    evaluator="profanity_check"
)

# Language detection
results = client.evaluate(
    cases=[{"response": "Bonjour, comment allez-vous?"}],
    evaluator="language_check",
    config={"expected": "fr", "allow": ["en", "fr"]}
)
```

### List and Structure Checks
```python
# Bullet point count
results = client.evaluate(
    cases=[{"response": "• Item 1\n• Item 2\n• Item 3"}],
    evaluator="bullet_count",
    config={"min": 3, "max": 5}
)

# Numbered list validation
results = client.evaluate(
    cases=[{"response": "1. First\n2. Second\n3. Third"}],
    evaluator="numbered_list",
    config={"sequential": True}
)

# Section headers
results = client.evaluate(
    cases=[{"response": "## Overview\n...\n## Details\n..."}],
    evaluator="has_sections",
    config={"required": ["Overview", "Details"]}
)
```

## Combining Heuristics
### All Must Pass

```python
results = client.evaluate(
    cases=cases,
    evaluator="composite",
    config={
        "mode": "all",  # All must pass
        "checks": [
            {"type": "json_valid"},
            {"type": "length_check", "min_chars": 100},
            {"type": "contains", "substring": "conclusion"}
        ]
    }
)
```

### Any Must Pass
```python
results = client.evaluate(
    cases=cases,
    evaluator="composite",
    config={
        "mode": "any",  # At least one must pass
        "checks": [
            {"type": "exact_match", "expected": "N/A"},
            {"type": "length_check", "min_chars": 50}
        ]
    }
)
```

### Weighted Scoring
```python
results = client.evaluate(
    cases=cases,
    evaluator="composite",
    config={
        "mode": "weighted",
        "checks": [
            {"type": "json_valid", "weight": 0.3},
            {"type": "length_check", "min_chars": 100, "weight": 0.3},
            {"type": "contains", "substring": "summary", "weight": 0.4}
        ]
    }
)

# Score = sum of (passed * weight) * 100
# e.g. if json_valid and contains pass but length_check fails:
# (0.3 + 0.4) * 100 = 70
```

## Custom Heuristics
Define your own heuristic functions:
```python
import re

def check_citation_format(response: str) -> bool:
    """Check if citations follow [Author, Year] format"""
    citations = re.findall(r'\[([^\]]+)\]', response)
    # Matches single-word surnames, e.g. [Smith, 2021]; all() returns
    # True when the response contains no citations at all
    pattern = r'^[A-Z][a-z]+, \d{4}$'
    return all(re.match(pattern, c) for c in citations)

# Register custom heuristic
client.register_heuristic(
    name="citation_format",
    function=check_citation_format,
    description="Validates [Author, Year] citation format"
)

# Use it
results = client.evaluate(
    cases=cases,
    evaluator="citation_format"
)
```

## Performance Benchmarks
| Heuristic | Throughput | Memory |
|---|---|---|
| exact_match | 100k/sec | <1MB |
| contains | 80k/sec | <1MB |
| regex_match | 50k/sec | <1MB |
| json_valid | 30k/sec | <1MB |
| json_schema | 20k/sec | <5MB |
| pii_check | 10k/sec | <10MB |
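Throughput depends heavily on hardware and payload size, so treat the figures above as indicative. A quick local micro-benchmark of an exact-match-style check (plain Python, independent of gateflow) might look like:

```python
import time

def exact_match(response: str, expected: str) -> bool:
    """The simplest heuristic: strict string equality."""
    return response == expected

# 100k tiny cases approximates the table's workload shape
cases = [("4", "4")] * 100_000

start = time.perf_counter()
passed = sum(exact_match(r, e) for r, e in cases)
elapsed = time.perf_counter() - start

throughput = len(cases) / elapsed  # checks per second on this machine
```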
## Best Practices
- **Use heuristics first** - check format before semantic quality
- **Fail fast** - if heuristics fail, skip expensive LLM evals
- **Be specific** - narrow checks are more reliable than broad ones
- **Combine strategically** - use composite evaluators for complex requirements
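As a rough illustration of how a composite evaluator might combine check results under the three modes described earlier, here is a sketch (an assumption about the semantics, not gateflow's actual implementation):

```python
def composite_score(check_results, mode="all", weights=None):
    """Combine boolean check results into a 0-100 score.

    mode: "all" (every check must pass), "any" (at least one must pass),
    or "weighted" (sum of the weights of the checks that passed).
    """
    if mode == "all":
        return 100.0 if all(check_results) else 0.0
    if mode == "any":
        return 100.0 if any(check_results) else 0.0
    if mode == "weighted":
        # Score = sum of (passed * weight) * 100
        return 100 * sum(w for ok, w in zip(check_results, weights) if ok)
    raise ValueError(f"unknown mode: {mode}")
```

With weights 0.3/0.3/0.4, two passing checks at 0.3 and 0.4 yield a score of 70, matching the weighted-scoring example above.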
## Next Steps
- **Tiered Approach** - combine heuristics with LLM judges
- **LLM-as-Judge** - for nuanced evaluation
- **Traffic Sampling** - apply heuristics at scale