
Curated Eval Suites

GateFlow provides 10+ pre-built evaluation suites covering common AI quality and safety requirements. These suites are maintained by our team and updated regularly.

Available Suites

Safety & Trust

Suite             Cases  Description
safety-core       200+   Core safety checks: toxicity, harmful content, PII leakage
safety-jailbreak  150+   Jailbreak and prompt injection resistance
safety-bias       100+   Bias detection across demographic categories

Quality

Suite                Cases  Description
quality-general      300+   General response quality: coherence, relevance, helpfulness
quality-instruction  150+   Instruction following accuracy
quality-reasoning    100+   Logical reasoning and consistency

RAG & Retrieval

Suite             Cases  Description
rag-faithfulness  200+   Faithfulness to source documents
rag-groundedness  150+   Claims grounded in provided context
rag-citation      100+   Citation accuracy and attribution

Compliance

Suite                 Cases  Description
compliance-eu-ai-act  100+   EU AI Act alignment checks
compliance-medical    150+   Medical disclaimer and safety requirements

Using Curated Suites

python
from gateflow import EvalClient

client = EvalClient(api_key="gf-...")

# Run a single suite
results = client.run_suite(
    suite="safety-core",
    model="gpt-4o"
)

# Run multiple suites
results = client.run_suites(
    suites=["safety-core", "quality-general", "rag-faithfulness"],
    model="gpt-4o"
)

# Run all safety suites
results = client.run_suites(
    suites=["safety-*"],  # Wildcard matching
    model="gpt-4o"
)
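
The wildcard form above can be pictured as shell-style pattern matching against the suite names in the catalogue. The following is only a sketch of that idea, not GateFlow's implementation: `expand_suite_patterns` and `AVAILABLE_SUITES` are hypothetical names, and the real service may resolve patterns differently (for example, server-side).

```python
from fnmatch import fnmatchcase

# Suite names copied from the tables on this page.
AVAILABLE_SUITES = [
    "safety-core", "safety-jailbreak", "safety-bias",
    "quality-general", "quality-instruction", "quality-reasoning",
    "rag-faithfulness", "rag-groundedness", "rag-citation",
    "compliance-eu-ai-act", "compliance-medical",
]


def expand_suite_patterns(patterns, available=AVAILABLE_SUITES):
    """Expand names or wildcards into suite names, in pattern order, without duplicates.

    Hypothetical helper for illustration; not part of the gateflow package.
    """
    selected = []
    for pattern in patterns:
        matches = [name for name in available if fnmatchcase(name, pattern)]
        if not matches:
            # Fail loudly on typos instead of silently running nothing.
            raise ValueError(f"No suite matches {pattern!r}")
        for name in matches:
            if name not in selected:
                selected.append(name)
    return selected


print(expand_suite_patterns(["safety-*"]))
```

Expanding patterns locally before calling `run_suites` is one way to check which suites a wildcard would select before spending evaluation time on them.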

Suite Configuration

Customize how suites run:

python
results = client.run_suite(
    suite="safety-core",
    model="gpt-4o",
    config={
        "temperature": 0.0,        # Minimize sampling randomness for repeatable runs
        "max_tokens": 500,         # Limit response length
        "timeout_ms": 30000,       # Per-case timeout in milliseconds (30 s)
        "retry_on_error": True,    # Retry cases that hit an error
        "parallel": 10             # Concurrent evaluations
    }
)
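
When the same settings are reused across runs, it can help to build the `config` dictionary from a small validated object so that mistakes surface before any cases execute. The sketch below assumes only the five keys shown above. `SuiteRunConfig` is a hypothetical helper, and its defaults simply mirror the example values rather than any documented GateFlow defaults.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SuiteRunConfig:
    """Hypothetical wrapper around the config keys shown in the example above."""

    temperature: float = 0.0
    max_tokens: int = 500
    timeout_ms: int = 30000
    retry_on_error: bool = True
    parallel: int = 10

    def __post_init__(self):
        # Reject values that could never produce a meaningful run.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        for field_name in ("max_tokens", "timeout_ms", "parallel"):
            if getattr(self, field_name) <= 0:
                raise ValueError(f"{field_name} must be positive")


# Convert to a plain dict of the same shape as the `config` argument above.
config = asdict(SuiteRunConfig(parallel=4))
print(config)
```

The resulting dictionary has the same keys as the inline example, so it could be passed as `config=config` under the assumption that `run_suite` accepts exactly these keys.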

Viewing Suite Contents

Inspect cases before running:

python
suite = client.get_suite("safety-core")

print(f"Suite: {suite.name}")
print(f"Cases: {suite.case_count}")
print(f"Last updated: {suite.updated_at}")

# Preview cases
for case in suite.cases[:5]:
    print(f"  - {case.input[:50]}...")

Suite Versioning

Curated suites are versioned for reproducibility:

python
# Run specific version
results = client.run_suite(
    suite="safety-core@v2.1",
    model="gpt-4o"
)

# List available versions
versions = client.list_suite_versions("safety-core")
# ["v1.0", "v1.1", "v2.0", "v2.1"]
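
For reproducible runs, one option is to resolve the newest version once and pin it explicitly in the `name@version` form shown above. The sketch below assumes version tags always look like `vMAJOR.MINOR`, as in the example list. `split_suite_ref` and `version_key` are hypothetical helpers, not gateflow functions.

```python
def split_suite_ref(ref):
    """Split 'name@version' into (name, version); version is None when omitted."""
    name, sep, version = ref.partition("@")
    if not name or (sep and not version):
        raise ValueError(f"Malformed suite reference: {ref!r}")
    return name, (version or None)


def version_key(tag):
    """Turn 'v2.1' into (2, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


# Example list taken from the comment above.
versions = ["v1.0", "v1.1", "v2.0", "v2.1"]

latest = max(versions, key=version_key)
pinned = f"safety-core@{latest}"
print(pinned)
print(split_suite_ref(pinned))
```

Comparing numeric tuples rather than raw strings matters once minor versions reach two digits: as strings, "v1.10" would sort before "v1.9".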
