Core Concepts

Understanding these concepts will help you get the most out of GateFlow.

Providers

A provider is an AI service like OpenAI, Anthropic, Google, Mistral, or Cohere. GateFlow connects to providers on your behalf.

Supported Providers

| Provider | Models | Capabilities |
| --- | --- | --- |
| OpenAI | GPT-5.2, GPT-5, o3, Whisper, TTS | Chat, Embeddings, STT, TTS |
| Anthropic | Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5 | Chat |
| Google | Gemini 3 Pro, Gemini 2.5 Pro/Flash | Chat, Embeddings |
| Mistral | Mistral Large 3, Small 3, Voxtral | Chat, Embeddings, STT |
| Cohere | Command R+, Command R, Embed, Rerank | Chat, Embeddings, Rerank |
| ElevenLabs | Multilingual v2, Turbo v2.5 | TTS |

Provider Credentials

You bring your own API keys from each provider. GateFlow stores them encrypted and uses them to make requests on your behalf.

```python
from openai import OpenAI

# Your app only needs a GateFlow key
client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."  # GateFlow key
)

# GateFlow uses your stored provider keys internally
```

Models

A model is a specific AI model from a provider, like gpt-5.2 or claude-sonnet-4-5-20250929.

Model Naming

GateFlow uses the provider's original model names:

```python
# OpenAI models
model="gpt-5.2"
model="gpt-5"
model="gpt-5-mini"
model="o3"

# Anthropic models
model="claude-opus-4-5-20251107"
model="claude-sonnet-4-5-20250929"
model="claude-haiku-4-5-20251015"

# Google models
model="gemini-3-pro"
model="gemini-2.5-flash"
```

Model Aliases

You can create custom aliases for easier management:

```python
# In dashboard: alias "fast" → "gpt-5-mini"
model="fast"  # Routes to gpt-5-mini
```

Routing

Routing determines which model handles a request. GateFlow supports several routing strategies.

Direct Routing

Specify the model explicitly:

```python
response = client.chat.completions.create(
    model="gpt-5.2",  # Goes directly to GPT-5.2
    messages=[...]
)
```

Fallback Routing

Configure fallback models for reliability:

```python
# In dashboard: gpt-5.2 → claude-sonnet-4-5 → gemini-3-pro
response = client.chat.completions.create(
    model="gpt-5.2",  # Tries gpt-5.2 first
    messages=[...]    # Falls back to Claude if OpenAI is down
)
```

Task-Based Routing

Route based on the task type:

```python
response = client.chat.completions.create(
    model="auto",  # Let GateFlow choose
    messages=[...],
    extra_body={
        "gateflow": {
            "task_type": "code_generation"  # Routes to code-optimized model
        }
    }
)
```

Caching

GateFlow's semantic cache stores responses and returns cached results for similar queries.

How It Works

  1. Request comes in with a prompt
  2. GateFlow generates an embedding of the prompt
  3. Searches cache for similar prompts (configurable threshold)
  4. If found: return cached response (instant, free)
  5. If not found: forward to provider, cache response
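The lookup in the steps above can be sketched with a toy in-memory cache. The character-bigram "embedding" and the similarity threshold below are illustrative stand-ins for a real embedding model and GateFlow's configurable threshold, not how GateFlow is actually implemented:

```python
import math

def embed(text):
    # Toy embedding: lowercase character-bigram counts. A real gateway
    # would call an embedding model here instead.
    vec = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].lower()
        vec[bigram] = vec.get(bigram, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(count * b.get(k, 0) for k, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.entries = []  # (embedding, response) pairs

    def lookup(self, prompt):
        emb = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best and cosine(emb, best[0]) >= self.threshold:
            return best[1]  # cache hit: answered without a provider call
        return None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache(threshold=0.5)
cache.store("What is Python?", "Python is a programming language.")
print(cache.lookup("what is python?"))   # similar prompt → cached response
print(cache.lookup("How do I bake bread?"))  # unrelated prompt → None
```

The key design point is that lookup is by similarity, not exact match, so rephrasings of the same question can still hit the cache.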

Cache Hit Example

```python
# First request - goes to provider
client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "What is Python?"}]
)  # Takes ~1 second, costs tokens

# Similar request - hits cache
client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Can you explain Python?"}]
)  # Returns instantly, free
```

Cache Control

Disable caching for specific requests:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "cache": "skip"  # Bypass cache
        }
    }
)
```

API Keys

GateFlow uses API keys to authenticate your requests.

Key Prefixes

| Prefix | Scope | Use Case |
| --- | --- | --- |
| `gw_dev_` | Development | Testing, sandbox environments |
| `gw_prod_` | Production | Live applications |

Key Permissions

Keys can be scoped to specific:

  • Models (only allow certain models)
  • IP addresses (restrict to your servers)
  • Rate limits (requests per minute)
  • Cost limits (max spend per day/month)
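Conceptually, a scoped key is a set of constraints checked before a request is forwarded. The sketch below is illustrative only: the field names (`allowed_models`, `rate_limit_rpm`, and so on) are assumptions, not GateFlow's actual schema:

```python
# Hypothetical key scope; field names are illustrative assumptions.
key_config = {
    "allowed_models": ["gpt-5-mini", "gpt-5.2"],
    "allowed_ips": ["203.0.113.10"],
    "rate_limit_rpm": 60,
    "max_daily_spend_usd": 25.0,
}

def authorize(config, model, ip, rpm_used, spend_today):
    """Return (allowed, reason) for a request under this key's scopes."""
    if model not in config["allowed_models"]:
        return False, "model not allowed"
    if ip not in config["allowed_ips"]:
        return False, "ip not allowed"
    if rpm_used >= config["rate_limit_rpm"]:
        return False, "rate limit exceeded"
    if spend_today >= config["max_daily_spend_usd"]:
        return False, "cost limit exceeded"
    return True, "ok"

print(authorize(key_config, "gpt-5.2", "203.0.113.10", 10, 3.50))
# → (True, 'ok')
print(authorize(key_config, "o3", "203.0.113.10", 10, 3.50))
# → (False, 'model not allowed')
```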

Organizations

Organizations group users, keys, and resources together.

Structure

Organization (Acme Corp)
├── Workspace (Production)
│   ├── API Keys
│   ├── Provider Configs
│   └── Routing Rules
├── Workspace (Staging)
│   └── ...
└── Members
    ├── Admin (full access)
    ├── Developer (can use keys)
    └── Viewer (read-only)

Role-Based Access Control

| Role | Permissions |
| --- | --- |
| Owner | Everything |
| Admin | Manage keys, providers, members |
| Developer | Use API keys, view analytics |
| Viewer | View analytics only |
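The role table reads naturally as a permission set per role. In the sketch below, the permission identifiers are assumptions chosen to mirror the table, not GateFlow's actual names:

```python
# Illustrative role → permission mapping; identifiers are assumptions.
ROLE_PERMISSIONS = {
    "owner": {"manage_org", "manage_keys", "manage_providers",
              "manage_members", "use_keys", "view_analytics"},
    "admin": {"manage_keys", "manage_providers", "manage_members",
              "use_keys", "view_analytics"},
    "developer": {"use_keys", "view_analytics"},
    "viewer": {"view_analytics"},
}

def can(role, permission):
    # Unknown roles get no permissions.
    return permission in ROLE_PERMISSIONS.get(role, set())

print(can("developer", "use_keys"))   # True
print(can("viewer", "manage_keys"))   # False
```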

Request Flow

Here's what happens when you make a request:

1. Your App sends request to GateFlow
2. Authentication: Validate API key
3. Rate Limiting: Check against limits
4. Cache Check: Look for cached response
5. Routing: Select provider/model
6. Request: Forward to provider
7. Response: Return to your app
8. Logging: Record for analytics
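The eight steps above can be sketched as a single handler. This is a heavily simplified model, not GateFlow's implementation; the provider call is stubbed and logging is omitted:

```python
# Minimal sketch of the gateway pipeline; all names are illustrative.
def call_provider(model, prompt):
    # Stub standing in for the forwarded provider request (step 6).
    return f"[{model}] response to: {prompt}"

def handle(request, *, keys, cache, limits):
    # 2. Authentication: validate the API key
    if request["api_key"] not in keys:
        return {"status": 401, "error": "invalid key"}
    # 3. Rate limiting: check remaining quota for this key
    if limits.get(request["api_key"], 0) <= 0:
        return {"status": 429, "error": "rate limit exceeded"}
    limits[request["api_key"]] -= 1
    # 4. Cache check: return a stored response if one exists
    cached = cache.get(request["prompt"])
    if cached is not None:
        return {"status": 200, "body": cached, "cached": True}
    # 5-6. Routing + provider call (direct routing only in this sketch)
    body = call_provider(request["model"], request["prompt"])
    # 7-8. Cache the response and return it (logging omitted)
    cache[request["prompt"]] = body
    return {"status": 200, "body": body, "cached": False}

keys = {"gw_dev_123"}
cache, limits = {}, {"gw_dev_123": 2}
print(handle({"api_key": "gw_dev_123", "model": "gpt-5.2", "prompt": "hi"},
             keys=keys, cache=cache, limits=limits))
```

A second identical request would be served from `cache` without touching the stubbed provider, which is the behavior the caching section above describes.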
