Core Concepts

Understanding these concepts will help you get the most out of GateFlow.

Providers

A provider is an AI service like OpenAI, Anthropic, Google, Mistral, or Cohere. GateFlow connects to providers on your behalf.

Supported Providers

| Provider | Models | Capabilities |
| --- | --- | --- |
| OpenAI | GPT-5.2, GPT-5, o3, Whisper, TTS | Chat, Embeddings, STT, TTS |
| Anthropic | Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5 | Chat |
| Google | Gemini 3 Pro, Gemini 2.5 Pro/Flash | Chat, Embeddings |
| Mistral | Mistral Large 3, Small 3, Voxtral | Chat, Embeddings, STT |
| Cohere | Command R+, Command R, Embed, Rerank | Chat, Embeddings, Rerank |
| ElevenLabs | Multilingual v2, Turbo v2.5 | TTS |

Provider Credentials

You bring your own API keys from each provider. GateFlow stores them encrypted and uses them to make requests on your behalf.

```python
from openai import OpenAI

# Your app only needs a GateFlow key
client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."  # GateFlow key
)

# GateFlow uses your stored provider keys internally
```

Models

A model is a specific AI model from a provider, like gpt-5.2 or claude-sonnet-4-5-20250929.

Model Naming

GateFlow uses the provider's original model names:

```python
# OpenAI models
model="gpt-5.2"
model="gpt-5"
model="gpt-5-mini"
model="o3"

# Anthropic models
model="claude-opus-4-5-20251107"
model="claude-sonnet-4-5-20250929"
model="claude-haiku-4-5-20251015"

# Google models
model="gemini-3-pro"
model="gemini-2.5-flash"
```

Model Aliases

You can create custom aliases for easier management:

```python
# In dashboard: alias "fast" → "gpt-5-mini"
model="fast"  # Routes to gpt-5-mini
```

Routing

Routing determines which model handles a request. GateFlow supports several routing strategies.

Direct Routing

Specify the model explicitly:

```python
response = client.chat.completions.create(
    model="gpt-5.2",  # Goes directly to GPT-5.2
    messages=[...]
)
```

Fallback Routing

Configure fallback models for reliability:

```python
# In dashboard: gpt-5.2 → claude-sonnet-4-5 → gemini-3-pro
response = client.chat.completions.create(
    model="gpt-5.2",  # Tries gpt-5.2 first
    messages=[...]    # Falls back to Claude if OpenAI is down
)
```

Task-Based Routing

Route based on the task type:

```python
response = client.chat.completions.create(
    model="auto",  # Let GateFlow choose
    messages=[...],
    extra_body={
        "gateflow": {
            "task_type": "code_generation"  # Routes to code-optimized model
        }
    }
)
```

Caching

GateFlow's semantic cache stores responses and returns cached results for similar queries.

How It Works

  1. Request comes in with a prompt
  2. GateFlow generates an embedding of the prompt
  3. Searches cache for similar prompts (configurable threshold)
  4. If found: return cached response (instant, free)
  5. If not found: forward to provider, cache response
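The lookup in the steps above can be sketched with a toy in-memory cache. The character-bigram "embedding" and the similarity threshold below are illustrative stand-ins for a real embedding model and GateFlow's configurable threshold, not how GateFlow is actually implemented:

```python
import math

def embed(text):
    # Toy embedding: lowercase character-bigram counts. A real gateway
    # would call an embedding model here instead.
    vec = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].lower()
        vec[bigram] = vec.get(bigram, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(count * b.get(k, 0) for k, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.entries = []  # (embedding, response) pairs

    def lookup(self, prompt):
        emb = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best and cosine(emb, best[0]) >= self.threshold:
            return best[1]  # cache hit: answered without a provider call
        return None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache(threshold=0.5)
cache.store("What is Python?", "Python is a programming language.")
print(cache.lookup("what is python?"))   # similar prompt → cached response
print(cache.lookup("How do I bake bread?"))  # unrelated prompt → None
```

The key design point is that lookup is by similarity, not exact match, so rephrasings of the same question can still hit the cache.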

Cache Hit Example

```python
# First request - goes to provider
client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "What is Python?"}]
)  # Takes ~1 second, costs tokens

# Similar request - hits cache
client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Can you explain Python?"}]
)  # Returns instantly, free
```

Cache Control

Disable caching for specific requests:

```python
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[...],
    extra_body={
        "gateflow": {
            "cache": "skip"  # Bypass cache
        }
    }
)
```

API Keys

GateFlow uses API keys to authenticate your requests.

Key Prefixes

| Prefix | Scope | Use Case |
| --- | --- | --- |
| `gw_dev_` | Development | Testing, sandbox environments |
| `gw_prod_` | Production | Live applications |

Key Permissions

Keys can be scoped to specific:

  • Models (only allow certain models)
  • IP addresses (restrict to your servers)
  • Rate limits (requests per minute)
  • Cost limits (max spend per day/month)
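Conceptually, a scoped key is a set of constraints checked before a request is forwarded. The sketch below is illustrative only: the field names (`allowed_models`, `rate_limit_rpm`, and so on) are assumptions, not GateFlow's actual schema:

```python
# Hypothetical key scope; field names are illustrative assumptions.
key_config = {
    "allowed_models": ["gpt-5-mini", "gpt-5.2"],
    "allowed_ips": ["203.0.113.10"],
    "rate_limit_rpm": 60,
    "max_daily_spend_usd": 25.0,
}

def authorize(config, model, ip, rpm_used, spend_today):
    """Return (allowed, reason) for a request under this key's scopes."""
    if model not in config["allowed_models"]:
        return False, "model not allowed"
    if ip not in config["allowed_ips"]:
        return False, "ip not allowed"
    if rpm_used >= config["rate_limit_rpm"]:
        return False, "rate limit exceeded"
    if spend_today >= config["max_daily_spend_usd"]:
        return False, "cost limit exceeded"
    return True, "ok"

print(authorize(key_config, "gpt-5.2", "203.0.113.10", 10, 3.50))
# → (True, 'ok')
print(authorize(key_config, "o3", "203.0.113.10", 10, 3.50))
# → (False, 'model not allowed')
```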

Organizations

Organizations group users, keys, and resources together.

Structure

Organization (Acme Corp)
├── Workspace (Production)
│   ├── API Keys
│   ├── Provider Configs
│   └── Routing Rules
├── Workspace (Staging)
│   └── ...
└── Members
    ├── Admin (full access)
    ├── Developer (can use keys)
    └── Viewer (read-only)

Role-Based Access Control

| Role | Permissions |
| --- | --- |
| Owner | Everything |
| Admin | Manage keys, providers, members |
| Developer | Use API keys, view analytics |
| Viewer | View analytics only |
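The role table reads naturally as a permission set per role. In the sketch below, the permission identifiers are assumptions chosen to mirror the table, not GateFlow's actual names:

```python
# Illustrative role → permission mapping; identifiers are assumptions.
ROLE_PERMISSIONS = {
    "owner": {"manage_org", "manage_keys", "manage_providers",
              "manage_members", "use_keys", "view_analytics"},
    "admin": {"manage_keys", "manage_providers", "manage_members",
              "use_keys", "view_analytics"},
    "developer": {"use_keys", "view_analytics"},
    "viewer": {"view_analytics"},
}

def can(role, permission):
    # Unknown roles get no permissions.
    return permission in ROLE_PERMISSIONS.get(role, set())

print(can("developer", "use_keys"))   # True
print(can("viewer", "manage_keys"))   # False
```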

Request Flow

Here's what happens when you make a request:

1. Your App sends request to GateFlow
2. Authentication: Validate API key
3. Rate Limiting: Check against limits
4. Cache Check: Look for cached response
5. Routing: Select provider/model
6. Request: Forward to provider
7. Response: Return to your app
8. Logging: Record for analytics
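The eight steps above can be sketched as a single handler. This is a heavily simplified model, not GateFlow's implementation; the provider call is stubbed and logging is omitted:

```python
# Minimal sketch of the gateway pipeline; all names are illustrative.
def call_provider(model, prompt):
    # Stub standing in for the forwarded provider request (step 6).
    return f"[{model}] response to: {prompt}"

def handle(request, *, keys, cache, limits):
    # 2. Authentication: validate the API key
    if request["api_key"] not in keys:
        return {"status": 401, "error": "invalid key"}
    # 3. Rate limiting: check remaining quota for this key
    if limits.get(request["api_key"], 0) <= 0:
        return {"status": 429, "error": "rate limit exceeded"}
    limits[request["api_key"]] -= 1
    # 4. Cache check: return a stored response if one exists
    cached = cache.get(request["prompt"])
    if cached is not None:
        return {"status": 200, "body": cached, "cached": True}
    # 5-6. Routing + provider call (direct routing only in this sketch)
    body = call_provider(request["model"], request["prompt"])
    # 7-8. Cache the response and return it (logging omitted)
    cache[request["prompt"]] = body
    return {"status": 200, "body": body, "cached": False}

keys = {"gw_dev_123"}
cache, limits = {}, {"gw_dev_123": 2}
print(handle({"api_key": "gw_dev_123", "model": "gpt-5.2", "prompt": "hi"},
             keys=keys, cache=cache, limits=limits))
```

A second identical request would be served from `cache` without touching the stubbed provider, which is the behavior the caching section above describes.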
