Architecture
This page describes how GateFlow works under the hood.
System Overview
Components
API Gateway
The entry point for all requests. Handles:
- Authentication: Validates API keys
- Rate Limiting: Enforces per-key and per-org limits
- Request Parsing: Normalizes different client formats
- Response Streaming: Supports SSE for streaming responses
Technologies: Python FastAPI, async I/O
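Request parsing, for example, has to map several client formats onto one internal shape. A minimal sketch of what that normalization might look like (the bare-`prompt` fallback format and field names are illustrative assumptions, not GateFlow's actual schema):

```python
def normalize_request(payload: dict) -> dict:
    """Normalize different client request formats into one internal shape."""
    if "messages" in payload:
        # OpenAI-style chat format: pass through as-is
        messages = payload["messages"]
    elif "prompt" in payload:
        # Bare-prompt format: wrap it as a single user message
        messages = [{"role": "user", "content": payload["prompt"]}]
    else:
        raise ValueError("Request must contain 'messages' or 'prompt'")
    return {"model": payload["model"], "messages": messages}
```

Downstream components then only ever see the normalized `messages` shape, regardless of which client format arrived at the gateway.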
Routing Engine
Determines which provider and model handles each request:
- Direct Routing: Model specified in request
- Fallback Chains: Sequential attempts if primary fails
- Task Classification: ML model classifies task type
- Load Balancing: Distributes across providers
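As a sketch, a fallback chain can be represented as an ordered list of (provider, model) candidates that the engine walks until one is available. Names and the configuration shape below are illustrative, not GateFlow's actual schema:

```python
# Illustrative fallback chain: primary first, cross-provider candidates after
FALLBACK_CHAINS = {
    "gpt-4o": [
        ("openai", "gpt-4o"),
        ("azure", "gpt-4o"),
        ("anthropic", "claude-sonnet"),  # cross-provider fallback
    ],
}

def first_available(model: str, healthy: set[str]) -> tuple[str, str]:
    """Walk the chain and return the first candidate whose provider is healthy."""
    for provider, candidate in FALLBACK_CHAINS[model]:
        if provider in healthy:
            return provider, candidate
    raise RuntimeError("All providers exhausted")
```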
Decision Flow:
1. Parse model from request
2. Check if model is an alias → resolve
3. Check if model is deprecated → use replacement
4. Get fallback chain for model
5. For each model in chain:
a. Check provider health
b. Check rate limits
c. If available, route request
   d. If failed, try next in chain

Cache Layer
Semantic caching with pgvector:
- Embedding Generation: Creates vector for each prompt
- Similarity Search: Finds cached responses above threshold
- Cache Storage: PostgreSQL with pgvector extension
- TTL Management: Automatic expiration
Cache Key Components:
- Prompt embedding (vector)
- Model name
- Temperature setting
- Organization ID
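The non-vector components can be grouped into an exact-match key, with the prompt embedding handled separately by similarity search. A sketch (field names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CacheKey:
    """Exact-match part of the cache key; the embedding is matched by similarity."""
    model: str
    temperature: float
    org_id: str

# Frozen dataclasses are hashable, so a key can index an in-memory map
# or serve as a filter alongside the embedding in pgvector.
key = CacheKey(model="gpt-4o", temperature=0.7, org_id="org_123")
```

Keeping the organization ID in the key is what prevents one tenant's cached responses from being served to another.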
Provider Connectors
Adapters for each AI provider:
```python
# Simplified connector interface
class ProviderConnector:
    async def chat_completion(self, request) -> Response: ...
    async def embedding(self, request) -> Response: ...
    async def health_check(self) -> bool: ...
```

Each connector handles:
- Request format translation
- Response normalization
- Error mapping
- Retry logic
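Error mapping, for instance, means translating each provider's failure modes into one common hierarchy so the routing engine can react uniformly. A sketch with an illustrative status-code mapping (all names here are assumptions, not GateFlow's real error types):

```python
class ProviderError(Exception): ...
class ProviderRateLimited(ProviderError): ...
class ProviderUnavailable(ProviderError): ...

# Illustrative mapping from provider HTTP status codes to common errors
STATUS_TO_ERROR = {429: ProviderRateLimited, 503: ProviderUnavailable}

def map_error(status: int) -> ProviderError:
    """Translate a provider status code into the gateway's error hierarchy."""
    return STATUS_TO_ERROR.get(status, ProviderError)(f"status {status}")
```

Because every connector raises the same hierarchy, the routing engine can treat a 429 from any provider identically when deciding whether to retry or fall back.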
Data Layer
PostgreSQL - Primary database:
- Organizations, users, API keys
- Provider configurations
- Routing rules
- Request logs
pgvector - Vector similarity:
- Semantic cache storage
- Document embeddings (Data Pillar)
Redis - Fast operations:
- Rate limit counters
- Session cache
- Real-time metrics
Request Lifecycle
1. Request Received
```http
POST /v1/chat/completions HTTP/1.1
Host: api.gateflow.ai
Authorization: Bearer gw_prod_xxx
Content-Type: application/json

{"model": "gpt-4o", "messages": [...]}
```

2. Authentication
```python
async def authenticate(request):
    api_key = extract_api_key(request)
    key_data = await db.get_api_key(api_key)
    if not key_data:
        raise AuthenticationError("Invalid API key")
    if key_data.revoked:
        raise AuthenticationError("API key revoked")
    return key_data.organization_id
```

3. Rate Limiting
```python
async def check_rate_limit(org_id, key_id):
    # Check organization limits
    org_count = await redis.incr(f"ratelimit:org:{org_id}")
    if org_count == 1:
        # First request in this window: start the window timer
        await redis.expire(f"ratelimit:org:{org_id}", WINDOW_SECONDS)
    if org_count > ORG_LIMIT:
        raise RateLimitError("Organization rate limit exceeded")
    # Check key limits
    key_count = await redis.incr(f"ratelimit:key:{key_id}")
    if key_count == 1:
        await redis.expire(f"ratelimit:key:{key_id}", WINDOW_SECONDS)
    if key_count > KEY_LIMIT:
        raise RateLimitError("API key rate limit exceeded")
```

4. Cache Lookup
```python
async def check_cache(request, org_id):
    # Generate embedding for the incoming prompt
    embedding = await embed(request.messages)
    # Search for similar cached responses
    results = await pgvector.similarity_search(
        embedding,
        threshold=0.95,
        filters={"model": request.model, "org_id": org_id}
    )
    if results:
        return CacheHit(results[0].response)
    return CacheMiss()
```

5. Provider Selection
```python
async def select_provider(model, org_id):
    # Get fallback chain
    chain = await get_fallback_chain(model, org_id)
    for candidate in chain:
        provider = get_provider(candidate.provider)
        # Check health
        if not await provider.is_healthy():
            continue
        # Check rate limits
        if await provider.is_rate_limited(org_id):
            continue
        return provider, candidate.model
    raise NoAvailableProvider("All providers exhausted")
```

6. Provider Request
```python
async def call_provider(provider, model, request):
    try:
        response = await provider.chat_completion(
            model=model,
            messages=request.messages,
            **request.parameters
        )
        return response
    except ProviderError:
        # Record failure for circuit breaker
        await record_failure(provider)
        raise
```

7. Response Processing
```python
async def process_response(response, request):
    # Calculate cost
    cost = calculate_cost(
        model=response.model,
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens
    )
    # Cache response
    await cache_response(request, response)
    # Log for analytics
    await log_request(request, response, cost)
    return response
```

Reliability
Circuit Breaker
Prevents cascading failures when a provider is down:
```
State: CLOSED (normal)
  ↓ failures > threshold
State: OPEN (failing fast)
  ↓ timeout expires
State: HALF-OPEN (testing)
  ↓ success
State: CLOSED
```

Retry Logic
Exponential backoff with jitter:
```python
retry_delays = [1, 2, 4, 8]  # seconds, with ±25% jitter applied to each
```

Retries on:
- Network timeouts
- 5xx errors
- Rate limit errors (after delay)
Does not retry:
- Authentication errors
- Invalid request errors
- Content policy violations
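The schedule above can be generated rather than hard-coded. A sketch, assuming ±25% jitter applied uniformly to each exponential delay:

```python
import random

def backoff_delays(attempts: int = 4, base: float = 1.0, jitter: float = 0.25):
    """Yield exponential backoff delays (base * 2^n seconds) with ±jitter."""
    for n in range(attempts):
        delay = base * (2 ** n)
        yield delay * random.uniform(1 - jitter, 1 + jitter)
```

The jitter spreads retries out in time, so many clients that failed at the same moment do not all hammer the recovering provider in lockstep.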
Health Checks
Continuous monitoring of provider status:
```python
# Every 30 seconds per provider
async def health_check_loop():
    while True:
        for provider in providers:
            try:
                await provider.ping()
                provider.mark_healthy()
            except Exception:
                provider.mark_unhealthy()
        await sleep(30)
```

Scalability
Horizontal Scaling
API Gateway scales horizontally behind a load balancer:
```
Load Balancer
      │
      ├── API Instance 1
      ├── API Instance 2
      ├── API Instance 3
      └── API Instance N
```

Database Scaling
- Read Replicas: Analytics queries on replicas
- Connection Pooling: PgBouncer for efficient connections
- Partitioning: Request logs partitioned by date
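Date partitioning can be as simple as routing each log row to a monthly table. A sketch of such a naming scheme (the table names are illustrative, not GateFlow's actual layout):

```python
from datetime import date

def log_partition(day: date) -> str:
    """Return the name of the monthly partition a request log row belongs to."""
    return f"request_logs_{day.year:04d}_{day.month:02d}"
```

Monthly partitions keep individual tables small and let old log data be dropped by detaching a partition rather than running a bulk DELETE.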
Caching Tiers
```
L1: In-memory (hot keys)
L2: Redis (warm keys)
L3: PostgreSQL/pgvector (all keys)
```

Security
Data Encryption
- In Transit: TLS 1.3 for all connections
- At Rest: AES-256 for stored credentials
- Provider Keys: Encrypted with per-org keys
Isolation
- Multi-tenant: Row-level security in PostgreSQL
- Network: Provider requests from isolated workers
- Secrets: HSM for master encryption keys
Next Steps
- Provider Configuration - Set up providers
- Intelligent Routing - Configure routing
- Data & Compliance - Security and compliance features