
Chat Completions

Generate AI responses from a conversation with intelligent routing across multiple providers.

POST /v1/chat/completions

Overview

The Chat Completions endpoint is fully OpenAI-compatible and routes requests to the optimal provider based on task complexity and routing mode. GateFlow automatically:

  • Classifies your prompt's task type (coding, reasoning, simple Q&A, etc.)
  • Routes to the best model for that task type
  • Tracks requests with detailed timing, cost, and routing metrics
  • Handles rate limiting and request queuing
  • Provides energy and carbon tracking for sustainability

Request

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "routing_mode": "balanced"
  }'
python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="auto",  # Let GateFlow select the best model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    extra_body={"routing_mode": "balanced"}
)
typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ],
});

Parameters

Required

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model ID (e.g., `gpt-4o`, `claude-3-5-sonnet`) or `auto` for intelligent routing |
| `messages` | array | Conversation messages |

Optional

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `temperature` | number | 1.0 | Sampling temperature (0-2) |
| `max_tokens` | integer | varies | Maximum tokens to generate |
| `top_p` | number | 1.0 | Nucleus sampling parameter |
| `stream` | boolean | false | Enable SSE streaming |
| `stop` | string/array | null | Stop sequences |
| `presence_penalty` | number | 0 | Penalize repeated topics (-2 to 2) |
| `frequency_penalty` | number | 0 | Penalize repeated tokens (-2 to 2) |
| `tools` | array | null | Tools (functions) available to the model |
| `tool_choice` | string/object | null | Tool choice: `auto`, `none`, `required`, or specific function |

GateFlow Routing Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `routing_mode` | string | `balanced` | Routing strategy (see below) |
| `provider` | string | null | Force specific provider (bypasses routing) |

Routing Modes

GateFlow supports five intelligent routing modes:

| Mode | Description |
| --- | --- |
| `balanced` | Default. Best quality-to-price ratio. Recommended for most use cases. |
| `cost_optimized` | Selects cheapest model that meets quality threshold. |
| `performance` | Selects highest quality model regardless of cost. |
| `low_latency` | Optimizes for fastest response time. |
| `sustain_optimized` | Selects lowest-carbon model that meets quality threshold. |

Use the X-Routing-Mode header or routing_mode body parameter:

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "X-Routing-Mode: cost_optimized" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [...]}'

Messages

Each message has:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `role` | string | Yes | `system`, `user`, `assistant`, or `tool` |
| `content` | string/array | Conditional | Message content (required except for assistant with `tool_calls`) |
| `name` | string | No | Participant name |
| `tool_calls` | array | No | Tool calls made by assistant (for `role=assistant`) |
| `tool_call_id` | string | No | ID of tool call being responded to (for `role=tool`) |

Content Types

json
// Text content
{"role": "user", "content": "Hello!"}

// Multimodal content (vision models)
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://..."}}
  ]
}

// Tool response
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 72, \"condition\": \"sunny\"}"
}
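
A small illustrative builder for the multimodal shape above (the function name is hypothetical, not part of any SDK):

```python
# Hypothetical helper for the multimodal (vision) content shape above.
def vision_message(text: str, image_url: str) -> dict:
    """Build a user message pairing a text prompt with an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = vision_message("What's in this image?", "https://example.com/photo.jpg")
```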

Function/Tool Calling

GateFlow supports OpenAI-format tools with automatic translation to provider-specific formats:

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"  # auto, none, required, or specific function
)

Tool Choice Options

| Value | Description |
| --- | --- |
| `auto` | Model decides whether to call a tool |
| `none` | Model never calls tools |
| `required` | Model must call at least one tool |
| `{"type": "function", "function": {"name": "..."}}` | Force specific function |

Legacy Functions

The `functions` and `function_call` parameters are deprecated. Use `tools` and `tool_choice` instead. GateFlow automatically converts the legacy format but returns a deprecation warning header.
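
For client code that still builds the legacy shape, the conversion can be sketched as follows (the helper name is hypothetical; GateFlow performs this translation server-side, so you only need it when migrating client code):

```python
# Illustrative sketch of the legacy-to-tools mapping described above.
def convert_legacy_params(functions, function_call=None):
    """Convert deprecated `functions`/`function_call` parameters to
    the `tools`/`tool_choice` format."""
    tools = [{"type": "function", "function": f} for f in functions]

    if function_call in (None, "auto", "none"):
        tool_choice = function_call
    else:  # legacy {"name": "..."} forces that specific function
        tool_choice = {"type": "function",
                       "function": {"name": function_call["name"]}}
    return tools, tool_choice

tools, tool_choice = convert_legacy_params(
    functions=[{"name": "get_weather", "parameters": {"type": "object"}}],
    function_call={"name": "get_weather"},
)
```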

Request Headers

| Header | Type | Description |
| --- | --- | --- |
| `Authorization` | string | Bearer token with your API key |
| `X-Routing-Mode` | string | Override routing mode |
| `X-GateFlow-Project` | string | Project ID for attribution |
| `X-GateFlow-Team` | string | Team ID for attribution |
| `X-GateFlow-Tags` | string | Comma-separated tags for analytics |
| `X-GateFlow-Task-Type` | string | Override task type classification |
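
A small stdlib-only sketch of assembling these headers (the helper is illustrative, not part of any SDK; note the comma-separated format for `X-GateFlow-Tags`):

```python
# Illustrative helper assembling GateFlow attribution headers.
def gateflow_headers(project=None, team=None, tags=None, task_type=None):
    headers = {}
    if project:
        headers["X-GateFlow-Project"] = project
    if team:
        headers["X-GateFlow-Team"] = team
    if tags:  # X-GateFlow-Tags is comma-separated
        headers["X-GateFlow-Tags"] = ",".join(tags)
    if task_type:
        headers["X-GateFlow-Task-Type"] = task_type
    return headers

headers = gateflow_headers(project="proj_1", tags=["batch", "eval"])
```

The resulting dict can be passed via the OpenAI SDK's `extra_headers` argument or as curl `-H` flags.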

GateFlow Extensions

Pass GateFlow-specific options via extra_body:

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "routing_mode": "balanced",
        "provider": None,  # Set to force specific provider
    }
)

| Option | Type | Description |
| --- | --- | --- |
| `routing_mode` | string | Routing strategy |
| `provider` | string | Force specific provider |

Response

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705123456,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 10,
    "total_tokens": 25
  },
  "_gateway": {
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "latency_ms": 456,
    "user_id": "user_123",
    "timestamp": 1705123456,
    "routing": {
      "mode": "balanced",
      "task_type": "simple_qa",
      "reasoning": "Best value: quality 8/10 at competitive price",
      "estimated_cost": 0.0001375,
      "actual_cost": 0.0001375,
      "provider": "openai",
      "model": "gpt-4o"
    },
    "alternatives": [
      {"provider": "anthropic", "model": "claude-3-5-sonnet"},
      {"provider": "google", "model": "gemini-2.5-pro"}
    ],
    "rate_limiting": {
      "enabled": true,
      "retry_attempts": 0,
      "retry_delay_ms": 0
    },
    "tools": {
      "enabled": false,
      "tool_count": 0,
      "tool_calls_made": 0,
      "tools_called": []
    },
    "compliance": {
      "enabled": true,
      "classification": "public",
      "regime": "default",
      "redacted": false
    },
    "energy": {
      "energy_kwh": 0.000000012,
      "confidence_level": "high",
      "task_multiplier": 1.0,
      "provider_pue": 1.1
    },
    "carbon": {
      "carbon_gco2e": 0.000005,
      "grid_region": "us-west-2",
      "grid_intensity_gco2_per_kwh": 120.5,
      "confidence_level": "medium"
    }
  }
}
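
Once the body is parsed as JSON, the `_gateway` block can be inspected like any dict. A minimal sketch (the `summarize_routing` helper is illustrative, not part of any SDK):

```python
def summarize_routing(body: dict) -> str:
    """One-line summary of GateFlow's routing decision for logging."""
    gw = body.get("_gateway", {})
    r = gw.get("routing", {})
    return (f"{r.get('provider')}/{r.get('model')} "
            f"mode={r.get('mode')} "
            f"cost=${r.get('actual_cost', 0):.7f} "
            f"latency={gw.get('latency_ms')}ms")

# Sample trimmed from the response above
body = {
    "_gateway": {
        "latency_ms": 456,
        "routing": {"mode": "balanced", "provider": "openai",
                    "model": "gpt-4o", "actual_cost": 0.0001375},
    }
}
print(summarize_routing(body))
# → openai/gpt-4o mode=balanced cost=$0.0001375 latency=456ms
```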

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique completion ID |
| `object` | string | Always `chat.completion` |
| `created` | integer | Unix timestamp |
| `model` | string | Model used |
| `choices` | array | Generated completions |
| `usage` | object | Token counts |
| `_gateway` | object | GateFlow routing and observability metadata |

Gateway Metadata (_gateway)

| Field | Type | Description |
| --- | --- | --- |
| `request_id` | string | Unique request UUID for tracing |
| `latency_ms` | integer | Total request latency in milliseconds |
| `routing` | object | Routing decision details |
| `alternatives` | array | Other models that could have been selected |
| `rate_limiting` | object | Rate limit status |
| `tools` | object | Tool calling metadata |
| `compliance` | object | Data classification and compliance info |
| `energy` | object | Energy consumption estimate |
| `carbon` | object | Carbon footprint estimate |

Choice Object

| Field | Type | Description |
| --- | --- | --- |
| `index` | integer | Choice index |
| `message` | object | Generated message |
| `finish_reason` | string | Why generation stopped |

Finish Reasons

| Reason | Description |
| --- | --- |
| `stop` | Natural completion or stop sequence |
| `length` | Hit `max_tokens` |
| `tool_calls` | Model wants to call a tool |
| `content_filter` | Content filtered |
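
A minimal sketch of dispatching on `finish_reason`, following the table above (the handler and its messages are illustrative):

```python
# Illustrative dispatch on finish_reason.
def handle_finish_reason(reason: str) -> str:
    if reason == "stop":
        return "done"
    if reason == "length":
        return "truncated: raise max_tokens or continue the generation"
    if reason == "tool_calls":
        return "execute the requested tools and send back role=tool messages"
    if reason == "content_filter":
        return "content was filtered; revise the prompt"
    return f"unknown finish_reason: {reason}"
```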

Response Headers

GateFlow returns useful headers with every response:

| Header | Description |
| --- | --- |
| `X-Request-ID` | Unique request identifier |
| `X-RateLimit-Limit-RPM` | Rate limit (requests per minute) |
| `X-RateLimit-Remaining-RPM` | Remaining requests |
| `X-RateLimit-Concurrent` | Current concurrent requests |
| `X-RateLimit-Concurrent-Limit` | Max concurrent requests |
| `X-GateFlow-Energy-Kwh` | Estimated energy consumption |
| `X-GateFlow-Carbon-gCO2e` | Estimated carbon footprint |
| `X-GateFlow-Model-Used` | Provider/model used (e.g., `openai/gpt-4o`) |
| `X-AI-Generated-By` | AI content labeling (when compliance enabled) |
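
These headers arrive as strings; a small illustrative parser for the rate-limit values, assuming you have the response headers as a dict (e.g. from `requests` or the OpenAI SDK's `with_raw_response`):

```python
# Illustrative parser for the X-RateLimit-* headers above.
def parse_rate_limit(headers: dict) -> dict:
    def to_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None
    return {
        "limit_rpm": to_int("X-RateLimit-Limit-RPM"),
        "remaining_rpm": to_int("X-RateLimit-Remaining-RPM"),
        "concurrent": to_int("X-RateLimit-Concurrent"),
        "concurrent_limit": to_int("X-RateLimit-Concurrent-Limit"),
    }

info = parse_rate_limit({
    "X-RateLimit-Limit-RPM": "600",
    "X-RateLimit-Remaining-RPM": "599",
})
```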

Streaming

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Response (Server-Sent Events):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
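
The official SDKs parse this stream for you, but the format is simple enough to consume with the stdlib. A sketch that accumulates the content deltas from the events above:

```python
# Stdlib-only sketch of consuming the SSE stream: each event line starts
# with "data: ", carries a JSON chunk, and the stream ends with [DONE].
import json

def accumulate_sse(lines):
    """Concatenate content deltas from chat.completion.chunk events."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            text.append(delta["content"])
    return "".join(text)

events = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(accumulate_sse(events))  # → Hello!
```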

Examples

Basic Chat

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ],
});

console.log(response.choices[0].message.content);
bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

With System Prompt

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a pirate. Respond in pirate speak."},
        {"role": "user", "content": "How do I learn Python?"}
    ]
)

Multi-turn Conversation

python
messages = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Hello Alice! Nice to meet you."},
    {"role": "user", "content": "What's my name?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
# Response: "Your name is Alice."
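
To keep a conversation going, append the assistant's reply before adding the next user turn, so the model always sees the full history. A minimal illustrative helper:

```python
# Illustrative conversation-state helper: grow the history in place.
def add_turn(messages, assistant_reply, user_message):
    messages.append({"role": "assistant", "content": assistant_reply})
    messages.append({"role": "user", "content": user_message})
    return messages

history = [{"role": "user", "content": "My name is Alice."}]
add_turn(history, "Hello Alice! Nice to meet you.", "What's my name?")
# history now matches the `messages` list above
```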

Function Calling

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }]
)

# Model may respond with tool_calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    # Call your function and continue conversation
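
The second half of the loop sends the tool result back to the model. A stdlib-only sketch that builds the follow-up messages (the helper is illustrative; `results` maps each `tool_call_id` to your function's return value):

```python
import json

def tool_result_messages(assistant_message, results):
    """Build follow-up messages for a tool-call round trip."""
    followup = [{
        "role": "assistant",
        "content": assistant_message.get("content"),
        "tool_calls": assistant_message["tool_calls"],
    }]
    for call in assistant_message["tool_calls"]:
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return followup

# Shape as returned in the response above
assistant = {
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "Paris"}'},
    }],
}
messages = tool_result_messages(
    assistant, {"call_abc123": {"temperature": 72, "condition": "sunny"}}
)
# Append `messages` to the conversation and call create() again.
```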

Streaming

python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about AI."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Error Codes

| Code | Description |
| --- | --- |
| `invalid_api_key` | API key is invalid |
| `rate_limit_exceeded` | Rate limit hit |
| `model_not_found` | Model doesn't exist |
| `context_length_exceeded` | Too many tokens |
| `content_policy_violation` | Content blocked |

See Error Handling for details.
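
When `rate_limit_exceeded` is returned, retrying with exponential backoff is the usual remedy. A generic stdlib sketch (real code would catch the SDK's `RateLimitError` rather than inspect the message string):

```python
# Generic retry-with-backoff sketch for rate_limit_exceeded errors.
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on rate-limit errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_try = attempt == max_attempts - 1
            if "rate_limit_exceeded" not in str(exc) or last_try:
                raise
            # 1s, 2s, 4s, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)

# Usage: with_retries(lambda: client.chat.completions.create(...))
```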
