Chat Completions
Generate AI responses from a conversation with intelligent routing across multiple providers.
POST /v1/chat/completions
Overview
The Chat Completions endpoint is fully OpenAI-compatible and routes requests to the optimal provider based on task complexity and routing mode. GateFlow automatically:
- Classifies your prompt's task type (coding, reasoning, simple Q&A, etc.)
- Routes to the best model for that task type
- Tracks requests with detailed timing, cost, and routing metrics
- Handles rate limiting and request queuing
- Provides energy and carbon tracking for sustainability
Request
```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "routing_mode": "balanced"
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="auto",  # Let GateFlow select the best model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    extra_body={"routing_mode": "balanced"}
)
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ],
});
```
Parameters
Required
| Parameter | Type | Description |
|---|---|---|
model | string | Model ID (e.g., gpt-4o, claude-3-5-sonnet) or auto for intelligent routing |
messages | array | Conversation messages |
Optional
| Parameter | Type | Default | Description |
|---|---|---|---|
temperature | number | 1.0 | Sampling temperature (0-2) |
max_tokens | integer | varies | Maximum tokens to generate |
top_p | number | 1.0 | Nucleus sampling parameter |
stream | boolean | false | Enable SSE streaming |
stop | string/array | null | Stop sequences |
presence_penalty | number | 0 | Penalize repeated topics (-2 to 2) |
frequency_penalty | number | 0 | Penalize repeated tokens (-2 to 2) |
tools | array | null | Tools (functions) available to the model |
tool_choice | string/object | null | Tool choice: auto, none, required, or specific function |
GateFlow Routing Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
routing_mode | string | balanced | Routing strategy (see below) |
provider | string | null | Force specific provider (bypasses routing) |
Routing Modes
GateFlow supports five intelligent routing modes:
| Mode | Description |
|---|---|
balanced | Default. Best quality-to-price ratio. Recommended for most use cases. |
cost_optimized | Selects cheapest model that meets quality threshold. |
performance | Selects highest quality model regardless of cost. |
low_latency | Optimizes for fastest response time. |
sustain_optimized | Selects lowest-carbon model that meets quality threshold. |
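The routing modes above can be exercised without any SDK. The sketch below is an illustration using only the Python standard library (not an official client): it validates the mode name against the table and builds a request body carrying the `routing_mode` field.

```python
import json
from urllib import request

# The five routing modes from the table above.
ROUTING_MODES = {"balanced", "cost_optimized", "performance",
                 "low_latency", "sustain_optimized"}

def build_payload(messages, routing_mode="balanced", model="auto"):
    """Build a chat-completions request body with a validated routing_mode."""
    if routing_mode not in ROUTING_MODES:
        raise ValueError(f"unknown routing_mode: {routing_mode!r}")
    return {"model": model, "messages": messages, "routing_mode": routing_mode}

def chat(api_key, messages, routing_mode="balanced"):
    """POST the payload to the endpoint (performs a live network call)."""
    req = request.Request(
        "https://api.gateflow.ai/v1/chat/completions",
        data=json.dumps(build_payload(messages, routing_mode)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Rejecting unknown mode names locally avoids burning a request on a typo.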
Use the X-Routing-Mode header or routing_mode body parameter:
```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "X-Routing-Mode: cost_optimized" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [...]}'
```
Messages
Each message has:
| Field | Type | Required | Description |
|---|---|---|---|
role | string | Yes | system, user, assistant, or tool |
content | string/array | Conditional | Message content (required except for assistant with tool_calls) |
name | string | No | Participant name |
tool_calls | array | No | Tool calls made by assistant (for role=assistant) |
tool_call_id | string | No | ID of tool call being responded to (for role=tool) |
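Small builders keep these message shapes consistent; an illustrative sketch (helper names are not part of any SDK) matching the text and multimodal structures shown under Content Types below:

```python
def text_message(role: str, text: str) -> dict:
    """Plain-text message: content is a string."""
    return {"role": role, "content": text}

def vision_message(text: str, image_url: str) -> dict:
    """Multimodal user message: content is an array of typed parts."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```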
Content Types
```json
// Text content
{"role": "user", "content": "Hello!"}

// Multimodal content (vision models)
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://..."}}
  ]
}

// Tool response
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 72, \"condition\": \"sunny\"}"
}
```
Function/Tool Calling
GateFlow supports OpenAI-format tools with automatic translation to provider-specific formats:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"  # auto, none, required, or specific function
)
```
Tool Choice Options
| Value | Description |
|---|---|
auto | Model decides whether to call a tool |
none | Model never calls tools |
required | Model must call at least one tool |
{"type": "function", "function": {"name": "..."}} | Force specific function |
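The dict form in the last row is easy to get wrong by hand; a small helper (illustrative only, not part of any SDK) that builds and sanity-checks `tool_choice` values:

```python
def force_function(name: str) -> dict:
    """tool_choice value that forces the model to call one named function."""
    return {"type": "function", "function": {"name": name}}

def is_valid_tool_choice(choice) -> bool:
    """True if `choice` is one of the accepted tool_choice forms."""
    if choice in ("auto", "none", "required"):
        return True
    return (isinstance(choice, dict)
            and choice.get("type") == "function"
            and isinstance(choice.get("function", {}).get("name"), str))
```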
Legacy Functions
The functions and function_call parameters are deprecated. Use tools and tool_choice instead. GateFlow automatically converts the legacy format but returns a deprecation warning header.
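Once the model returns tool_calls, the second half of the exchange is plain message bookkeeping: append the assistant turn, then one role="tool" message per call, and send the list back. A sketch of that step (the helper names and weather-style result are illustrative, not GateFlow APIs):

```python
import json

def tool_result_message(tool_call_id: str, result) -> dict:
    """Package a function's return value as a role="tool" message."""
    return {"role": "tool", "tool_call_id": tool_call_id,
            "content": json.dumps(result)}

def extend_with_tool_round(messages, assistant_message, results_by_call_id):
    """Append the assistant's tool_calls turn plus one tool message per
    result, producing the message list for the follow-up request."""
    out = list(messages) + [assistant_message]
    for call_id, result in results_by_call_id.items():
        out.append(tool_result_message(call_id, result))
    return out
```

The extended list is then passed as `messages` in a second chat-completions request so the model can compose its final answer.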
Request Headers
| Header | Type | Description |
|---|---|---|
Authorization | string | Bearer token with your API key |
X-Routing-Mode | string | Override routing mode |
X-GateFlow-Project | string | Project ID for attribution |
X-GateFlow-Team | string | Team ID for attribution |
X-GateFlow-Tags | string | Comma-separated tags for analytics |
X-GateFlow-Task-Type | string | Override task type classification |
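The optional attribution headers compose naturally into a dict you can pass as `extra_headers` (or merge into a raw request); a small illustrative helper using the header names from the table above:

```python
def attribution_headers(project=None, team=None, tags=None, task_type=None):
    """Build the optional X-GateFlow-* request headers, skipping unset ones."""
    headers = {}
    if project:
        headers["X-GateFlow-Project"] = project
    if team:
        headers["X-GateFlow-Team"] = team
    if tags:
        headers["X-GateFlow-Tags"] = ",".join(tags)  # comma-separated tags
    if task_type:
        headers["X-GateFlow-Task-Type"] = task_type
    return headers
```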
GateFlow Extensions
Pass GateFlow-specific options via extra_body:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "routing_mode": "balanced",
        "provider": None,  # Set to force a specific provider
    }
)
```
| Option | Type | Description |
|---|---|---|
routing_mode | string | Routing strategy |
provider | string | Force specific provider |
Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705123456,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 10,
    "total_tokens": 25
  },
  "_gateway": {
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "latency_ms": 456,
    "user_id": "user_123",
    "timestamp": 1705123456,
    "routing": {
      "mode": "balanced",
      "task_type": "simple_qa",
      "reasoning": "Best value: quality 8/10 at competitive price",
      "estimated_cost": 0.0001375,
      "actual_cost": 0.0001375,
      "provider": "openai",
      "model": "gpt-4o"
    },
    "alternatives": [
      {"provider": "anthropic", "model": "claude-3-5-sonnet"},
      {"provider": "google", "model": "gemini-2.5-pro"}
    ],
    "rate_limiting": {
      "enabled": true,
      "retry_attempts": 0,
      "retry_delay_ms": 0
    },
    "tools": {
      "enabled": false,
      "tool_count": 0,
      "tool_calls_made": 0,
      "tools_called": []
    },
    "compliance": {
      "enabled": true,
      "classification": "public",
      "regime": "default",
      "redacted": false
    },
    "energy": {
      "energy_kwh": 0.000000012,
      "confidence_level": "high",
      "task_multiplier": 1.0,
      "provider_pue": 1.1
    },
    "carbon": {
      "carbon_gco2e": 0.000005,
      "grid_region": "us-west-2",
      "grid_intensity_gco2_per_kwh": 120.5,
      "confidence_level": "medium"
    }
  }
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
id | string | Unique completion ID |
object | string | Always chat.completion |
created | integer | Unix timestamp |
model | string | Model used |
choices | array | Generated completions |
usage | object | Token counts |
_gateway | object | GateFlow routing and observability metadata |
Gateway Metadata (_gateway)
| Field | Type | Description |
|---|---|---|
request_id | string | Unique request UUID for tracing |
latency_ms | integer | Total request latency in milliseconds |
routing | object | Routing decision details |
alternatives | array | Other models that could have been selected |
rate_limiting | object | Rate limit status |
tools | object | Tool calling metadata |
compliance | object | Data classification and compliance info |
energy | object | Energy consumption estimate |
carbon | object | Carbon footprint estimate |
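Since `_gateway` is plain JSON, extracting the fields most teams log takes only a few lines; an illustrative sketch (field names follow the example response above, helper name is ours):

```python
def summarize_gateway(body: dict) -> dict:
    """Pull commonly logged fields from a completion's _gateway block."""
    gw = body.get("_gateway", {})
    routing = gw.get("routing", {})
    return {
        "request_id": gw.get("request_id"),
        "latency_ms": gw.get("latency_ms"),
        "model": f"{routing.get('provider')}/{routing.get('model')}",
        "task_type": routing.get("task_type"),
        "cost_usd": routing.get("actual_cost"),
        "carbon_gco2e": gw.get("carbon", {}).get("carbon_gco2e"),
    }
```

Every lookup uses `.get()`, so the summary degrades gracefully if a sub-object is absent.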
Choice Object
| Field | Type | Description |
|---|---|---|
index | integer | Choice index |
message | object | Generated message |
finish_reason | string | Why generation stopped |
Finish Reasons
| Reason | Description |
|---|---|
stop | Natural completion or stop sequence |
length | Hit max_tokens |
tool_calls | Model wants to call a tool |
content_filter | Content filtered |
Response Headers
GateFlow returns useful headers with every response:
| Header | Description |
|---|---|
X-Request-ID | Unique request identifier |
X-RateLimit-Limit-RPM | Rate limit (requests per minute) |
X-RateLimit-Remaining-RPM | Remaining requests |
X-RateLimit-Concurrent | Current concurrent requests |
X-RateLimit-Concurrent-Limit | Max concurrent requests |
X-GateFlow-Energy-Kwh | Estimated energy consumption |
X-GateFlow-Carbon-gCO2e | Estimated carbon footprint |
X-GateFlow-Model-Used | Provider/model used (e.g., openai/gpt-4o) |
X-AI-Generated-By | AI content labeling (when compliance enabled) |
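These headers arrive as strings; an illustrative parser (helper name is ours; header names as in the table above) that coerces the rate-limit values to integers for dashboards or client-side throttling:

```python
def parse_rate_limits(headers) -> dict:
    """Read the X-RateLimit-* response headers into ints (None if absent)."""
    def to_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None
    return {
        "limit_rpm": to_int("X-RateLimit-Limit-RPM"),
        "remaining_rpm": to_int("X-RateLimit-Remaining-RPM"),
        "concurrent": to_int("X-RateLimit-Concurrent"),
        "concurrent_limit": to_int("X-RateLimit-Concurrent-Limit"),
    }
```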
Streaming
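Each event in the stream is a `data:` line carrying a `chat.completion.chunk` JSON payload, terminated by `data: [DONE]`. A minimal parser using only the standard library (an illustration; the SDKs handle this for you):

```python
import json

def iter_sse_content(lines):
    """Yield the text deltas from chat.completion.chunk SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and SSE comments
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]
```

Joining the yielded deltas reconstructs the full assistant message.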
```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
Response (Server-Sent Events):
```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
Examples
Basic Chat
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
print(response.choices[0].message.content)
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ],
});
console.log(response.choices[0].message.content);
```

```bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```
With System Prompt
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a pirate. Respond in pirate speak."},
        {"role": "user", "content": "How do I learn Python?"}
    ]
)
```
Multi-turn Conversation
```python
messages = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Hello Alice! Nice to meet you."},
    {"role": "user", "content": "What's my name?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
# Response: "Your name is Alice."
```
Function Calling
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }]
)

# Model may respond with tool_calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    # Call your function and continue the conversation
```
Streaming
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about AI."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Error Codes
| Code | Description |
|---|---|
invalid_api_key | API key is invalid |
rate_limit_exceeded | Rate limit hit |
model_not_found | Model doesn't exist |
context_length_exceeded | Too many tokens |
content_policy_violation | Content blocked |
See Error Handling for details.
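Of these, rate_limit_exceeded is the one transient error usually worth retrying automatically. A sketch of a capped exponential backoff schedule (the policy values and helper names are illustrative, not GateFlow defaults):

```python
# Error codes considered transient and safe to retry (assumption: only
# rate_limit_exceeded; the other codes above indicate a request problem).
RETRYABLE = {"rate_limit_exceeded"}

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff schedule in seconds: base * 2^i, capped."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

def should_retry(error_code: str, attempt: int, max_attempts: int = 5) -> bool:
    """Retry only transient codes, up to a fixed attempt budget."""
    return error_code in RETRYABLE and attempt < max_attempts
```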