
Chat Completions

Generate AI responses from a conversation with intelligent routing across multiple providers.

POST /v1/chat/completions

Overview

The Chat Completions endpoint is fully OpenAI-compatible and routes requests to the optimal provider based on task complexity and routing mode. GateFlow automatically:

  • Classifies your prompt's task type (coding, reasoning, simple Q&A, etc.)
  • Routes to the best model for that task type
  • Tracks requests with detailed timing, cost, and routing metrics
  • Handles rate limiting and request queuing
  • Provides energy and carbon tracking for sustainability

Request

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "routing_mode": "balanced"
  }'
python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="auto",  # Let GateFlow select the best model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    extra_body={"routing_mode": "balanced"}
)
typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ],
});

Parameters

Required

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model ID (e.g., `gpt-4o`, `claude-3-5-sonnet`) or `auto` for intelligent routing |
| `messages` | array | Conversation messages |

Optional

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `temperature` | number | 1.0 | Sampling temperature (0-2) |
| `max_tokens` | integer | varies | Maximum tokens to generate |
| `top_p` | number | 1.0 | Nucleus sampling parameter |
| `stream` | boolean | false | Enable SSE streaming |
| `stop` | string/array | null | Stop sequences |
| `presence_penalty` | number | 0 | Penalize repeated topics (-2 to 2) |
| `frequency_penalty` | number | 0 | Penalize repeated tokens (-2 to 2) |
| `tools` | array | null | Tools (functions) available to the model |
| `tool_choice` | string/object | null | Tool choice: `auto`, `none`, `required`, or specific function |

GateFlow Routing Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `routing_mode` | string | `balanced` | Routing strategy (see below) |
| `provider` | string | null | Force specific provider (bypasses routing) |

Routing Modes

GateFlow supports five intelligent routing modes:

| Mode | Description |
| --- | --- |
| `balanced` | Default. Best quality-to-price ratio. Recommended for most use cases. |
| `cost_optimized` | Selects cheapest model that meets quality threshold. |
| `performance` | Selects highest quality model regardless of cost. |
| `low_latency` | Optimizes for fastest response time. |
| `sustain_optimized` | Selects lowest-carbon model that meets quality threshold. |

Use the X-Routing-Mode header or routing_mode body parameter:

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "X-Routing-Mode: cost_optimized" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [...]}'

Messages

Each message has:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `role` | string | Yes | `system`, `user`, `assistant`, or `tool` |
| `content` | string/array | Conditional | Message content (required except for assistant with `tool_calls`) |
| `name` | string | No | Participant name |
| `tool_calls` | array | No | Tool calls made by assistant (for `role=assistant`) |
| `tool_call_id` | string | No | ID of tool call being responded to (for `role=tool`) |

Content Types

json
// Text content
{"role": "user", "content": "Hello!"}

// Multimodal content (vision models)
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://..."}}
  ]
}

// Tool response
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temperature\": 72, \"condition\": \"sunny\"}"
}
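
A small illustrative builder for the multimodal shape above (the function name is hypothetical, not part of any SDK):

```python
# Hypothetical helper for the multimodal (vision) content shape above.
def vision_message(text: str, image_url: str) -> dict:
    """Build a user message pairing a text prompt with an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = vision_message("What's in this image?", "https://example.com/photo.jpg")
```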

Function/Tool Calling

GateFlow supports OpenAI-format tools with automatic translation to provider-specific formats:

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"  # auto, none, required, or specific function
)

Tool Choice Options

| Value | Description |
| --- | --- |
| `auto` | Model decides whether to call a tool |
| `none` | Model never calls tools |
| `required` | Model must call at least one tool |
| `{"type": "function", "function": {"name": "..."}}` | Force specific function |

Legacy Functions

The `functions` and `function_call` parameters are deprecated. Use `tools` and `tool_choice` instead. GateFlow automatically converts the legacy format but returns a deprecation warning header.
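
For client code that still builds the legacy shape, the conversion can be sketched as follows (the helper name is hypothetical; GateFlow performs this translation server-side, so you only need it when migrating client code):

```python
# Illustrative sketch of the legacy-to-tools mapping described above.
def convert_legacy_params(functions, function_call=None):
    """Convert deprecated `functions`/`function_call` parameters to
    the `tools`/`tool_choice` format."""
    tools = [{"type": "function", "function": f} for f in functions]

    if function_call in (None, "auto", "none"):
        tool_choice = function_call
    else:  # legacy {"name": "..."} forces that specific function
        tool_choice = {"type": "function",
                       "function": {"name": function_call["name"]}}
    return tools, tool_choice

tools, tool_choice = convert_legacy_params(
    functions=[{"name": "get_weather", "parameters": {"type": "object"}}],
    function_call={"name": "get_weather"},
)
```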

Request Headers

| Header | Type | Description |
| --- | --- | --- |
| `Authorization` | string | Bearer token with your API key |
| `X-Routing-Mode` | string | Override routing mode |
| `X-GateFlow-Project` | string | Project ID for attribution |
| `X-GateFlow-Team` | string | Team ID for attribution |
| `X-GateFlow-Tags` | string | Comma-separated tags for analytics |
| `X-GateFlow-Task-Type` | string | Override task type classification |
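
A small stdlib-only sketch of assembling these headers (the helper is illustrative, not part of any SDK; note the comma-separated format for `X-GateFlow-Tags`):

```python
# Illustrative helper assembling GateFlow attribution headers.
def gateflow_headers(project=None, team=None, tags=None, task_type=None):
    headers = {}
    if project:
        headers["X-GateFlow-Project"] = project
    if team:
        headers["X-GateFlow-Team"] = team
    if tags:  # X-GateFlow-Tags is comma-separated
        headers["X-GateFlow-Tags"] = ",".join(tags)
    if task_type:
        headers["X-GateFlow-Task-Type"] = task_type
    return headers

headers = gateflow_headers(project="proj_1", tags=["batch", "eval"])
```

The resulting dict can be passed via the OpenAI SDK's `extra_headers` argument or as curl `-H` flags.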

GateFlow Extensions

Pass GateFlow-specific options via extra_body:

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "routing_mode": "balanced",
        "provider": None,  # Set to force specific provider
    }
)

| Option | Type | Description |
| --- | --- | --- |
| `routing_mode` | string | Routing strategy |
| `provider` | string | Force specific provider |

Response

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705123456,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 10,
    "total_tokens": 25
  },
  "_gateway": {
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "latency_ms": 456,
    "user_id": "user_123",
    "timestamp": 1705123456,
    "routing": {
      "mode": "balanced",
      "task_type": "simple_qa",
      "reasoning": "Best value: quality 8/10 at competitive price",
      "estimated_cost": 0.0001375,
      "actual_cost": 0.0001375,
      "provider": "openai",
      "model": "gpt-4o"
    },
    "alternatives": [
      {"provider": "anthropic", "model": "claude-3-5-sonnet"},
      {"provider": "google", "model": "gemini-2.5-pro"}
    ],
    "rate_limiting": {
      "enabled": true,
      "retry_attempts": 0,
      "retry_delay_ms": 0
    },
    "tools": {
      "enabled": false,
      "tool_count": 0,
      "tool_calls_made": 0,
      "tools_called": []
    },
    "compliance": {
      "enabled": true,
      "classification": "public",
      "regime": "default",
      "redacted": false
    },
    "energy": {
      "energy_kwh": 0.000000012,
      "confidence_level": "high",
      "task_multiplier": 1.0,
      "provider_pue": 1.1
    },
    "carbon": {
      "carbon_gco2e": 0.000005,
      "grid_region": "us-west-2",
      "grid_intensity_gco2_per_kwh": 120.5,
      "confidence_level": "medium"
    }
  }
}
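
Once the body is parsed as JSON, the `_gateway` block can be inspected like any dict. A minimal sketch (the `summarize_routing` helper is illustrative, not part of any SDK):

```python
def summarize_routing(body: dict) -> str:
    """One-line summary of GateFlow's routing decision for logging."""
    gw = body.get("_gateway", {})
    r = gw.get("routing", {})
    return (f"{r.get('provider')}/{r.get('model')} "
            f"mode={r.get('mode')} "
            f"cost=${r.get('actual_cost', 0):.7f} "
            f"latency={gw.get('latency_ms')}ms")

# Sample trimmed from the response above
body = {
    "_gateway": {
        "latency_ms": 456,
        "routing": {"mode": "balanced", "provider": "openai",
                    "model": "gpt-4o", "actual_cost": 0.0001375},
    }
}
print(summarize_routing(body))
# → openai/gpt-4o mode=balanced cost=$0.0001375 latency=456ms
```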

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique completion ID |
| `object` | string | Always `chat.completion` |
| `created` | integer | Unix timestamp |
| `model` | string | Model used |
| `choices` | array | Generated completions |
| `usage` | object | Token counts |
| `_gateway` | object | GateFlow routing and observability metadata |

Gateway Metadata (_gateway)

| Field | Type | Description |
| --- | --- | --- |
| `request_id` | string | Unique request UUID for tracing |
| `latency_ms` | integer | Total request latency in milliseconds |
| `routing` | object | Routing decision details |
| `alternatives` | array | Other models that could have been selected |
| `rate_limiting` | object | Rate limit status |
| `tools` | object | Tool calling metadata |
| `compliance` | object | Data classification and compliance info |
| `energy` | object | Energy consumption estimate |
| `carbon` | object | Carbon footprint estimate |

Choice Object

| Field | Type | Description |
| --- | --- | --- |
| `index` | integer | Choice index |
| `message` | object | Generated message |
| `finish_reason` | string | Why generation stopped |

Finish Reasons

| Reason | Description |
| --- | --- |
| `stop` | Natural completion or stop sequence |
| `length` | Hit `max_tokens` |
| `tool_calls` | Model wants to call a tool |
| `content_filter` | Content filtered |
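
A minimal sketch of dispatching on `finish_reason`, following the table above (the handler and its messages are illustrative):

```python
# Illustrative dispatch on finish_reason.
def handle_finish_reason(reason: str) -> str:
    if reason == "stop":
        return "done"
    if reason == "length":
        return "truncated: raise max_tokens or continue the generation"
    if reason == "tool_calls":
        return "execute the requested tools and send back role=tool messages"
    if reason == "content_filter":
        return "content was filtered; revise the prompt"
    return f"unknown finish_reason: {reason}"
```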

Response Headers

GateFlow returns useful headers with every response:

| Header | Description |
| --- | --- |
| `X-Request-ID` | Unique request identifier |
| `X-RateLimit-Limit-RPM` | Rate limit (requests per minute) |
| `X-RateLimit-Remaining-RPM` | Remaining requests |
| `X-RateLimit-Concurrent` | Current concurrent requests |
| `X-RateLimit-Concurrent-Limit` | Max concurrent requests |
| `X-GateFlow-Energy-Kwh` | Estimated energy consumption |
| `X-GateFlow-Carbon-gCO2e` | Estimated carbon footprint |
| `X-GateFlow-Model-Used` | Provider/model used (e.g., `openai/gpt-4o`) |
| `X-AI-Generated-By` | AI content labeling (when compliance enabled) |
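
These headers arrive as strings; a small illustrative parser for the rate-limit values, assuming you have the response headers as a dict (e.g. from `requests` or the OpenAI SDK's `with_raw_response`):

```python
# Illustrative parser for the X-RateLimit-* headers above.
def parse_rate_limit(headers: dict) -> dict:
    def to_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None
    return {
        "limit_rpm": to_int("X-RateLimit-Limit-RPM"),
        "remaining_rpm": to_int("X-RateLimit-Remaining-RPM"),
        "concurrent": to_int("X-RateLimit-Concurrent"),
        "concurrent_limit": to_int("X-RateLimit-Concurrent-Limit"),
    }

info = parse_rate_limit({
    "X-RateLimit-Limit-RPM": "600",
    "X-RateLimit-Remaining-RPM": "599",
})
```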

Streaming

bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Response (Server-Sent Events):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
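
The official SDKs parse this stream for you, but the format is simple enough to consume with the stdlib. A sketch that accumulates the content deltas from the events above:

```python
# Stdlib-only sketch of consuming the SSE stream: each event line starts
# with "data: ", carries a JSON chunk, and the stream ends with [DONE].
import json

def accumulate_sse(lines):
    """Concatenate content deltas from chat.completion.chunk events."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            text.append(delta["content"])
    return "".join(text)

events = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(accumulate_sse(events))  # → Hello!
```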

Examples

Basic Chat

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ],
});

console.log(response.choices[0].message.content);
bash
curl https://api.gateflow.ai/v1/chat/completions \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

With System Prompt

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a pirate. Respond in pirate speak."},
        {"role": "user", "content": "How do I learn Python?"}
    ]
)

Multi-turn Conversation

python
messages = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Hello Alice! Nice to meet you."},
    {"role": "user", "content": "What's my name?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
# Response: "Your name is Alice."
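
To keep a conversation going, append the assistant's reply before adding the next user turn, so the model always sees the full history. A minimal illustrative helper:

```python
# Illustrative conversation-state helper: grow the history in place.
def add_turn(messages, assistant_reply, user_message):
    messages.append({"role": "assistant", "content": assistant_reply})
    messages.append({"role": "user", "content": user_message})
    return messages

history = [{"role": "user", "content": "My name is Alice."}]
add_turn(history, "Hello Alice! Nice to meet you.", "What's my name?")
# history now matches the `messages` list above
```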

Function Calling

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }]
)

# Model may respond with tool_calls
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    # Call your function and continue conversation
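
The second half of the loop sends the tool result back to the model. A stdlib-only sketch that builds the follow-up messages (the helper is illustrative; `results` maps each `tool_call_id` to your function's return value):

```python
import json

def tool_result_messages(assistant_message, results):
    """Build follow-up messages for a tool-call round trip."""
    followup = [{
        "role": "assistant",
        "content": assistant_message.get("content"),
        "tool_calls": assistant_message["tool_calls"],
    }]
    for call in assistant_message["tool_calls"]:
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return followup

# Shape as returned in the response above
assistant = {
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "Paris"}'},
    }],
}
messages = tool_result_messages(
    assistant, {"call_abc123": {"temperature": 72, "condition": "sunny"}}
)
# Append `messages` to the conversation and call create() again.
```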

Streaming

python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about AI."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Error Codes

| Code | Description |
| --- | --- |
| `invalid_api_key` | API key is invalid |
| `rate_limit_exceeded` | Rate limit hit |
| `model_not_found` | Model doesn't exist |
| `context_length_exceeded` | Too many tokens |
| `content_policy_violation` | Content blocked |

See Error Handling for details.
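
When `rate_limit_exceeded` is returned, retrying with exponential backoff is the usual remedy. A generic stdlib sketch (real code would catch the SDK's `RateLimitError` rather than inspect the message string):

```python
# Generic retry-with-backoff sketch for rate_limit_exceeded errors.
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on rate-limit errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_try = attempt == max_attempts - 1
            if "rate_limit_exceeded" not in str(exc) or last_try:
                raise
            # 1s, 2s, 4s, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)

# Usage: with_retries(lambda: client.chat.completions.create(...))
```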
