OpenAI Integration

GateFlow provides comprehensive support for OpenAI's advanced AI models with sustainability optimization and carbon-efficient routing.

Available Models

Chat Models

GPT-5.2 Family (Flagship Models)

  • gpt-5.2 - Flagship model with extended reasoning capabilities

    • Context window: 200,000 tokens
    • Max output: 16,384 tokens
    • Supports: Chat, Vision, Functions, Streaming
    • Average latency: 1,500ms
    • Quality score: 10/10
  • gpt-5.2-instant - Fast variant with optimized latency

    • Context window: 200,000 tokens
    • Max output: 16,384 tokens
    • Supports: Chat, Vision, Functions, Streaming
    • Average latency: 800ms
    • Quality score: 9/10
  • gpt-5.2-codex - Specialized for code generation and debugging

    • Context window: 200,000 tokens
    • Max output: 16,384 tokens
    • Supports: Chat, Functions, Streaming
    • Average latency: 1,600ms
    • Quality score: 10/10

GPT-5 Family (Production Models)

  • gpt-5 - Balanced model for production use

    • Context window: 128,000 tokens
    • Max output: 16,384 tokens
    • Supports: Chat, Vision, Functions, Streaming
    • Average latency: 1,000ms
    • Quality score: 9/10
  • gpt-5-mini - Cost-effective for high-volume applications

    • Context window: 128,000 tokens
    • Max output: 16,384 tokens
    • Supports: Chat, Functions, Streaming
    • Average latency: 500ms
    • Quality score: 8/10
  • gpt-5-nano - Ultra-fast for simple tasks

    • Context window: 128,000 tokens
    • Max output: 4,096 tokens
    • Supports: Chat, Streaming
    • Average latency: 300ms
    • Quality score: 7/10

Specialized Models

  • o3 - Advanced reasoning model

    • Context window: 200,000 tokens
    • Max output: 100,000 tokens
    • Supports: Chat
    • Average latency: 3,000ms
    • Quality score: 10/10
  • o4-mini - Fast reasoning model

    • Context window: 128,000 tokens
    • Max output: 65,536 tokens
    • Supports: Chat
    • Average latency: 1,800ms
    • Quality score: 9/10

Embedding Models

  • text-embedding-3-large - High-quality embeddings (3,072 dimensions)

    • Context window: 8,191 tokens
    • Price: $0.13 per 1M tokens
    • Average latency: 150ms
    • Quality score: 10/10
  • text-embedding-3-small - Fast embeddings (1,536 dimensions)

    • Context window: 8,191 tokens
    • Price: $0.02 per 1M tokens
    • Average latency: 100ms
    • Quality score: 8/10

Sustainability Features

OpenAI integration through GateFlow offers several sustainability benefits:

  • Carbon-Optimized Routing: Automatically select the most energy-efficient data center
  • Model Efficiency: GPT-5.2 models are significantly more efficient than previous generations
  • Time-Shifted Execution: Defer non-urgent requests to low-carbon periods
  • Request Batching: Combine multiple requests for reduced overhead
  • Automatic Model Selection: Choose the most efficient model for your task
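Request batching, mentioned above, can also be approximated client-side by grouping prompts before they are sent. A minimal, illustrative helper (the function name and batch size are this sketch's own, not part of the GateFlow API):

```python
def batch_prompts(prompts, max_batch_size=8):
    """Group prompts into batches so multiple items can share one API call."""
    return [prompts[i:i + max_batch_size]
            for i in range(0, len(prompts), max_batch_size)]

# Ten items collapse into three API calls instead of ten
batches = batch_prompts([f"Summarize item {n}" for n in range(10)], max_batch_size=4)
print(len(batches))     # 3
print(len(batches[0]))  # 4
```

Each batch can then be sent as a single request (for example, as one `input` list to the embeddings endpoint), reducing per-request overhead.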

Example Usage

Basic Chat Completion

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_key_here"
)

# Using GPT-5.2 for complex reasoning with sustainability optimization
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Analyze this complex document and provide insights"}],
    routing_mode="sustain_optimized",  # Enable carbon-optimized routing
    minimum_quality_score=9  # Balance quality and efficiency
)

print(f"Response: {response.choices[0].message.content}")
print(f"Model used: {response.model}")
print(f"Carbon footprint: {response.sustainability.carbon_gco2e} gCO₂e")
print(f"Carbon saved: {response.sustainability.carbon_saved_gco2e} gCO₂e")

Using Embeddings for RAG

python
# High-quality embeddings for semantic search
embedding_response = client.embeddings.create(
    model="text-embedding-3-large",
    input=[
        "Document 1 content about sustainability practices",
        "Document 2 content about renewable energy",
        "User query about eco-friendly AI solutions"
    ],
    routing_mode="sustain_optimized"
)

# Use embeddings for semantic search
for i, embedding in enumerate(embedding_response.data):
    print(f"Embedding {i+1}: {len(embedding.embedding)} dimensions")
    print(f"Carbon footprint: {embedding.sustainability.carbon_gco2e} gCO₂e")
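Once the embeddings are retrieved, semantic search itself is a client-side similarity ranking. A minimal cosine-similarity sketch (the toy vectors below stand in for the real embedding data returned above):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for the two document embeddings and the query embedding
doc_vectors = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
query_vector = [0.85, 0.15, 0.0]

# Rank documents by similarity to the query, best match first
ranked = sorted(range(len(doc_vectors)),
                key=lambda i: cosine_similarity(doc_vectors[i], query_vector),
                reverse=True)
print(ranked)  # [0, 1]
```

In a real RAG pipeline the same ranking would run over `embedding_response.data[i].embedding` vectors, typically via a vector database rather than a linear scan.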

Advanced Reasoning with O3

python
# Using O3 for complex reasoning tasks
response = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "Solve this complex mathematical problem step by step"}],
    routing_mode="sustain_optimized",
    timeout_seconds=30  # Allow extra time for complex reasoning
)

print(f"Reasoning steps: {response.choices[0].message.content}")

Code Generation with GPT-5.2 Codex

python
# Specialized code generation
response = client.chat.completions.create(
    model="gpt-5.2-codex",
    messages=[{"role": "user", "content": "Generate Python code for a sustainable AI pipeline"}],
    routing_mode="sustain_optimized"
)

print(f"Generated code:\n{response.choices[0].message.content}")

OpenAI-Specific Features

Function Calling

python
# Define functions (works across all OpenAI models)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                }
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    routing_mode="sustain_optimized"
)
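When the model responds with a tool call, your application executes the matching local function and returns the result. A minimal dispatch sketch (the `get_weather` implementation and its fixed temperature are illustrative placeholders):

```python
import json

# Hypothetical local implementation backing the get_weather tool defined above
def get_weather(location, unit="celsius"):
    return {"location": location, "temperature": 21, "unit": unit}

def dispatch_tool_call(name, arguments_json):
    """Route a model-issued tool call to the matching local function."""
    handlers = {"get_weather": get_weather}
    return handlers[name](**json.loads(arguments_json))

# In a real flow, name and arguments come from
# response.choices[0].message.tool_calls[0].function
result = dispatch_tool_call("get_weather", '{"location": "Paris", "unit": "celsius"}')
print(result["location"])  # Paris
```

The result is then appended to the conversation as a `"tool"` role message so the model can produce its final answer.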

Vision Capabilities

python
# Multi-modal input with vision
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this sustainability chart"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/sustainability-chart.png"
                    }
                }
            ]
        }
    ],
    routing_mode="sustain_optimized"
)

Streaming Responses

python
# Stream responses for better user experience
stream = client.chat.completions.create(
    model="gpt-5.2-instant",
    messages=[{"role": "user", "content": "Generate a detailed sustainability report"}],
    stream=True,
    routing_mode="sustain_optimized"
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Model Selection Guide

| Use Case | Recommended Model | Key Features | Sustainability Benefits |
| --- | --- | --- | --- |
| Complex reasoning | gpt-5.2 | 200K context, multimodal | Highest efficiency per token |
| Fast responses | gpt-5.2-instant | 800ms latency | Optimized for speed and efficiency |
| Code generation | gpt-5.2-codex | Specialized coding | Reduced compute for coding tasks |
| Production use | gpt-5 | Balanced performance | Best quality-to-carbon ratio |
| High-volume apps | gpt-5-mini | Cost-effective | Lowest carbon footprint |
| Simple tasks | gpt-5-nano | Ultra-fast | Minimal energy consumption |
| Advanced reasoning | o3 | 100K output tokens | Optimized for complex tasks |
| Fast reasoning | o4-mini | 65K output tokens | Efficient reasoning architecture |
| High-quality embeddings | text-embedding-3-large | 3,072 dimensions | Optimized embedding generation |
| Cost-effective embeddings | text-embedding-3-small | 1,536 dimensions | Lowest energy embeddings |

Sustainability Best Practices

Optimization Strategies

  1. Right-size your model: Use gpt-5-mini or gpt-5-nano for simple tasks instead of flagship models
  2. Enable Sustain Mode: Let GateFlow automatically choose the most efficient OpenAI model
  3. Use time-shifting: Defer non-urgent requests to low-carbon periods
  4. Batch requests: Process multiple items in single API calls to reduce overhead
  5. Combine with caching: Cache frequent OpenAI requests for maximum savings
  6. Region optimization: Select data centers in low-carbon regions

Configuration Example

python
# Configure OpenAI provider with sustainability settings
response = client.chat.completions.create(
    model="auto",  # Let GateFlow choose most efficient OpenAI model
    messages=[{"role": "user", "content": "Process this sustainably"}],
    routing_mode="sustain_optimized",
    minimum_quality_score=8,  # Balance quality and efficiency
    region_preference="us-west",  # Prioritize low-carbon regions
    max_carbon_budget_gco2e=50  # Set maximum carbon budget
)

Performance Characteristics

Latency Comparison

  • Fastest: gpt-5-nano (300ms)
  • Balanced: gpt-5-mini (500ms), gpt-5.2-instant (800ms)
  • Standard: gpt-5 (1,000ms), gpt-5.2 (1,500ms)
  • Advanced: o4-mini (1,800ms), o3 (3,000ms)

Token Limits

  • Standard models: 128K-200K context window
  • Output limits: 4K-100K tokens depending on model
  • Embedding models: 8K token input limit

Pricing Overview

  • Input prices: $0.10-$10.00 per 1M tokens
  • Output prices: $0.04-$40.00 per 1M tokens
  • Embeddings: $0.02-$0.13 per 1M tokens
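Request cost follows directly from token counts and the per-1M-token rates above. A small helper showing the arithmetic (the example uses the low end of the quoted ranges; substitute the current rates for your chosen model):

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_1m, output_price_per_1m):
    """Estimate request cost in USD from token counts and per-1M-token prices."""
    return ((input_tokens / 1_000_000) * input_price_per_1m
            + (output_tokens / 1_000_000) * output_price_per_1m)

# 50K input + 5K output tokens at $0.10 / $0.04 per 1M tokens
cost = estimate_cost(50_000, 5_000, input_price_per_1m=0.10, output_price_per_1m=0.04)
print(f"${cost:.4f}")  # $0.0052
```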

Integration with Other GateFlow Features

Multi-Provider Fallbacks

python
# Configure OpenAI as primary with fallbacks
response = client.chat.completions.create(
    model="gpt-5.2",  # Primary: OpenAI
    messages=[{"role": "user", "content": "Important request"}],
    fallback_providers=["anthropic", "mistral"],  # Fallback chain
    routing_mode="sustain_optimized"
)

Semantic Caching

python
# Cache frequent OpenAI requests
response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Frequently asked sustainability question"}],
    cache_ttl_seconds=3600,  # Cache for 1 hour
    embedding_model="text-embedding-3-small"  # Use for semantic matching
)

Troubleshooting

"OpenAI API key not configured"

Solution: Add your OpenAI API key in the GateFlow Dashboard under Settings → Providers.

"Model not found: gpt-4-turbo"

Solution: Use current models like gpt-5 instead of deprecated models. Check the model compatibility guide.

"Rate limit exceeded"

Solution:

  1. Check your OpenAI account limits
  2. Configure fallbacks to other providers
  3. Enable request queuing in GateFlow settings
  4. Use gpt-5-mini for high-volume applications
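Between those steps, a client-side retry with exponential backoff smooths over transient rate limits. A generic sketch (here `RuntimeError` stands in for the SDK's rate-limit exception, `openai.RateLimitError` in openai>=1.0):

```python
import random
import time

def with_backoff(fn, max_retries=4, base_delay=1.0):
    """Retry fn on rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for the SDK's RateLimitError
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Usage: wrap the API call in a zero-argument callable
# result = with_backoff(lambda: client.chat.completions.create(...))
```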

"Carbon savings lower than expected"

Solution:

  1. Verify Sustain Mode is properly configured
  2. Check grid carbon intensity in your region
  3. Try different OpenAI models for better efficiency
  4. Enable time-shifted execution for non-urgent requests

Migration from Direct OpenAI API

Key Differences

| Feature | Direct OpenAI API | GateFlow OpenAI Integration |
| --- | --- | --- |
| API Format | OpenAI-specific | OpenAI-compatible |
| Authentication | OpenAI API key | GateFlow API key |
| Model Names | gpt-4, gpt-3.5-turbo | gpt-5.2, gpt-5 |
| Carbon Tracking | Manual | Automatic |
| Multi-provider | No | Yes |
| Fallbacks | Manual | Automatic |
| Sustainability | Basic | Advanced optimization |

Migration Example

Before (Direct OpenAI API):

python
import openai  # legacy openai<1.0 SDK style
openai.api_key = "sk-proj-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello from OpenAI!"}]
)

After (GateFlow Integration):

python
from openai import OpenAI
client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_gateflow_key"
)
response = client.chat.completions.create(
    model="gpt-5.2",  # Use current models
    messages=[{"role": "user", "content": "Hello from OpenAI via GateFlow with sustainability benefits!"}],
    routing_mode="sustain_optimized"  # Enable carbon optimization
)
