Google Integration

Google's Gemini models provide advanced AI capabilities with large context windows and multimodal support. GateFlow integrates seamlessly with Google's AI platform for optimized routing and sustainability.

Available Models

Chat Models

| Model ID | Type | Context Window | Best For |
|---|---|---|---|
| gemini-3-pro | Chat | 2M tokens | Complex reasoning, multi-turn conversations |
| gemini-3-flash | Chat | 1M tokens | Fast, cost-effective responses |
| gemini-2.5-pro | Chat | 1M tokens | Balanced performance |
| gemini-2.5-flash | Chat | 1M tokens | High-speed, low-cost tasks |
| gemini-2.5-flash-lite | Chat | 500K tokens | Lightweight tasks |

Embedding Models

| Model ID | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| text-embedding-004 | 768 | 2,048 | Semantic search, clustering |

Configuration

json
{
  "provider": "google",
  "credentials": {
    "api_key": "AIza..."
  }
}
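
If you manage providers programmatically rather than through the Dashboard, a request like the following could submit this configuration. Note that the /v1/providers endpoint and payload shape are assumptions for illustration; the documented path is adding the key under Settings → Providers in the Dashboard.

python
import requests

# Hypothetical admin call; the endpoint below is an assumption, not a
# documented GateFlow API. The Dashboard (Settings → Providers) is the
# confirmed way to register credentials.
resp = requests.post(
    "https://api.gateflow.ai/v1/providers",
    headers={"Authorization": "Bearer gw_prod_your_key_here"},
    json={
        "provider": "google",
        "credentials": {"api_key": "AIza..."}
    },
)
resp.raise_for_status()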

Pricing

| Model | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| gemini-3-pro | $3.50 | $14.00 |
| gemini-3-flash | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $5.00 |
| gemini-2.5-flash | $0.075 | $0.30 |
| gemini-2.5-flash-lite | $0.05 | $0.20 |
| text-embedding-004 | $0.025 | N/A |
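
As a quick sanity check, the snippet below turns the chat-model rates above into a per-request cost estimate. The prices are copied from the table; the token counts are made up for illustration.

python
# Chat-model prices from the table above: (input, output) per 1M tokens
PRICES = {
    "gemini-3-pro": (3.50, 14.00),
    "gemini-3-flash": (0.10, 0.40),
    "gemini-2.5-pro": (1.25, 5.00),
    "gemini-2.5-flash": (0.075, 0.30),
    "gemini-2.5-flash-lite": (0.05, 0.20),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough request cost in USD from the published per-1M-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 500K-token document summarized into ~4K output tokens
print(f"${estimate_cost('gemini-3-pro', 500_000, 4_096):.4f}")  # ≈ $1.8073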

Sustainability Features

Google integration through GateFlow offers several sustainability benefits:

  • Carbon-Neutral Data Centers: Google matches 100% of its data center electricity use with renewable energy purchases
  • TPU Acceleration: Energy-efficient Tensor Processing Units for AI workloads
  • Intelligent Routing: Automatically select the most energy-efficient data center
  • Time-Shifted Execution: Defer non-urgent requests to low-carbon periods (see the sketch after this list)
  • Automatic Model Selection: Choose the most efficient Gemini model for your task
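
The sketch below shows what time-shifting could look like at the request level. Only routing_mode appears in the examples later on this page; the execution_deadline field is an assumed name used for illustration, so check the GateFlow routing reference for the actual parameter.

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_key_here"
)

# Hypothetical: mark the request as deferrable so the gateway may hold it
# until a low-carbon window. "execution_deadline" is an assumed field name.
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Generate the nightly report"}],
    extra_body={
        "routing_mode": "sustain_optimized",
        "execution_deadline": "2026-01-01T06:00:00Z"  # assumed parameter
    }
)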

Example Usage

Basic Chat Completion

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_key_here"
)

# Use Google Gemini for large context tasks
response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{"role": "user", "content": "Analyze this 1M token document"}],
    routing_mode="sustain_optimized"
)

print(f"Response: {response.choices[0].message.content}")
print(f"Model used: {response.model}")
print(f"Carbon footprint: {response.sustainability.carbon_gco2e} gCO₂e")
print(f"Carbon saved: {response.sustainability.carbon_saved_gco2e} gCO₂e")

Using Embeddings

python
# Generate embeddings for semantic search
embedding_response = client.embeddings.create(
    model="text-embedding-004",
    input=[
        "Document 1 content",
        "Document 2 content",
        "User query"
    ],
    routing_mode="sustain_optimized"
)

for i, embedding in enumerate(embedding_response.data):
    print(f"Embedding {i+1}: {len(embedding.embedding)} dimensions")
    print(f"Carbon footprint: {embedding.sustainability.carbon_gco2e} gCO₂e")

Large Context Processing

python
# Process very large documents with Gemini 3 Pro
response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{"role": "user", "content": "Summarize this 500K token research paper"}],
    routing_mode="sustain_optimized",
    max_tokens=4096
)

print(f"Summary: {response.choices[0].message.content}")

Google-Specific Features

Multi-modal Support

python
# Multi-modal input with images
response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this chart"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/chart.png"
                    }
                }
            ]
        }
    ],
    routing_mode="sustain_optimized"
)

Function Calling

python
# Define functions for tool use
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search company knowledge base",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                }
            }
        }
    }
]

response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{"role": "user", "content": "Find information about our sustainability initiatives"}],
    tools=tools,
    routing_mode="sustain_optimized"
)
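
Assuming GateFlow relays Gemini tool calls in the OpenAI-compatible format, handling the response follows the standard loop: read tool_calls, run your function, and send the result back as a tool message. The search_knowledge_base stub below stands in for your own implementation.

python
import json

def search_knowledge_base(query: str) -> dict:
    # Stub; replace with your real knowledge-base lookup
    return {"results": [f"No backend wired up for: {query}"]}

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = search_knowledge_base(args["query"])

    followup = client.chat.completions.create(
        model="gemini-3-pro",
        messages=[
            {"role": "user", "content": "Find information about our sustainability initiatives"},
            message,  # assistant turn containing the tool call
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)},
        ],
        tools=tools,
        extra_body={"routing_mode": "sustain_optimized"},
    )
    print(followup.choices[0].message.content)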

Model Selection Guide

| Use Case | Recommended Model | Key Features | Sustainability Benefits |
|---|---|---|---|
| Large context | gemini-3-pro | 2M context window | Carbon-neutral data centers |
| Fast reasoning | gemini-3-flash | 400ms latency | TPU-optimized efficiency |
| Balanced performance | gemini-2.5-pro | 1M context | Best quality-to-carbon ratio |
| High-volume | gemini-2.5-flash | Ultra-fast | Lowest carbon footprint |
| Lightweight tasks | gemini-2.5-flash-lite | Cost-effective | Minimal energy consumption |
| Embeddings | text-embedding-004 | 768 dimensions | Optimized embedding generation |
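
If you pick models in code rather than hard-coding them, the table collapses into a small lookup that simply mirrors the recommendations above:

python
# Mirror of the recommendation table above
RECOMMENDED = {
    "large_context": "gemini-3-pro",
    "fast_reasoning": "gemini-3-flash",
    "balanced": "gemini-2.5-pro",
    "high_volume": "gemini-2.5-flash",
    "lightweight": "gemini-2.5-flash-lite",
    "embeddings": "text-embedding-004",
}

model = RECOMMENDED["high_volume"]  # "gemini-2.5-flash"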

Sustainability Best Practices

Optimization Strategies

  1. Right-size your model: Use gemini-2.5-flash for simple tasks instead of Pro models
  2. Enable Sustain Mode: Let GateFlow automatically choose the most efficient Google model
  3. Use time-shifting: Defer non-urgent requests to low-carbon periods
  4. Batch requests: Process multiple items in single API calls to reduce overhead
  5. Leverage TPUs: Google's Tensor Processing Units provide energy-efficient acceleration

Configuration Example

python
# Configure Google provider with sustainability settings
response = client.chat.completions.create(
    model="google:auto",  # Let GateFlow choose the most efficient Google model
    messages=[{"role": "user", "content": "Process this sustainably"}],
    # GateFlow-specific options go through extra_body so the OpenAI SDK accepts them
    extra_body={
        "routing_mode": "sustain_optimized",
        "minimum_quality_score": 8,         # Balance quality and efficiency
        "region_preference": "us-central1"  # Prioritize Google's carbon-neutral regions
    }
)

Performance Characteristics

Latency Comparison

  • Fastest: gemini-2.5-flash (200ms)
  • Balanced: gemini-3-flash (400ms)
  • Standard: gemini-2.5-pro (900ms)
  • Advanced: gemini-3-pro (1,200ms)
  • Deep Think: gemini-3-deep-think (2,500ms)

Token Limits

  • Gemini 3 Pro: 2M context window, 8K output tokens
  • Gemini 3 Flash: 1M context window, 8K output tokens
  • Gemini 2.5 models: 500K-1M context window, 8K output tokens
  • Embedding model: 2K token input limit
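
Because the embedding model caps input at 2,048 tokens, a rough pre-flight check can avoid rejected requests. Gemini tokenizers are not exposed client-side through GateFlow, so the ~4-characters-per-token heuristic below is an approximation, not an exact count.

python
EMBEDDING_TOKEN_LIMIT = 2048

def roughly_fits(text: str, limit: int = EMBEDDING_TOKEN_LIMIT) -> bool:
    """Heuristic pre-flight check: assume ~4 characters per token."""
    return len(text) / 4 <= limit

doc = "Document 1 content " * 1000
if not roughly_fits(doc):
    # Naive truncation; chunking the document is usually the better fix
    doc = doc[: EMBEDDING_TOKEN_LIMIT * 4]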

Pricing Overview

  • Input prices: $0.05-$3.50 per 1M tokens
  • Output prices: $0.20-$14.00 per 1M tokens
  • Embeddings: $0.025 per 1M tokens

Integration with Other GateFlow Features

Multi-Provider Fallbacks

python
# Configure Google as primary with fallbacks
response = client.chat.completions.create(
    model="gemini-3-pro",  # Primary: Google
    messages=[{"role": "user", "content": "Important request"}],
    extra_body={
        "fallback_providers": ["anthropic", "openai"],  # Fallback chain
        "routing_mode": "sustain_optimized"
    }
)

Semantic Caching

python
# Cache frequent Google requests
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Frequently asked question"}],
    extra_body={
        "cache_ttl_seconds": 3600,  # Cache for 1 hour
        "embedding_model": "text-embedding-004"  # Google embeddings for semantic matching
    }
)

Troubleshooting

"Google API key not configured"

Solution: Add your Google API key in the GateFlow Dashboard under Settings → Providers.

"Model not found: gemini-1.5-pro"

Solution: Use current models like gemini-3-pro instead of deprecated models.

"Rate limit exceeded"

Solution:

  1. Check your Google Cloud quota
  2. Configure fallbacks to other providers
  3. Enable request queuing in GateFlow settings (a client-side backoff sketch follows this list)
  4. Use gemini-2.5-flash for high-volume applications
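
On the client side, a minimal exponential-backoff wrapper pairs well with items 2 and 3; the sketch below retries on the OpenAI SDK's RateLimitError, reusing the client from the examples above.

python
import time
import openai

def create_with_backoff(max_retries: int = 5, **kwargs):
    """Retry chat completions with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("Still rate-limited after retries")

response = create_with_backoff(
    model="gemini-2.5-flash",  # recommended for high-volume workloads
    messages=[{"role": "user", "content": "Quick classification task"}],
)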

"Carbon savings lower than expected"

Solution:

  1. Verify Sustain Mode is properly configured
  2. Check grid carbon intensity in your region
  3. Try different Google models for better efficiency
  4. Enable time-shifted execution for non-urgent requests

Migration from Direct Google API

Key Differences

| Feature | Direct Google API | GateFlow Google Integration |
|---|---|---|
| API Format | Google-specific | OpenAI-compatible |
| Authentication | Google API key | GateFlow API key |
| Model Names | gemini-1.5-pro | gemini-3-pro |
| Carbon Tracking | Manual | Automatic |
| Multi-provider | No | Yes |
| Fallbacks | Manual | Automatic |
| Sustainability | Basic | Advanced optimization |

Migration Example

Before (Direct Google API):

python
import google.generativeai as genai
genai.configure(api_key="your-google-api-key")
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Hello from Google!")

After (GateFlow Integration):

python
from openai import OpenAI
client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_gateflow_key"
)
response = client.chat.completions.create(
    model="gemini-3-pro",  # Use current models
    messages=[{"role": "user", "content": "Hello from Google via GateFlow with sustainability benefits!"}],
    routing_mode="sustain_optimized"  # Enable carbon optimization
)
