# Google Integration
Google's Gemini models provide advanced AI capabilities with large context windows and multimodal support. GateFlow integrates seamlessly with Google's AI platform for optimized routing and sustainability.
## Available Models

### Chat Models
| Model ID | Type | Context Window | Best For |
|---|---|---|---|
| gemini-3-pro | Chat | 2M tokens | Complex reasoning, multi-turn conversations |
| gemini-3-flash | Chat | 1M tokens | Fast, cost-effective responses |
| gemini-2.5-pro | Chat | 1M tokens | Balanced performance |
| gemini-2.5-flash | Chat | 1M tokens | High-speed, low-cost tasks |
| gemini-2.5-flash-lite | Chat | 500K tokens | Lightweight tasks |
### Embedding Models

| Model ID | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| text-embedding-004 | 768 | 2,048 | Semantic search, clustering |
## Configuration

```json
{
  "provider": "google",
  "credentials": {
    "api_key": "AIza..."
  }
}
```

## Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| gemini-3-pro | $3.50 | $14.00 |
| gemini-3-flash | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $5.00 |
| gemini-2.5-flash | $0.075 | $0.30 |
| gemini-2.5-flash-lite | $0.05 | $0.20 |
| text-embedding-004 | $0.025 | N/A |
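As a quick budgeting aid, per-request cost can be computed directly from the table above. A minimal sketch (prices transcribed from this page; verify against current GateFlow pricing before relying on it):

```python
# Per-million-token prices in USD, transcribed from the pricing table above
PRICES = {
    "gemini-3-pro":          {"input": 3.50,  "output": 14.00},
    "gemini-3-flash":        {"input": 0.10,  "output": 0.40},
    "gemini-2.5-pro":        {"input": 1.25,  "output": 5.00},
    "gemini-2.5-flash":      {"input": 0.075, "output": 0.30},
    "gemini-2.5-flash-lite": {"input": 0.05,  "output": 0.20},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: 100K input tokens and 2K output tokens on gemini-3-pro
print(round(estimate_cost("gemini-3-pro", 100_000, 2_000), 4))  # → 0.378
```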
## Sustainability Features
Google integration through GateFlow offers several sustainability benefits:
- Carbon-Neutral Data Centers: Google's data centers run on 100% renewable energy
- TPU Acceleration: Energy-efficient Tensor Processing Units for AI workloads
- Intelligent Routing: Automatically select the most energy-efficient data center
- Time-Shifted Execution: Defer non-urgent requests to low-carbon periods
- Automatic Model Selection: Choose the most efficient Gemini model for your task
## Example Usage

### Basic Chat Completion
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_key_here"
)

# Use Google Gemini for large context tasks
response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{"role": "user", "content": "Analyze this 1M token document"}],
    routing_mode="sustain_optimized"
)

print(f"Response: {response.choices[0].message.content}")
print(f"Model used: {response.model}")
print(f"Carbon footprint: {response.sustainability.carbon_gco2e} gCO₂e")
print(f"Carbon saved: {response.sustainability.carbon_saved_gco2e} gCO₂e")
```

### Using Embeddings
```python
# Generate embeddings for semantic search
embedding_response = client.embeddings.create(
    model="text-embedding-004",
    input=[
        "Document 1 content",
        "Document 2 content",
        "User query"
    ],
    routing_mode="sustain_optimized"
)

for i, embedding in enumerate(embedding_response.data):
    print(f"Embedding {i+1}: {len(embedding.embedding)} dimensions")
    print(f"Carbon footprint: {embedding.sustainability.carbon_gco2e} gCO₂e")
```

### Large Context Processing
```python
# Process very large documents with Gemini 3 Pro
response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{"role": "user", "content": "Summarize this 500K token research paper"}],
    routing_mode="sustain_optimized",
    max_tokens=4096
)

print(f"Summary: {response.choices[0].message.content}")
```

## Google-Specific Features
### Multi-modal Support
```python
# Multi-modal input with images
response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this chart"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/chart.png"
                    }
                }
            ]
        }
    ],
    routing_mode="sustain_optimized"
)
```

### Function Calling
```python
# Define functions for tool use
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search company knowledge base",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"}
                }
            }
        }
    }
]

response = client.chat.completions.create(
    model="gemini-3-pro",
    messages=[{"role": "user", "content": "Find information about our sustainability initiatives"}],
    tools=tools,
    routing_mode="sustain_optimized"
)
```

## Model Selection Guide
| Use Case | Recommended Model | Key Features | Sustainability Benefits |
|---|---|---|---|
| Large context | gemini-3-pro | 2M context window | Carbon-neutral data centers |
| Fast reasoning | gemini-3-flash | 400ms latency | TPU-optimized efficiency |
| Balanced performance | gemini-2.5-pro | 1M context | Best quality-to-carbon ratio |
| High-volume | gemini-2.5-flash | Ultra-fast | Lowest carbon footprint |
| Lightweight tasks | gemini-2.5-flash-lite | Cost-effective | Minimal energy consumption |
| Embeddings | text-embedding-004 | 768 dimensions | Optimized embedding generation |
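If you route model choice in your own application code, the selection guide above can be encoded as a small lookup. A sketch that simply mirrors the table (the use-case keys are illustrative, not a GateFlow API):

```python
# Use-case to model mapping, mirroring the selection guide table above
MODEL_GUIDE = {
    "large_context": "gemini-3-pro",
    "fast_reasoning": "gemini-3-flash",
    "balanced": "gemini-2.5-pro",
    "high_volume": "gemini-2.5-flash",
    "lightweight": "gemini-2.5-flash-lite",
    "embeddings": "text-embedding-004",
}

def recommend_model(use_case: str) -> str:
    """Return the recommended Gemini model for a use case.
    Falls back to the balanced option for unknown use cases."""
    return MODEL_GUIDE.get(use_case, "gemini-2.5-pro")
```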
## Sustainability Best Practices

### Optimization Strategies
- Right-size your model: Use `gemini-2.5-flash` for simple tasks instead of Pro models
- Enable Sustain Mode: Let GateFlow automatically choose the most efficient Google model
- Use time-shifting: Defer non-urgent requests to low-carbon periods
- Batch requests: Process multiple items in single API calls to reduce overhead
- Leverage TPUs: Google's Tensor Processing Units provide energy-efficient acceleration
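The right-sizing advice can be automated client-side with a simple heuristic, for example by prompt length. This is an illustrative sketch only, not a GateFlow feature, and the threshold is arbitrary:

```python
def pick_model(prompt: str, needs_reasoning: bool = False) -> str:
    """Illustrative right-sizing heuristic: use a Flash model for short,
    simple prompts and reserve Pro models for complex work."""
    if needs_reasoning:
        return "gemini-3-pro"
    # Arbitrary threshold: short prompts rarely need a Pro model
    if len(prompt) < 2000:
        return "gemini-2.5-flash"
    return "gemini-2.5-pro"

print(pick_model("Translate 'hello' to French"))  # → gemini-2.5-flash
```

In practice, `model="google:auto"` with Sustain Mode (shown below) lets GateFlow make this choice for you; a local heuristic like this is only useful when you need the decision to be deterministic on your side.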
### Configuration Example
```python
# Configure Google provider with sustainability settings
response = client.chat.completions.create(
    model="google:auto",  # Let GateFlow choose the most efficient Google model
    messages=[{"role": "user", "content": "Process this sustainably"}],
    routing_mode="sustain_optimized",
    minimum_quality_score=8,  # Balance quality and efficiency
    region_preference="us-central1"  # Prioritize Google's carbon-neutral regions
)
```

## Performance Characteristics
### Latency Comparison
- Fastest: `gemini-2.5-flash` (200ms)
- Balanced: `gemini-3-flash` (400ms)
- Standard: `gemini-2.5-pro` (900ms)
- Advanced: `gemini-3-pro` (1,200ms)
- Deep Think: `gemini-3-deep-think` (2,500ms)
### Token Limits
- Gemini 3 Pro: 2M context window, 8K output tokens
- Gemini 3 Flash: 1M context window, 8K output tokens
- Gemini 2.5 models: 500K-1M context window, 8K output tokens
- Embedding model: 2K token input limit
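The limits above can be checked client-side before sending a request. A rough sketch using the common ~4 characters per token approximation (an estimate only; use a real tokenizer for exact counts):

```python
# Context windows in tokens, from the limits listed above
CONTEXT_WINDOWS = {
    "gemini-3-pro": 2_000_000,
    "gemini-3-flash": 1_000_000,
    "gemini-2.5-pro": 1_000_000,
    "gemini-2.5-flash": 1_000_000,
    "gemini-2.5-flash-lite": 500_000,
}

def fits_context(model: str, text: str, max_output_tokens: int = 8_192) -> bool:
    """Rough check that a prompt plus the output budget fits the model's
    context window. Uses ~4 chars/token; not a substitute for a tokenizer."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]
```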
### Pricing Overview
- Input prices: $0.05-$3.50 per 1M tokens
- Output prices: $0.20-$14.00 per 1M tokens
- Embeddings: $0.025 per 1M tokens
## Integration with Other GateFlow Features

### Multi-Provider Fallbacks
```python
# Configure Google as primary with fallbacks
response = client.chat.completions.create(
    model="gemini-3-pro",  # Primary: Google
    messages=[{"role": "user", "content": "Important request"}],
    fallback_providers=["anthropic", "openai"],  # Fallback chain
    routing_mode="sustain_optimized"
)
```

### Semantic Caching
```python
# Cache frequent Google requests
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Frequently asked question"}],
    cache_ttl_seconds=3600,  # Cache for 1 hour
    embedding_model="text-embedding-004"  # Use Google embeddings for semantic matching
)
```

## Troubleshooting
### "Google API key not configured"
Solution: Add your Google API key in the GateFlow Dashboard under Settings → Providers.
### "Model not found: gemini-1.5-pro"
Solution: Use current models like gemini-3-pro instead of deprecated models.
### "Rate limit exceeded"
Solution:
- Check your Google Cloud quota
- Configure fallbacks to other providers
- Enable request queuing in GateFlow settings
- Use `gemini-2.5-flash` for high-volume applications
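When rate limits do occur, a client-side exponential backoff wrapper is a common complement to GateFlow's request queuing. A generic sketch (the exception type to retry on depends on your client library, e.g. `openai.RateLimitError` in the OpenAI SDK):

```python
import time

def with_backoff(fn, retries=4, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the given exceptions.
    Delays grow as base_delay * 2**attempt (1s, 2s, 4s, ...)."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # Out of retries; surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)

# Usage with the OpenAI SDK (pass retry_on=(openai.RateLimitError,)):
# response = with_backoff(lambda: client.chat.completions.create(...))
```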
### "Carbon savings lower than expected"
Solution:
- Verify Sustain Mode is properly configured
- Check grid carbon intensity in your region
- Try different Google models for better efficiency
- Enable time-shifted execution for non-urgent requests
## Migration from Direct Google API

### Key Differences
| Feature | Direct Google API | GateFlow Google Integration |
|---|---|---|
| API Format | Google-specific | OpenAI-compatible |
| Authentication | Google API key | GateFlow API key |
| Model Names | gemini-1.5-pro | gemini-3-pro |
| Carbon Tracking | Manual | Automatic |
| Multi-provider | No | Yes |
| Fallbacks | Manual | Automatic |
| Sustainability | Basic | Advanced optimization |
### Migration Example
Before (Direct Google API):
```python
import google.generativeai as genai

genai.configure(api_key="your-google-api-key")
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Hello from Google!")
```

After (GateFlow Integration):
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_gateflow_key"
)

response = client.chat.completions.create(
    model="gemini-3-pro",  # Use current models
    messages=[{"role": "user", "content": "Hello from Google via GateFlow with sustainability benefits!"}],
    routing_mode="sustain_optimized"  # Enable carbon optimization
)
```

## Next Steps
- Explore OpenAI Integration - Versatile AI models
- Try Anthropic Integration - Advanced reasoning models
- Configure Sustain Mode - Automatic carbon optimization
- View Provider Analytics - Monitor your Google carbon savings