Cohere Integration
GateFlow fully supports Cohere's AI models, with built-in sustainability optimization on every request.
Available Models
Chat Models
- `command-r-plus`: Most capable model for complex tasks
- `command-r`: Balanced performance and efficiency
- `command`: Cost-effective for simpler tasks
Embedding Models
- `embed-english-v3.0`: Optimized for English text
- `embed-multilingual-v3.0`: Supports 100+ languages
Rerank Models
- `rerank-english-v3.0`: Improve RAG quality with semantic ranking
- `rerank-multilingual-v3.0`: Multilingual reranking
Sustainability Benefits
Cohere models are optimized for efficiency:
- Lower Carbon Footprint: Up to 30% less CO₂ per token vs comparable models
- Faster Inference: Reduced compute time = lower energy consumption
- Competitive Pricing: Cost efficiency often correlates with carbon efficiency
- Specialized Models: Right-sized models for specific tasks
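To make the "up to 30%" figure concrete, here is a back-of-envelope sketch. The baseline emission factor below is an illustrative placeholder, not a measured value; real per-token figures come from the sustainability fields on each response.

```python
# Illustrative only: the baseline emission factor is a made-up number,
# not a measured value for any real model.
BASELINE_GCO2E_PER_1K_TOKENS = 0.5  # hypothetical comparable model
COHERE_REDUCTION = 0.30             # "up to 30%" from the list above

def estimated_savings_gco2e(tokens: int) -> float:
    """Grams of CO2e saved versus the assumed baseline for a token count."""
    baseline_gco2e = tokens / 1000 * BASELINE_GCO2E_PER_1K_TOKENS
    return baseline_gco2e * COHERE_REDUCTION

# 1M tokens: 500 g at the assumed baseline, so up to 150 g saved
print(round(estimated_savings_gco2e(1_000_000), 1))  # 150.0
```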
Example Usage
Basic Chat Completion
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_key_here"
)

# Use Cohere for efficient chat completion
response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Analyze this document efficiently"}]
)

print(response.choices[0].message.content)
print(f"Carbon footprint: {response.sustainability.carbon_gco2e} gCO₂e")
```

Using Cohere Rerank for Better RAG
```python
# Use Cohere rerank for better RAG results
documents = [
    "Document 1 content about sustainability...",
    "Document 2 content about AI efficiency...",
    "Document 3 content about renewable energy..."
]

rerank_response = client.rerank.create(
    model="rerank-english-v3.0",
    query="sustainable AI practices",
    documents=documents,
    top_n=2
)

# Get the top 2 most relevant documents
for result in rerank_response.results:
    print(f"Document {result.index}: Score {result.score}")
    print(f"Carbon saved: {result.sustainability.carbon_saved_gco2e} gCO₂e")
```

Sustain Mode with Cohere
```python
# Let GateFlow choose the most sustainable Cohere model
response = client.chat.completions.create(
    model="cohere:auto",  # Auto-select most efficient Cohere model
    routing_mode="sustain_optimized",
    messages=[{"role": "user", "content": "Generate eco-friendly content"}]
)

print(f"Selected model: {response.model}")
print(f"Carbon saved: {response.sustainability.carbon_saved_gco2e} gCO₂e")
```

Cohere-Specific Features
Tool Use
Cohere models support function calling with GateFlow's unified interface:
```python
# Define tools (works across all providers)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                }
            }
        }
    }
]

response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)
```

Search & Rerank Pipeline
Combine embeddings and rerank for optimal search results:
```python
# Step 1: Generate embeddings
embedding_response = client.embeddings.create(
    model="embed-english-v3.0",
    input=["query text", "document 1", "document 2", "document 3"]
)

# Step 2: Use rerank for precision
rerank_response = client.rerank.create(
    model="rerank-english-v3.0",
    query="query text",
    documents=["document 1", "document 2", "document 3"],
    embeddings=embedding_response.data
)
```

Sustainability Best Practices
Model Selection Guide
| Use Case | Recommended Model |
|---|---|
| Complex analysis | command-r-plus |
| General chat | command-r |
| Simple tasks | command |
| English embeddings | embed-english-v3.0 |
| Semantic search | rerank-english-v3.0 |
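The table above can be encoded as a small lookup helper. The model IDs come from the table; the task labels and the default choice are this sketch's own convention, not part of GateFlow's API.

```python
# Task labels are this example's own convention; model IDs are from
# the table above.
MODEL_FOR_TASK = {
    "complex_analysis": "command-r-plus",
    "general_chat": "command-r",
    "simple": "command",
    "english_embeddings": "embed-english-v3.0",
    "semantic_search": "rerank-english-v3.0",
}

def pick_model(task: str) -> str:
    """Return the right-sized Cohere model for a task label.

    Falls back to the cheapest chat model for unknown tasks.
    """
    return MODEL_FOR_TASK.get(task, "command")
```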
Optimization Tips
- Right-size your model: Use `command` instead of `command-r-plus` for simple tasks
- Batch requests: Process multiple items in single API calls
- Use rerank: Improve RAG quality while reducing overall compute
- Enable caching: Cache frequent Cohere requests for maximum savings
- Combine with Sustain Mode: Let GateFlow optimize across all providers
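The "batch requests" tip can be sketched as a simple batching helper. The batch size of 96 texts per request is an assumed limit for illustration; check your provider's actual cap.

```python
def batches(items, size):
    """Yield successive slices of at most `size` items, so many texts can
    be embedded in a handful of API calls instead of one call each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

texts = [f"document {i}" for i in range(250)]

# Each batch would be passed as input= to a single embeddings request
grouped = list(batches(texts, 96))
print(len(grouped))  # 3 requests instead of 250
```

Pass each batch as `input=` to one `client.embeddings.create` call rather than looping over single texts.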
Performance Characteristics
Latency
- Chat models: 200-800ms typical response time
- Embedding models: 50-200ms per batch
- Rerank models: 100-300ms per query
Token Limits
- Chat models: Up to 128K tokens context window
- Embedding models: Up to 512 tokens per text
- Rerank models: Up to 512 documents per query
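Because embedding inputs are capped at roughly 512 tokens, a crude word-level pre-truncation helps avoid request errors. The words-per-token ratio assumed below is a rough rule of thumb; use a real tokenizer when you need exact counts.

```python
def truncate_for_embedding(text: str, max_words: int = 380) -> str:
    """Crudely trim a text to stay under the ~512-token embedding limit.

    Assumes roughly 0.75 words per token and leaves headroom; actual
    token counts depend on the tokenizer, so verify with real data.
    """
    words = text.split()
    return " ".join(words[:max_words])

long_text = "word " * 1000
print(len(truncate_for_embedding(long_text).split()))  # 380
```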
Integration with Other GateFlow Features
Semantic Caching
Cohere embeddings work seamlessly with GateFlow's semantic caching:
```python
# Enable semantic caching with Cohere embeddings
response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Frequently asked question"}],
    cache_ttl_seconds=3600,  # Cache for 1 hour
    embedding_model="embed-english-v3.0"  # Use Cohere for semantic matching
)
```

Multi-Provider Fallbacks
Configure Cohere as fallback for other providers:
```python
# Set up fallback chain in Dashboard:
# Primary: OpenAI gpt-5.2
# Fallback 1: Cohere command-r-plus
# Fallback 2: Anthropic claude-3-5-sonnet
response = client.chat.completions.create(
    model="gpt-5.2",  # Will fall back to Cohere if OpenAI is unavailable
    messages=[{"role": "user", "content": "Important request"}]
)
```

Troubleshooting
"Cohere API key not configured"
Solution: Add your Cohere API key in the GateFlow Dashboard under Settings → Providers.
"Model not found: command-r-plus"
Solution: Ensure you've selected the correct model name from the available Cohere models.
"Rate limit exceeded"
Solution:
- Check your Cohere account limits
- Configure fallbacks to other providers
- Enable request queuing in GateFlow settings
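Alongside fallbacks and request queuing, client-side retries with exponential backoff soften transient rate limits. This is a generic sketch: swap the stand-in `RuntimeError` for your client's real rate-limit exception (the OpenAI SDK raises `openai.RateLimitError`).

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0):
    """Retry `call` on rate-limit errors with exponential backoff + jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for your client's rate-limit error
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # base, 2*base, 4*base, ... plus jitter, capped at 30 seconds
            time.sleep(min(base * 2 ** attempt + random.random() * base, 30))
```

Usage: wrap the request in a lambda, e.g. `with_backoff(lambda: client.chat.completions.create(...))`.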
Migration from Direct Cohere API
Key Differences
| Feature | Direct Cohere API | GateFlow Cohere Integration |
|---|---|---|
| API Format | Cohere-specific | OpenAI-compatible |
| Authentication | Cohere API key | GateFlow API key |
| Model Names | command-r-plus | command-r-plus |
| Tool Support | Cohere format | OpenAI format |
| Carbon Tracking | Manual | Automatic |
| Multi-provider | No | Yes |
| Fallbacks | Manual | Automatic |
Migration Example
Before (Direct Cohere API):
```python
import cohere

co = cohere.Client("your-cohere-api-key")

response = co.chat(
    model="command-r-plus",
    message="Hello from Cohere!"
)
```

After (GateFlow Integration):
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_gateflow_key"
)

response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Hello from Cohere via GateFlow!"}]
)
```

Next Steps
- Try ElevenLabs Integration - Low-carbon voice synthesis
- Explore Sustain Mode - Automatic carbon optimization
- View Sustainability Dashboard - Track your Cohere carbon savings
- Configure Provider Settings - Optimize your Cohere configuration