Cohere Integration

GateFlow provides full support for Cohere's efficient AI models, with built-in sustainability optimization.

Available Models

Chat Models

  • command-r-plus - Most capable model for complex tasks
  • command-r - Balanced performance and efficiency
  • command - Cost-effective for simpler tasks

Embedding Models

  • embed-english-v3.0 - Optimized for English text
  • embed-multilingual-v3.0 - Supports 100+ languages

Rerank Models

  • rerank-english-v3.0 - Improve RAG quality with semantic ranking
  • rerank-multilingual-v3.0 - Multilingual reranking
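
For programmatic model selection, the catalog above can be captured in a small lookup. This is a sketch that mirrors the lists in this section, not an official GateFlow API:

```python
# Cohere model catalog as listed above, keyed by capability
COHERE_MODELS = {
    "command-r-plus": "chat",
    "command-r": "chat",
    "command": "chat",
    "embed-english-v3.0": "embedding",
    "embed-multilingual-v3.0": "embedding",
    "rerank-english-v3.0": "rerank",
    "rerank-multilingual-v3.0": "rerank",
}

def models_for(capability):
    """Return the Cohere model IDs that support a given capability."""
    return [name for name, kind in COHERE_MODELS.items() if kind == capability]

print(models_for("rerank"))  # ['rerank-english-v3.0', 'rerank-multilingual-v3.0']
```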

Sustainability Benefits

Cohere models are optimized for efficiency:

  • Lower Carbon Footprint: Up to 30% less CO₂ per token vs comparable models
  • Faster Inference: Reduced compute time means lower energy consumption
  • Competitive Pricing: Cost efficiency often correlates with carbon efficiency
  • Specialized Models: Right-sized models for specific tasks

Example Usage

Basic Chat Completion

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_key_here"
)

# Use Cohere for efficient chat completion
response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Analyze this document efficiently"}]
)

print(response.choices[0].message.content)
print(f"Carbon footprint: {response.sustainability.carbon_gco2e} gCO₂e")

Using Cohere Rerank for Better RAG

python
# Use Cohere rerank for better RAG results
documents = [
    "Document 1 content about sustainability...",
    "Document 2 content about AI efficiency...",
    "Document 3 content about renewable energy..."
]

rerank_response = client.rerank.create(
    model="rerank-english-v3.0",
    query="sustainable AI practices",
    documents=documents,
    top_n=2
)

# Get the top 2 most relevant documents
for result in rerank_response.results:
    print(f"Document {result.index}: Score {result.score}")
    print(f"Carbon saved: {result.sustainability.carbon_saved_gco2e} gCO₂e")

Sustain Mode with Cohere

python
# Let GateFlow choose the most sustainable Cohere model
response = client.chat.completions.create(
    model="cohere:auto",  # Auto-select most efficient Cohere model
    routing_mode="sustain_optimized",
    messages=[{"role": "user", "content": "Generate eco-friendly content"}]
)

print(f"Selected model: {response.model}")
print(f"Carbon saved: {response.sustainability.carbon_saved_gco2e} gCO₂e")

Cohere-Specific Features

Tool Use

Cohere models support function calling with GateFlow's unified interface:

python
# Define tools (works across all providers)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                }
            }
        }
    }
]

response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools
)
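
When the model decides to call a tool, the response carries the tool's name and its JSON-encoded arguments; the application executes the function and sends the result back. A minimal dispatch sketch follows (the `dispatch_tool_call` helper, the registry, and the `get_weather` stub are illustrative, not part of GateFlow or the OpenAI SDK):

```python
import json

def dispatch_tool_call(name, arguments_json, registry):
    """Look up a tool by name and call it with the decoded JSON arguments."""
    args = json.loads(arguments_json)
    return registry[name](**args)

# Hypothetical stub standing in for a real weather lookup
def get_weather(location, unit="celsius"):
    return {"location": location, "unit": unit, "temperature": 18}

registry = {"get_weather": get_weather}
result = dispatch_tool_call("get_weather", '{"location": "Paris"}', registry)
print(result)
```

In a real loop, `name` and `arguments_json` would come from the tool call entries on the chat completion response, and the returned value would be appended as a tool message in the next request.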

Search & Rerank Pipeline

Combine embeddings and rerank for optimal search results:

python
# Step 1: Generate embeddings
embedding_response = client.embeddings.create(
    model="embed-english-v3.0",
    input=["query text", "document 1", "document 2", "document 3"]
)

# Step 2: Use rerank for precision
rerank_response = client.rerank.create(
    model="rerank-english-v3.0",
    query="query text",
    documents=["document 1", "document 2", "document 3"],
    embeddings=embedding_response.data
)

Sustainability Best Practices

Model Selection Guide

| Use Case | Recommended Model |
| --- | --- |
| Complex analysis | command-r-plus |
| General chat | command-r |
| Simple tasks | command |
| English embeddings | embed-english-v3.0 |
| Semantic search | rerank-english-v3.0 |
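
The guide above can be expressed as a lookup helper. This is a sketch: the use-case keys are illustrative, and defaulting to the balanced chat model is an assumption, not documented behavior:

```python
# Recommended Cohere model per use case, per the selection guide above
RECOMMENDED = {
    "complex_analysis": "command-r-plus",
    "general_chat": "command-r",
    "simple_tasks": "command",
    "english_embeddings": "embed-english-v3.0",
    "semantic_search": "rerank-english-v3.0",
}

def pick_model(use_case):
    """Return the recommended Cohere model, defaulting to the balanced chat model."""
    return RECOMMENDED.get(use_case, "command-r")

print(pick_model("simple_tasks"))  # command
```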

Optimization Tips

  1. Right-size your model: Use command instead of command-r-plus for simple tasks
  2. Batch requests: Process multiple items in single API calls
  3. Use rerank: Improve RAG quality while reducing overall compute
  4. Enable caching: Cache frequent Cohere requests for maximum savings
  5. Combine with Sustain Mode: Let GateFlow optimize across all providers
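
Tip 2 (batching) can be as simple as slicing inputs before sending them, for example when embedding many documents. A sketch, assuming a batch size of 96, which is an illustrative choice rather than a documented limit:

```python
def batches(items, size=96):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

texts = [f"doc {n}" for n in range(200)]
sizes = [len(chunk) for chunk in batches(texts)]
print(sizes)  # [96, 96, 8]
```

Each chunk would then go into a single `client.embeddings.create(...)` call instead of 200 individual requests.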

Performance Characteristics

Latency

  • Chat models: 200-800ms typical response time
  • Embedding models: 50-200ms per batch
  • Rerank models: 100-300ms per query

Token Limits

  • Chat models: Up to a 128K-token context window
  • Embedding models: Up to 512 tokens per text
  • Rerank models: Up to 512 documents per query
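
To stay under the 512-token-per-text embedding limit, long texts need truncation or chunking before submission. A naive whitespace approximation is sketched below; real subword tokenizers count more tokens than words, so leave headroom:

```python
def truncate_approx(text, max_tokens=512):
    """Crude whitespace-based truncation; subword tokenizers count more tokens than words."""
    words = text.split()
    return " ".join(words[:max_tokens])

long_text = "word " * 1000
print(len(truncate_approx(long_text).split()))  # 512
```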

Integration with Other GateFlow Features

Semantic Caching

Cohere embeddings work seamlessly with GateFlow's semantic caching:

python
# Enable semantic caching with Cohere embeddings
response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Frequently asked question"}],
    cache_ttl_seconds=3600,  # Cache for 1 hour
    embedding_model="embed-english-v3.0"  # Use Cohere for semantic matching
)

Multi-Provider Fallbacks

Configure Cohere as fallback for other providers:

python
# Set up fallback chain in Dashboard:
# Primary: OpenAI gpt-5.2
# Fallback 1: Cohere command-r-plus
# Fallback 2: Anthropic claude-3-5-sonnet

response = client.chat.completions.create(
    model="gpt-5.2",  # Will fallback to Cohere if OpenAI unavailable
    messages=[{"role": "user", "content": "Important request"}]
)

Troubleshooting

"Cohere API key not configured"

Solution: Add your Cohere API key in the GateFlow Dashboard under Settings → Providers.

"Model not found: command-r-plus"

Solution: Ensure the model name exactly matches one of the available Cohere models listed above.

"Rate limit exceeded"

Solution:

  1. Check your Cohere account limits
  2. Configure fallbacks to other providers
  3. Enable request queuing in GateFlow settings
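
Alongside server-side queuing, the client can retry with exponential backoff before giving up. A generic sketch; in practice, catch the SDK's rate-limit exception rather than bare `Exception`:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponentially growing, jittered delays."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Delay grows 1x, 2x, 4x... with up to 100% random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Usage would look like `with_backoff(lambda: client.chat.completions.create(...))`.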

Migration from Direct Cohere API

Key Differences

| Feature | Direct Cohere API | GateFlow Cohere Integration |
| --- | --- | --- |
| API Format | Cohere-specific | OpenAI-compatible |
| Authentication | Cohere API key | GateFlow API key |
| Model Names | command-r-plus | command-r-plus |
| Tool Support | Cohere format | OpenAI format |
| Carbon Tracking | Manual | Automatic |
| Multi-provider | No | Yes |
| Fallbacks | Manual | Automatic |

Migration Example

Before (Direct Cohere API):

python
import cohere
co = cohere.Client("your-cohere-api-key")
response = co.chat(
    model="command-r-plus",
    message="Hello from Cohere!"
)

After (GateFlow Integration):

python
from openai import OpenAI
client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_your_gateflow_key"
)
response = client.chat.completions.create(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Hello from Cohere via GateFlow!"}]
)
