Rerank Integration
Improve search relevance with neural reranking models.
Overview
Reranking adds a second stage to retrieval: a fast initial search gathers a broad set of candidates, and a reranking model then reorders them by relevance to the query.
Why Reranking?
| Stage | Speed | Accuracy | Purpose |
|---|---|---|---|
| Initial Search | Fast | Good | Cast wide net |
| Reranking | Slower | Excellent | Precision refinement |
Embedding search alone returns semantically similar results. Reranking scores each query-document pair with a cross-encoder model, which reads the query and the document together and so judges relevance more precisely.
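The two stages can be sketched locally as follows. This is a minimal illustration, not GateFlow's implementation: `two_stage_search`, `word_overlap`, and `overlap_ratio` are hypothetical names, and the two toy scorers only stand in for an embedding search and a cross-encoder.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def two_stage_search(
    query: str,
    corpus: List[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    initial_k: int,
    top_n: int,
) -> List[Tuple[int, float]]:
    """Return (corpus index, precise score) pairs for the top_n documents."""
    # Stage 1: score every document with the cheap scorer, keep initial_k.
    candidates = sorted(
        range(len(corpus)),
        key=lambda i: cheap_score(query, corpus[i]),
        reverse=True,
    )[:initial_k]
    # Stage 2: rescore only the surviving candidates with the costly scorer.
    rescored = [(i, precise_score(query, corpus[i])) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


# Toy scorers (assumptions for illustration only).
def word_overlap(query: str, doc: str) -> float:
    """Count of shared lowercase words: a stand-in for the fast first stage."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return float(len(q & d))


def overlap_ratio(query: str, doc: str) -> float:
    """Jaccard similarity of word sets: a stand-in for the precise stage."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0
```

The point of the split is cost: the expensive scorer runs on only `initial_k` documents instead of the whole corpus.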
Supported Models
| Model | Provider | Languages | Best For |
|---|---|---|---|
| rerank-english-v3.0 | Cohere | English | English documents |
| rerank-multilingual-v3.0 | Cohere | 100+ | Multilingual corpora |
| rerank-english-v2.0 | Cohere | English | Legacy support |
Basic Usage
Simple Reranking
```python
import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

# Initial search results (from semantic search)
documents = [
    "The refund policy allows returns within 30 days.",
    "Our return process is simple and customer-friendly.",
    "Contact support for refund assistance.",
    "Products must be unused for refund eligibility.",
    "Refunds are processed within 5-7 business days."
]

# Rerank for the specific query.
# The SDK's generic client.post() takes the JSON payload as `body` and
# requires `cast_to`; httpx.Response returns the raw HTTP response.
response = client.post(
    "/rerank",
    cast_to=httpx.Response,
    body={
        "model": "rerank-english-v3.0",
        "query": "How long do I have to return an item?",
        "documents": documents,
        "top_n": 3
    }
)
results = response.json()["results"]

for result in results:
    print(f"Score: {result['relevance_score']:.3f}")
    print(f"Document: {documents[result['index']]}\n")
```

Output:
```
Score: 0.982
Document: The refund policy allows returns within 30 days.

Score: 0.847
Document: Products must be unused for refund eligibility.

Score: 0.734
Document: Our return process is simple and customer-friendly.
```

cURL Example
```bash
curl -X POST https://api.gateflow.ai/v1/rerank \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "How long do I have to return an item?",
    "documents": [
      "The refund policy allows returns within 30 days.",
      "Our return process is simple and customer-friendly.",
      "Contact support for refund assistance."
    ],
    "top_n": 2
  }'
```

Integrated Search and Rerank
Combined API Call
```python
import httpx

# Search with automatic reranking
response = client.post(
    "/data/search",
    cast_to=httpx.Response,
    body={
        "query": "What is our vacation policy?",
        "collection": "hr_policies",
        "limit": 50,  # Initial search limit
        "rerank": {
            "enabled": True,
            "model": "rerank-english-v3.0",
            "top_n": 5  # Final results after rerank
        }
    }
)

for result in response.json()["results"]:
    print(f"Score: {result['rerank_score']:.3f}")
    print(f"Content: {result['content'][:100]}...")
    print()
```

RAG with Reranking
```python
# Chat with RAG + reranking
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "How many vacation days do new employees get?"}
    ],
    extra_body={
        "gateflow": {
            "rag": {
                "enabled": True,
                "collection": "hr_policies",
                "top_k": 5,
                "rerank": {
                    "enabled": True,
                    "model": "rerank-english-v3.0",
                    "initial_k": 30  # Search 30, rerank to 5
                }
            }
        }
    }
)

print(response.choices[0].message.content)
```

Configuration Options
Rerank Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Rerank model ID |
| query | string | required | Search query |
| documents | array | required | Documents to rerank |
| top_n | integer | 10 | Number of results to return |
| return_documents | boolean | false | Include document text in response |
| max_chunks_per_doc | integer | null | Limit chunks per document |
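One way to keep request bodies consistent with the table above is a small builder that applies the listed defaults. This is a sketch only: `build_rerank_payload` is a hypothetical helper, and its validation rules (non-empty `documents`, `top_n` of at least 1, omitting `max_chunks_per_doc` when it is unset) are assumptions rather than documented API behavior.

```python
from typing import Any, Dict, List, Optional


def build_rerank_payload(
    model: str,
    query: str,
    documents: List[str],
    top_n: int = 10,
    return_documents: bool = False,
    max_chunks_per_doc: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a /rerank request body using the defaults from the table."""
    # Assumed client-side checks; the server may enforce different rules.
    if not model or not query:
        raise ValueError("model and query are required")
    if not documents:
        raise ValueError("documents must be a non-empty list")
    if top_n < 1:
        raise ValueError("top_n must be at least 1")

    payload: Dict[str, Any] = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
        "return_documents": return_documents,
    }
    # Assumption: leaving the key out is equivalent to sending null.
    if max_chunks_per_doc is not None:
        payload["max_chunks_per_doc"] = max_chunks_per_doc
    return payload
```

The returned dictionary can be sent as the JSON body of a `/rerank` request.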
Advanced Options
```python
import httpx

response = client.post(
    "/rerank",
    cast_to=httpx.Response,
    body={
        "model": "rerank-multilingual-v3.0",
        "query": "Quelle est la politique de remboursement?",
        "documents": documents,
        "top_n": 5,
        "return_documents": True,
        "max_chunks_per_doc": 3
    }
)
```

Performance Optimization
Batch Reranking
```python
import httpx

# Rerank the same document set against several queries (one request per query).
# all_documents is assumed to be a list of document strings defined earlier.
queries = [
    "vacation policy",
    "sick leave",
    "remote work guidelines"
]

results = []
for query in queries:
    response = client.post(
        "/rerank",
        cast_to=httpx.Response,
        body={
            "model": "rerank-english-v3.0",
            "query": query,
            "documents": all_documents,
            "top_n": 5
        }
    )
    results.append({
        "query": query,
        "top_results": response.json()["results"]
    })
```

Caching Strategy
```python
import httpx

# Enable rerank result caching.
# Extra headers are passed to client.post() through `options`.
response = client.post(
    "/rerank",
    cast_to=httpx.Response,
    body={
        "model": "rerank-english-v3.0",
        "query": "vacation policy",
        "documents": documents,
        "top_n": 5
    },
    options={
        "headers": {
            "X-GateFlow-Cache": "enabled",
            "X-GateFlow-Cache-TTL": "3600"  # 1 hour
        }
    }
)
```

Multilingual Reranking
Cross-Language Search
```python
import httpx

# Query in one language, documents in another
response = client.post(
    "/rerank",
    cast_to=httpx.Response,
    body={
        "model": "rerank-multilingual-v3.0",
        "query": "Comment puis-je obtenir un remboursement?",  # French
        "documents": [
            "Refunds are processed within 7 days.",  # English
            "Die Rückerstattung erfolgt innerhalb von 7 Tagen.",  # German
            "Los reembolsos se procesan en 7 días."  # Spanish
        ],
        "top_n": 3
    }
)
```

Language Detection
```python
import httpx

# Let the system detect languages automatically
response = client.post(
    "/data/search",
    cast_to=httpx.Response,
    body={
        "query": "política de devoluciones",
        "collection": "support_docs",
        "rerank": {
            "enabled": True,
            "model": "rerank-multilingual-v3.0",
            "auto_detect_language": True
        }
    }
)
```

Quality Metrics
Relevance Scores
| Score Range | Interpretation |
|---|---|
| 0.9 - 1.0 | Highly relevant |
| 0.7 - 0.9 | Relevant |
| 0.5 - 0.7 | Somewhat relevant |
| 0.0 - 0.5 | Low relevance |
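The bands above can be applied in code with a small lookup. This is a sketch: `interpret_score` is a hypothetical helper, and because the table's ranges share their boundary values, it assumes a boundary score (for example 0.9) belongs to the higher band.

```python
def interpret_score(score: float) -> str:
    """Map a relevance score in [0, 1] to the label from the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("relevance scores are expected to lie in [0, 1]")
    # Assumption: boundary values fall into the higher band.
    if score >= 0.9:
        return "Highly relevant"
    if score >= 0.7:
        return "Relevant"
    if score >= 0.5:
        return "Somewhat relevant"
    return "Low relevance"
```

For instance, the three scores in the Basic Usage output (0.982, 0.847, 0.734) map to "Highly relevant", "Relevant", and "Relevant".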
Monitoring Rerank Quality
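```python
# Track rerank performance
def analyze_rerank_quality(results, threshold=0.7):
    """Summarize how many reranked results meet the relevance threshold."""
    total = len(results)
    if total == 0:
        # Avoid dividing by zero when a query returns no results.
        return {"total_results": 0, "high_quality": 0,
                "quality_ratio": 0.0, "avg_score": 0.0}

    above_threshold = [r for r in results if r["relevance_score"] >= threshold]
    return {
        "total_results": total,
        "high_quality": len(above_threshold),
        "quality_ratio": len(above_threshold) / total,
        "avg_score": sum(r["relevance_score"] for r in results) / total
    }
```

Best Practices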
- Oversample initial search - Search for 5-10x your final result count
- Use appropriate model - English model for English, multilingual for mixed
- Set reasonable top_n - Usually 3-10 results is optimal
- Cache frequently - Reranking is more expensive than search
- Monitor scores - Track average relevance scores over time
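The first three practices can be folded into one helper that picks a model and sizes the initial search. This is a sketch under assumptions: `plan_rerank` is a hypothetical name, language codes such as `"en"` are an illustrative convention, and the returned fields mirror the `/data/search` request shown earlier.

```python
from typing import Any, Dict, Iterable

ENGLISH_MODEL = "rerank-english-v3.0"
MULTILINGUAL_MODEL = "rerank-multilingual-v3.0"


def plan_rerank(languages: Iterable[str], top_n: int, oversample: int = 5) -> Dict[str, Any]:
    """Choose a rerank model and an oversampled initial search limit."""
    langs = {lang.lower() for lang in languages}
    if not langs:
        raise ValueError("at least one language code is required")
    if top_n < 1 or oversample < 1:
        raise ValueError("top_n and oversample must be positive")

    # English-only content uses the English model; anything else is
    # treated as mixed and gets the multilingual model.
    model = ENGLISH_MODEL if langs == {"en"} else MULTILINGUAL_MODEL
    return {
        "limit": top_n * oversample,  # initial search size (5-10x top_n)
        "rerank": {"enabled": True, "model": model, "top_n": top_n},
    }
```

With `top_n=5` and `oversample=10`, the helper reproduces the 50-to-5 ratio used in the combined search example.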
Pricing
| Model | Cost per 1K Documents |
|---|---|
| rerank-english-v3.0 | $0.002 |
| rerank-multilingual-v3.0 | $0.002 |
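A rough cost estimate follows directly from the table. The sketch below assumes billing scales linearly with the number of documents scored and that each query counts its documents separately; the page does not state rounding rules, minimum charges, or how documents map onto the `search_units` field, so treat the result as an approximation. `estimate_rerank_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Prices in USD per 1,000 documents, copied from the table above.
PRICE_PER_1K_DOCS = {
    "rerank-english-v3.0": Decimal("0.002"),
    "rerank-multilingual-v3.0": Decimal("0.002"),
}


def estimate_rerank_cost(model: str, docs_per_query: int, num_queries: int = 1) -> Decimal:
    """Estimate cost in USD, assuming linear per-document pricing."""
    if docs_per_query < 0 or num_queries < 0:
        raise ValueError("counts must be non-negative")
    total_docs = docs_per_query * num_queries
    # Decimal avoids binary floating-point error in small currency amounts.
    return PRICE_PER_1K_DOCS[model] * total_docs / 1000
```

Under these assumptions, reranking 50 candidates for each of 1,000 queries scores 50,000 documents, which comes to about $0.10.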
Response Format
```json
{
  "id": "rerank_abc123",
  "model": "rerank-english-v3.0",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.982,
      "document": "The refund policy allows returns within 30 days."
    },
    {
      "index": 3,
      "relevance_score": 0.847,
      "document": "Products must be unused for refund eligibility."
    }
  ],
  "usage": {
    "search_units": 5
  }
}
```

Next Steps
- Semantic Search - Initial retrieval
- RAG Injection - Use with LLMs
- Retrieval Tools - MCP integration