
Rerank Integration

Improve search relevance with neural reranking models.

Overview

Reranking is the second stage of a two-stage retrieval process: a fast initial search casts a wide net, and a reranking model then re-scores those candidates against the query to produce a more precise final ordering.

Why Reranking?

| Stage | Speed | Accuracy | Purpose |
| --- | --- | --- | --- |
| Initial search | Fast | Good | Cast a wide net |
| Reranking | Slower | Excellent | Precision refinement |

Embedding search alone returns semantically similar results, but reranking uses cross-encoder models that score each query-document pair jointly, capturing relevance signals that embedding similarity misses.
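
To make the two-stage idea concrete, here is a toy sketch (not GateFlow code): cheap keyword overlap stands in for the embedding stage, and a slightly richer bigram-overlap score stands in for the cross-encoder.

```python
def first_stage_score(query: str, doc: str) -> float:
    """Cheap proxy for embedding search: fraction of query words in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def second_stage_score(query: str, doc: str) -> float:
    """Pricier proxy for a cross-encoder: also rewards word order (bigrams)."""
    def bigrams(text: str) -> set:
        words = text.lower().split()
        return set(zip(words, words[1:]))
    return first_stage_score(query, doc) + len(bigrams(query) & bigrams(doc))

def two_stage_search(query: str, corpus: list, recall_k: int = 3, top_n: int = 1) -> list:
    # Stage 1: keep the recall_k best candidates by the cheap score.
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:recall_k]
    # Stage 2: re-rank only those candidates with the expensive score.
    return sorted(candidates, key=lambda d: second_stage_score(query, d),
                  reverse=True)[:top_n]
```

The point of the split is cost: the expensive scorer only ever sees `recall_k` documents, not the whole corpus.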

Supported Models

| Model | Provider | Languages | Best For |
| --- | --- | --- | --- |
| rerank-english-v3.0 | Cohere | English | English documents |
| rerank-multilingual-v3.0 | Cohere | 100+ | Multilingual corpora |
| rerank-english-v2.0 | Cohere | English | Legacy support |

Basic Usage

Simple Reranking

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

# Initial search results (from semantic search)
documents = [
    "The refund policy allows returns within 30 days.",
    "Our return process is simple and customer-friendly.",
    "Contact support for refund assistance.",
    "Products must be unused for refund eligibility.",
    "Refunds are processed within 5-7 business days."
]

# Rerank for the specific query
response = client.post(
    "/rerank",
    json={
        "model": "rerank-english-v3.0",
        "query": "How long do I have to return an item?",
        "documents": documents,
        "top_n": 3
    }
)

for result in response["results"]:
    print(f"Score: {result['relevance_score']:.3f}")
    print(f"Document: {documents[result['index']]}\n")

Output:

Score: 0.982
Document: The refund policy allows returns within 30 days.

Score: 0.847
Document: Products must be unused for refund eligibility.

Score: 0.734
Document: Our return process is simple and customer-friendly.

cURL Example

bash
curl -X POST https://api.gateflow.ai/v1/rerank \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "How long do I have to return an item?",
    "documents": [
      "The refund policy allows returns within 30 days.",
      "Our return process is simple and customer-friendly.",
      "Contact support for refund assistance."
    ],
    "top_n": 2
  }'

Integrated Search and Rerank

Combined API Call

python
# Search with automatic reranking
response = client.post(
    "/data/search",
    json={
        "query": "What is our vacation policy?",
        "collection": "hr_policies",
        "limit": 50,           # Initial search limit
        "rerank": {
            "enabled": True,
            "model": "rerank-english-v3.0",
            "top_n": 5         # Final results after rerank
        }
    }
)

for result in response["results"]:
    print(f"Score: {result['rerank_score']:.3f}")
    print(f"Content: {result['content'][:100]}...")
    print()

RAG with Reranking

python
# Chat with RAG + reranking
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "How many vacation days do new employees get?"}
    ],
    extra_body={
        "gateflow": {
            "rag": {
                "enabled": True,
                "collection": "hr_policies",
                "top_k": 5,
                "rerank": {
                    "enabled": True,
                    "model": "rerank-english-v3.0",
                    "initial_k": 30  # Search 30, rerank to 5
                }
            }
        }
    }
)

print(response.choices[0].message.content)

Configuration Options

Rerank Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Rerank model ID |
| query | string | required | Search query |
| documents | array | required | Documents to rerank |
| top_n | integer | 10 | Number of results to return |
| return_documents | boolean | false | Include document text in response |
| max_chunks_per_doc | integer | null | Limit chunks per document |

Advanced Options

python
response = client.post(
    "/rerank",
    json={
        "model": "rerank-multilingual-v3.0",
        "query": "Quelle est la politique de remboursement?",
        "documents": documents,
        "top_n": 5,
        "return_documents": True,
        "max_chunks_per_doc": 3
    }
)

Performance Optimization

Batch Reranking

python
# Rerank multiple queries efficiently
queries = [
    "vacation policy",
    "sick leave",
    "remote work guidelines"
]

results = []
for query in queries:
    response = client.post(
        "/rerank",
        json={
            "model": "rerank-english-v3.0",
            "query": query,
            "documents": all_documents,
            "top_n": 5
        }
    )
    results.append({
        "query": query,
        "top_results": response["results"]
    })
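
The loop above issues one rerank request per query in sequence. If the client is thread-safe and your rate limits allow it (both are assumptions worth verifying for your setup), the calls can run in parallel; `rerank_fn` below is a stand-in for the `client.post("/rerank", ...)` call.

```python
from concurrent.futures import ThreadPoolExecutor

def rerank_many(queries, documents, rerank_fn, max_workers=4):
    """Run rerank_fn(query, documents) for each query concurrently.

    rerank_fn is a placeholder for the actual API call; results come
    back in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scored = pool.map(lambda q: rerank_fn(q, documents), queries)
    return [{"query": q, "top_results": r} for q, r in zip(queries, scored)]
```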

Caching Strategy

python
# Enable rerank result caching
response = client.post(
    "/rerank",
    json={
        "model": "rerank-english-v3.0",
        "query": "vacation policy",
        "documents": documents,
        "top_n": 5
    },
    headers={
        "X-GateFlow-Cache": "enabled",
        "X-GateFlow-Cache-TTL": "3600"  # 1 hour
    }
)

Multilingual Reranking

python
# Query in one language, documents in another
response = client.post(
    "/rerank",
    json={
        "model": "rerank-multilingual-v3.0",
        "query": "Comment puis-je obtenir un remboursement?",  # French
        "documents": [
            "Refunds are processed within 7 days.",           # English
            "Die Rückerstattung erfolgt innerhalb von 7 Tagen.", # German
            "Los reembolsos se procesan en 7 días."           # Spanish
        ],
        "top_n": 3
    }
)

Language Detection

python
# Let the system detect languages automatically
response = client.post(
    "/data/search",
    json={
        "query": "política de devoluciones",
        "collection": "support_docs",
        "rerank": {
            "enabled": True,
            "model": "rerank-multilingual-v3.0",
            "auto_detect_language": True
        }
    }
)

Quality Metrics

Relevance Scores

| Score Range | Interpretation |
| --- | --- |
| 0.9 - 1.0 | Highly relevant |
| 0.7 - 0.9 | Relevant |
| 0.5 - 0.7 | Somewhat relevant |
| 0.0 - 0.5 | Low relevance |
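
One way to apply these buckets programmatically; the boundaries come from the table above, not from the API itself:

```python
def interpret_score(score: float) -> str:
    """Map a relevance score to the interpretation buckets above."""
    if score >= 0.9:
        return "Highly relevant"
    if score >= 0.7:
        return "Relevant"
    if score >= 0.5:
        return "Somewhat relevant"
    return "Low relevance"
```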

Monitoring Rerank Quality

python
# Track rerank performance
def analyze_rerank_quality(results, threshold=0.7):
    """Summarize a batch of rerank results against a relevance threshold."""
    if not results:  # guard against an empty batch
        return {"total_results": 0, "high_quality": 0,
                "quality_ratio": 0.0, "avg_score": 0.0}

    above_threshold = [r for r in results if r["relevance_score"] >= threshold]
    return {
        "total_results": len(results),
        "high_quality": len(above_threshold),
        "quality_ratio": len(above_threshold) / len(results),
        "avg_score": sum(r["relevance_score"] for r in results) / len(results),
    }

Best Practices

  1. Oversample initial search - Search for 5-10x your final result count
  2. Use appropriate model - English model for English, multilingual for mixed
  3. Set reasonable top_n - Usually 3-10 results is optimal
  4. Cache frequently - Reranking is more expensive than search
  5. Monitor scores - Track average relevance scores over time
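
Practice 1 can be captured in a tiny helper. The 8x multiplier and the floor of 20 are illustrative defaults, not API requirements:

```python
def initial_search_limit(top_n: int, oversample: int = 8, floor: int = 20) -> int:
    """Derive the first-stage search limit from the final top_n."""
    return max(top_n * oversample, floor)
```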

Pricing

| Model | Cost per 1K Documents |
| --- | --- |
| rerank-english-v3.0 | $0.002 |
| rerank-multilingual-v3.0 | $0.002 |
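
A back-of-the-envelope cost estimate from the per-1K-document prices above (verify current rates against your plan before budgeting):

```python
def rerank_cost(num_documents: int, price_per_1k: float = 0.002) -> float:
    """Estimated rerank cost in dollars for a given document volume."""
    return num_documents / 1000 * price_per_1k
```

At the listed rate, reranking 50,000 documents costs about $0.10.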

Response Format

json
{
  "id": "rerank_abc123",
  "model": "rerank-english-v3.0",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.982,
      "document": "The refund policy allows returns within 30 days."
    },
    {
      "index": 3,
      "relevance_score": 0.847,
      "document": "Products must be unused for refund eligibility."
    }
  ],
  "usage": {
    "search_units": 5
  }
}
