
Rerank Integration

Improve search relevance with neural reranking models.

Overview

Reranking is the second stage of a two-stage retrieval process: a fast initial search casts a wide net, and a reranking model then re-scores those candidates against the query to produce a more precise final ordering.

Why Reranking?

| Stage | Speed | Accuracy | Purpose |
| --- | --- | --- | --- |
| Initial search | Fast | Good | Cast a wide net |
| Reranking | Slower | Excellent | Precision refinement |

Embedding search alone returns semantically similar results, but reranking uses cross-encoder models that score each query-document pair jointly, capturing relevance signals that embedding similarity misses.
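
To make the two-stage idea concrete, here is a toy sketch (not GateFlow code): cheap keyword overlap stands in for the embedding stage, and a slightly richer bigram-overlap score stands in for the cross-encoder.

```python
def first_stage_score(query: str, doc: str) -> float:
    """Cheap proxy for embedding search: fraction of query words in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def second_stage_score(query: str, doc: str) -> float:
    """Pricier proxy for a cross-encoder: also rewards word order (bigrams)."""
    def bigrams(text: str) -> set:
        words = text.lower().split()
        return set(zip(words, words[1:]))
    return first_stage_score(query, doc) + len(bigrams(query) & bigrams(doc))

def two_stage_search(query: str, corpus: list, recall_k: int = 3, top_n: int = 1) -> list:
    # Stage 1: keep the recall_k best candidates by the cheap score.
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:recall_k]
    # Stage 2: re-rank only those candidates with the expensive score.
    return sorted(candidates, key=lambda d: second_stage_score(query, d),
                  reverse=True)[:top_n]
```

The point of the split is cost: the expensive scorer only ever sees `recall_k` documents, not the whole corpus.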

Supported Models

| Model | Provider | Languages | Best For |
| --- | --- | --- | --- |
| rerank-english-v3.0 | Cohere | English | English documents |
| rerank-multilingual-v3.0 | Cohere | 100+ | Multilingual corpora |
| rerank-english-v2.0 | Cohere | English | Legacy support |

Basic Usage

Simple Reranking

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

# Initial search results (from semantic search)
documents = [
    "The refund policy allows returns within 30 days.",
    "Our return process is simple and customer-friendly.",
    "Contact support for refund assistance.",
    "Products must be unused for refund eligibility.",
    "Refunds are processed within 5-7 business days."
]

# Rerank for the specific query
response = client.post(
    "/rerank",
    json={
        "model": "rerank-english-v3.0",
        "query": "How long do I have to return an item?",
        "documents": documents,
        "top_n": 3
    }
)

for result in response["results"]:
    print(f"Score: {result['relevance_score']:.3f}")
    print(f"Document: {documents[result['index']]}\n")

Output:

Score: 0.982
Document: The refund policy allows returns within 30 days.

Score: 0.847
Document: Products must be unused for refund eligibility.

Score: 0.734
Document: Our return process is simple and customer-friendly.

cURL Example

bash
curl -X POST https://api.gateflow.ai/v1/rerank \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "How long do I have to return an item?",
    "documents": [
      "The refund policy allows returns within 30 days.",
      "Our return process is simple and customer-friendly.",
      "Contact support for refund assistance."
    ],
    "top_n": 2
  }'

Integrated Search and Rerank

Combined API Call

python
# Search with automatic reranking
response = client.post(
    "/data/search",
    json={
        "query": "What is our vacation policy?",
        "collection": "hr_policies",
        "limit": 50,           # Initial search limit
        "rerank": {
            "enabled": True,
            "model": "rerank-english-v3.0",
            "top_n": 5         # Final results after rerank
        }
    }
)

for result in response["results"]:
    print(f"Score: {result['rerank_score']:.3f}")
    print(f"Content: {result['content'][:100]}...")
    print()

RAG with Reranking

python
# Chat with RAG + reranking
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "user", "content": "How many vacation days do new employees get?"}
    ],
    extra_body={
        "gateflow": {
            "rag": {
                "enabled": True,
                "collection": "hr_policies",
                "top_k": 5,
                "rerank": {
                    "enabled": True,
                    "model": "rerank-english-v3.0",
                    "initial_k": 30  # Search 30, rerank to 5
                }
            }
        }
    }
)

print(response.choices[0].message.content)

Configuration Options

Rerank Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Rerank model ID |
| query | string | required | Search query |
| documents | array | required | Documents to rerank |
| top_n | integer | 10 | Number of results to return |
| return_documents | boolean | false | Include document text in response |
| max_chunks_per_doc | integer | null | Limit chunks per document |

Advanced Options

python
response = client.post(
    "/rerank",
    json={
        "model": "rerank-multilingual-v3.0",
        "query": "Quelle est la politique de remboursement?",
        "documents": documents,
        "top_n": 5,
        "return_documents": True,
        "max_chunks_per_doc": 3
    }
)

Performance Optimization

Batch Reranking

python
# Rerank multiple queries efficiently
queries = [
    "vacation policy",
    "sick leave",
    "remote work guidelines"
]

results = []
for query in queries:
    response = client.post(
        "/rerank",
        json={
            "model": "rerank-english-v3.0",
            "query": query,
            "documents": all_documents,
            "top_n": 5
        }
    )
    results.append({
        "query": query,
        "top_results": response["results"]
    })
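
The loop above issues one rerank request per query in sequence. If the client is thread-safe and your rate limits allow it (both are assumptions worth verifying for your setup), the calls can run in parallel; `rerank_fn` below is a stand-in for the `client.post("/rerank", ...)` call.

```python
from concurrent.futures import ThreadPoolExecutor

def rerank_many(queries, documents, rerank_fn, max_workers=4):
    """Run rerank_fn(query, documents) for each query concurrently.

    rerank_fn is a placeholder for the actual API call; results come
    back in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scored = pool.map(lambda q: rerank_fn(q, documents), queries)
    return [{"query": q, "top_results": r} for q, r in zip(queries, scored)]
```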

Caching Strategy

python
# Enable rerank result caching
response = client.post(
    "/rerank",
    json={
        "model": "rerank-english-v3.0",
        "query": "vacation policy",
        "documents": documents,
        "top_n": 5
    },
    headers={
        "X-GateFlow-Cache": "enabled",
        "X-GateFlow-Cache-TTL": "3600"  # 1 hour
    }
)

Multilingual Reranking

python
# Query in one language, documents in another
response = client.post(
    "/rerank",
    json={
        "model": "rerank-multilingual-v3.0",
        "query": "Comment puis-je obtenir un remboursement?",  # French
        "documents": [
            "Refunds are processed within 7 days.",           # English
            "Die Rückerstattung erfolgt innerhalb von 7 Tagen.", # German
            "Los reembolsos se procesan en 7 días."           # Spanish
        ],
        "top_n": 3
    }
)

Language Detection

python
# Let the system detect languages automatically
response = client.post(
    "/data/search",
    json={
        "query": "política de devoluciones",
        "collection": "support_docs",
        "rerank": {
            "enabled": True,
            "model": "rerank-multilingual-v3.0",
            "auto_detect_language": True
        }
    }
)

Quality Metrics

Relevance Scores

| Score Range | Interpretation |
| --- | --- |
| 0.9 - 1.0 | Highly relevant |
| 0.7 - 0.9 | Relevant |
| 0.5 - 0.7 | Somewhat relevant |
| 0.0 - 0.5 | Low relevance |
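
One way to apply these buckets programmatically; the boundaries come from the table above, not from the API itself:

```python
def interpret_score(score: float) -> str:
    """Map a relevance score to the interpretation buckets above."""
    if score >= 0.9:
        return "Highly relevant"
    if score >= 0.7:
        return "Relevant"
    if score >= 0.5:
        return "Somewhat relevant"
    return "Low relevance"
```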

Monitoring Rerank Quality

python
# Track rerank performance
def analyze_rerank_quality(results, threshold=0.7):
    """Summarize a batch of rerank results against a relevance threshold."""
    if not results:  # guard against an empty batch
        return {"total_results": 0, "high_quality": 0,
                "quality_ratio": 0.0, "avg_score": 0.0}

    above_threshold = [r for r in results if r["relevance_score"] >= threshold]
    return {
        "total_results": len(results),
        "high_quality": len(above_threshold),
        "quality_ratio": len(above_threshold) / len(results),
        "avg_score": sum(r["relevance_score"] for r in results) / len(results),
    }

Best Practices

  1. Oversample initial search - Search for 5-10x your final result count
  2. Use appropriate model - English model for English, multilingual for mixed
  3. Set reasonable top_n - Usually 3-10 results is optimal
  4. Cache frequently - Reranking is more expensive than search
  5. Monitor scores - Track average relevance scores over time
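
Practice 1 can be captured in a tiny helper. The 8x multiplier and the floor of 20 are illustrative defaults, not API requirements:

```python
def initial_search_limit(top_n: int, oversample: int = 8, floor: int = 20) -> int:
    """Derive the first-stage search limit from the final top_n."""
    return max(top_n * oversample, floor)
```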

Pricing

| Model | Cost per 1K Documents |
| --- | --- |
| rerank-english-v3.0 | $0.002 |
| rerank-multilingual-v3.0 | $0.002 |
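
A back-of-the-envelope cost estimate from the per-1K-document prices above (verify current rates against your plan before budgeting):

```python
def rerank_cost(num_documents: int, price_per_1k: float = 0.002) -> float:
    """Estimated rerank cost in dollars for a given document volume."""
    return num_documents / 1000 * price_per_1k
```

At the listed rate, reranking 50,000 documents costs about $0.10.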

Response Format

json
{
  "id": "rerank_abc123",
  "model": "rerank-english-v3.0",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.982,
      "document": "The refund policy allows returns within 30 days."
    },
    {
      "index": 3,
      "relevance_score": 0.847,
      "document": "Products must be unused for refund eligibility."
    }
  ],
  "usage": {
    "search_units": 5
  }
}
