
RAG Injection

Automatically inject relevant context into LLM requests.

How It Works

When RAG is enabled, GateFlow queries the configured document collection with the user's message, retrieves the highest-scoring chunks, and injects them into the request before it is forwarded to the model. Retrieval and injection are controlled entirely through request options; your application code is otherwise unchanged.

Enable RAG

```python
# `client` is an OpenAI SDK client configured with your GateFlow base URL
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's our refund policy?"}],
    extra_body={
        "gateflow": {
            "rag": {
                "enabled": True,
                "collection": "policies",
                "top_k": 5
            }
        }
    }
)
```

Injection Modes

Prepend (Default)

Context added before user message:

[System] Context from documents:
- Refunds available within 30 days...
- Full refund for defective items...

[User] What's our refund policy?
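The prepend transformation over the messages list can be sketched as follows. This is a minimal illustration; `inject_prepend` and the exact context formatting are assumptions, not GateFlow's actual internals:

```python
def inject_prepend(messages: list[dict], chunks: list[str]) -> list[dict]:
    """Prepend retrieved chunks as a system-style context message."""
    context = "\n".join(f"- {c}" for c in chunks)
    context_msg = {"role": "system",
                   "content": f"Context from documents:\n{context}"}
    return [context_msg] + messages

msgs = [{"role": "user", "content": "What's our refund policy?"}]
result = inject_prepend(msgs, ["Refunds available within 30 days..."])
# The user message is untouched; the context message arrives first.
```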

System Prompt

Context added to system prompt:

```json
{
  "rag": {
    "injection_mode": "system",
    "system_template": "Use this context to answer: {context}"
  }
}
```
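System mode can be approximated like this. A sketch under assumptions: the template is rendered with `{context}` substitution, and how an existing system message is handled is a guess (this version appends to it rather than replacing it):

```python
def inject_system(messages: list[dict], chunks: list[str],
                  template: str = "Use this context to answer: {context}") -> list[dict]:
    """Render the system_template with retrieved context and apply it."""
    rendered = template.format(context="\n".join(chunks))
    out = [dict(m) for m in messages]  # avoid mutating the caller's list
    for m in out:
        if m["role"] == "system":
            m["content"] += "\n\n" + rendered  # assumption: append, not replace
            return out
    # No system message present: create one at the front
    return [{"role": "system", "content": rendered}] + out
```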

Append

Context after user message:

```json
{
  "rag": {
    "injection_mode": "append"
  }
}
```
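Append mode places the context after the user turn. One plausible reading, offered as an assumption since the page does not specify whether the context becomes a separate trailing message or a suffix on the user message (this sketch uses a separate message):

```python
def inject_append(messages: list[dict], chunks: list[str]) -> list[dict]:
    """Add retrieved context as a trailing context message."""
    context = "\n".join(f"- {c}" for c in chunks)
    return messages + [{"role": "system",
                        "content": f"Context from documents:\n{context}"}]
```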

Configuration

```json
{
  "rag": {
    "enabled": true,
    "collection": "policies",
    "top_k": 5,
    "min_score": 0.7,
    "rerank": true,
    "include_metadata": true,
    "max_context_tokens": 4000
  }
}
```
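These knobs compose into a selection step roughly like the following. The function name and the whitespace token counter are illustrative stand-ins; a real gateway would count tokens with the model's tokenizer and apply reranking before this filter:

```python
def select_chunks(scored_chunks, min_score=0.7, top_k=5,
                  max_context_tokens=4000,
                  count_tokens=lambda text: len(text.split())):
    """Keep at most top_k chunks scoring >= min_score, within a token budget.

    scored_chunks: list of (score, text) pairs, best first (e.g. post-rerank).
    """
    selected, budget = [], max_context_tokens
    for score, text in scored_chunks[:top_k]:
        if score < min_score:
            continue  # below the similarity threshold
        cost = count_tokens(text)
        if cost > budget:
            break  # max_context_tokens budget exhausted
        selected.append(text)
        budget -= cost
    return selected

hits = [(0.92, "Refunds available within 30 days"),
        (0.65, "Shipping takes 3-5 days"),
        (0.81, "Full refund for defective items")]
picked = select_chunks(hits, min_score=0.7, top_k=5, max_context_tokens=50)
# → both refund chunks; the 0.65 shipping chunk falls below min_score
```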

Response Metadata

```json
{
  "gateflow": {
    "rag": {
      "used": true,
      "chunks_injected": 3,
      "sources": [
        {"document_id": "doc_123", "chunk_id": "chunk_456"}
      ]
    }
  }
}
```
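Assuming this `gateflow` object arrives at the top level of the response body (with the OpenAI Python SDK, extra fields are reachable via `response.model_dump()`), a small helper can summarize it. `summarize_rag` is illustrative, not part of any SDK:

```python
def summarize_rag(body: dict) -> str:
    """Report whether RAG fired and which documents were used."""
    rag = body.get("gateflow", {}).get("rag", {})
    if not rag.get("used"):
        return "no RAG context injected"
    doc_ids = [s["document_id"] for s in rag.get("sources", [])]
    return f"{rag['chunks_injected']} chunks injected from {doc_ids}"

body = {"gateflow": {"rag": {"used": True, "chunks_injected": 3,
                             "sources": [{"document_id": "doc_123",
                                          "chunk_id": "chunk_456"}]}}}
print(summarize_rag(body))  # → 3 chunks injected from ['doc_123']
```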
