Skip to content

Cache Configuration

Advanced configuration options for GateFlow's semantic caching system.

Configuration Overview

json
{
  "semantic_cache": {
    "enabled": true,
    "threshold": 0.95,
    "ttl_seconds": 86400,
    "max_entries": 1000000,
    "embedding_model": "text-embedding-3-small",
    "scope": {
      "include_model": true,
      "include_temperature": true,
      "include_system_prompt": true
    },
    "exclusions": {
      "models": [],
      "paths": [],
      "headers": {}
    }
  }
}

Threshold Tuning

Finding the Right Threshold

bash
# Analyze query similarity distribution
curl https://api.gateflow.ai/v1/management/cache/similarity-analysis \
  -H "Authorization: Bearer gw_prod_admin_key"

Response:

json
{
  "distribution": {
    "0.99-1.00": 0.15,
    "0.95-0.99": 0.25,
    "0.90-0.95": 0.20,
    "0.85-0.90": 0.15,
    "below_0.85": 0.25
  },
  "recommendation": {
    "threshold": 0.95,
    "expected_hit_rate": 0.40,
    "quality_score": "high"
  }
}

Threshold Impact

ThresholdHit RateQuality Risk
0.99~15%Very Low
0.95~40%Low
0.90~60%Medium
0.85~75%High

Per-Model Thresholds

Different models may need different thresholds:

json
{
  "semantic_cache": {
    "model_thresholds": {
      "gpt-4o": 0.95,
      "gpt-4o-mini": 0.92,
      "claude-3-haiku": 0.93
    }
  }
}

TTL Strategies

Static TTL

All entries expire after the same duration:

json
{
  "ttl_seconds": 86400  // 24 hours
}

Dynamic TTL

Vary TTL based on query characteristics:

json
{
  "ttl_strategy": "dynamic",
  "ttl_rules": [
    {
      "condition": {"contains": ["today", "now", "current"]},
      "ttl_seconds": 3600  // 1 hour
    },
    {
      "condition": {"model": "gpt-4o-mini"},
      "ttl_seconds": 604800  // 1 week
    },
    {
      "condition": {"default": true},
      "ttl_seconds": 86400  // 1 day
    }
  ]
}

Sliding TTL

Reset TTL on cache hit:

json
{
  "ttl_strategy": "sliding",
  "ttl_seconds": 86400,
  "max_age_seconds": 604800  // Maximum 1 week regardless
}

Cache Scope

Model Scoping

Cache per model (default: true):

json
{
  "scope": {
    "include_model": true
  }
}

If false, queries across models share cache:

  • Pro: Higher hit rate
  • Con: May return wrong model's style

Temperature Scoping

Cache per temperature setting:

json
{
  "scope": {
    "include_temperature": true
  }
}

Considerations:

  • temperature: 0 → Deterministic, safe to share
  • temperature: 0.7+ → Variable, consider scoping

System Prompt Scoping

Cache per system prompt:

json
{
  "scope": {
    "include_system_prompt": true
  }
}

If false, same query with different system prompts shares cache.

Custom Scope Keys

Add custom dimensions to cache key:

json
{
  "scope": {
    "custom_keys": ["user_locale", "app_version"]
  }
}

Pass in request:

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "gateflow": {
            "cache_scope": {
                "user_locale": "en-US",
                "app_version": "2.1"
            }
        }
    }
)

Exclusions

Exclude Models

json
{
  "exclusions": {
    "models": ["gpt-4-vision", "whisper-1"]
  }
}

Exclude by Header

json
{
  "exclusions": {
    "headers": {
      "X-No-Cache": "true",
      "X-User-Type": "premium"
    }
  }
}

Exclude by Pattern

json
{
  "exclusions": {
    "message_patterns": [
      "order.*status",
      "account.*balance",
      "\\$[0-9]+"
    ]
  }
}

Storage Configuration

Maximum Entries

json
{
  "max_entries": 1000000
}

When limit is reached, oldest entries are evicted (LRU).

Entry Size Limits

json
{
  "max_entry_size_tokens": 8000,
  "max_response_size_tokens": 4000
}

Responses larger than limits are not cached.

Embedding Configuration

Embedding Model

json
{
  "embedding_model": "text-embedding-3-small"
}

Options:

  • text-embedding-3-small - Fast, good for most cases
  • text-embedding-3-large - Higher quality similarity
  • text-embedding-ada-002 - Legacy, compatibility

Embedding Scope

What to embed for similarity:

json
{
  "embedding_scope": {
    "include_system": false,  // Don't embed system prompt
    "include_history": true,  // Embed conversation history
    "max_history_turns": 3    // Only last 3 turns
  }
}

Cache Warming

Automatic Warming

Pre-cache popular queries:

json
{
  "warming": {
    "enabled": true,
    "sources": ["popular_queries", "predefined_list"],
    "schedule": "0 0 * * *"  // Daily at midnight
  }
}

Manual Warming

bash
curl -X POST https://api.gateflow.ai/v1/management/cache/warm \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "file",
    "url": "s3://bucket/popular-queries.jsonl",
    "model": "gpt-4o"
  }'

Monitoring

Cache Metrics

bash
curl https://api.gateflow.ai/v1/management/cache/metrics \
  -H "Authorization: Bearer gw_prod_admin_key"

Response:

json
{
  "entries": 456789,
  "storage_mb": 1234,
  "hit_rate_1h": 0.42,
  "hit_rate_24h": 0.38,
  "avg_similarity_hit": 0.97,
  "evictions_24h": 1234
}

Alerts

json
{
  "alerts": [
    {
      "name": "Low cache hit rate",
      "condition": {"metric": "hit_rate_1h", "lt": 0.2},
      "notify": ["slack"]
    },
    {
      "name": "Cache storage high",
      "condition": {"metric": "storage_percentage", "gt": 0.9},
      "notify": ["email"]
    }
  ]
}

Debugging

Test Similarity

bash
curl -X POST https://api.gateflow.ai/v1/management/cache/test-similarity \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "query_a": "What is Python?",
    "query_b": "Explain Python programming"
  }'

Response:

json
{
  "similarity": 0.94,
  "would_hit_cache": false,
  "threshold": 0.95
}

Cache Debug Mode

Enable detailed cache logging:

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "gateflow": {
            "cache_debug": true
        }
    }
)

Response includes:

json
{
  "gateflow": {
    "cache_debug": {
      "query_embedding": "[0.123, -0.456, ...]",
      "nearest_matches": [
        {"similarity": 0.93, "query": "Tell me about Python"},
        {"similarity": 0.89, "query": "Python overview"}
      ],
      "decision": "miss",
      "reason": "no_match_above_threshold"
    }
  }
}

Next Steps

Built with reliability in mind.