
Embeddings

Create vector embeddings for text. Embeddings are numerical representations of text that capture semantic meaning, enabling similarity search, clustering, and RAG applications.

POST /v1/embeddings

Overview

The Embeddings API converts text into high-dimensional vectors that can be used for:

  • Semantic search - Find similar documents based on meaning
  • RAG pipelines - Retrieve relevant context for LLM prompts
  • Clustering - Group similar documents together
  • Classification - Categorize text based on learned patterns

GateFlow routes embedding requests to the appropriate provider based on the model name and automatically handles provider-specific formatting.

Request

```bash
curl https://api.gateflow.ai/v1/embeddings \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Hello world"
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Hello world',
});
```

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Embedding model ID |
| `input` | string / array | Yes | Text or list of texts to embed |
| `encoding_format` | string | No | `float` (default) or `base64` |
| `dimensions` | integer | No | Output dimensions (text-embedding-3 models only) |
| `user` | string | No | End-user identifier |
| `provider` | string | No | Force a specific provider |
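With `encoding_format="base64"`, the `embedding` field comes back as a base64 string instead of a JSON float array, which shrinks the response payload. A minimal decoder sketch, assuming the common packed little-endian float32 layout (verify against your provider's actual encoding):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demonstration with synthetic values (all exact in float32)
packed = struct.pack("<3f", 0.25, -0.5, 1.0)
encoded = base64.b64encode(packed).decode()
print(decode_base64_embedding(encoded))  # [0.25, -0.5, 1.0]
```

Base64 decoding trades a small amount of client-side work for noticeably smaller responses on large batches.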

Supported Models

| Model | Provider | Dimensions | Max Tokens | Use Case |
|-------|----------|------------|------------|----------|
| `text-embedding-3-small` | OpenAI | 1536 | 8191 | General purpose, cost-effective |
| `text-embedding-3-large` | OpenAI | 3072 | 8191 | Highest accuracy |
| `text-embedding-ada-002` | OpenAI | 1536 | 8191 | Legacy, widely compatible |
| `text-embedding-004` | Google | 768 | 2048 | Multilingual support |
| `embed-english-v3.0` | Cohere | 1024 | 512 | English-optimized |
| `embed-multilingual-v3.0` | Cohere | 1024 | 512 | 100+ languages |
| `mistral-embed` | Mistral | 1024 | 8192 | Long context support |

Anthropic Not Supported

Anthropic does not provide embedding models. Use OpenAI, Google, Cohere, or Mistral for embeddings.

Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  },
  "gateflow": {
    "request_id": "req_xyz789",
    "provider": "openai",
    "latency_ms": 89,
    "cost": {
      "total": 0.00000004
    }
  }
}
```

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `object` | string | Always `list` |
| `data` | array | Array of embeddings |
| `model` | string | Model used |
| `usage` | object | Token usage |

Embedding Object

| Field | Type | Description |
|-------|------|-------------|
| `object` | string | Always `embedding` |
| `index` | integer | Position in input array |
| `embedding` | array | Vector of floats |
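The standard fields map directly onto the OpenAI SDK response objects; the `gateflow` block is easiest to inspect on the raw JSON body. A quick sketch that reads the metadata from the sample response above (values copied from the Response section):

```python
import json

# Sample response body, as shown in the Response section
body = json.loads("""
{
  "object": "list",
  "data": [{"object": "embedding", "index": 0, "embedding": [0.0023, -0.0093]}],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 2, "total_tokens": 2},
  "gateflow": {"request_id": "req_xyz789", "provider": "openai",
               "latency_ms": 89, "cost": {"total": 0.00000004}}
}
""")

vector = body["data"][0]["embedding"]
meta = body["gateflow"]
print(len(vector), meta["provider"], meta["latency_ms"])  # 2 openai 89
```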

Examples

Single Text

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'The quick brown fox jumps over the lazy dog.',
});

const embedding = response.data[0].embedding;
console.log(`Dimensions: ${embedding.length}`);
```

```bash
curl https://api.gateflow.ai/v1/embeddings \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
```

Batch Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "First document",
        "Second document",
        "Third document"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")
```

Reduced Dimensions

```python
# Use fewer dimensions for efficiency
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world",
    dimensions=256  # Instead of 3072
)
```
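The `dimensions` parameter returns vectors that are already scaled for similarity comparisons. If you instead truncate a full-length embedding yourself, rescale it to unit length before computing cosine similarity; otherwise scores are distorted. A small helper sketch (the function name is illustrative, not part of any SDK):

```python
import numpy as np

def truncate_and_normalize(embedding, dims):
    """Keep the first `dims` components and rescale to unit length."""
    v = np.asarray(embedding[:dims], dtype=np.float32)
    return v / np.linalg.norm(v)

v = truncate_and_normalize([0.6, 0.8, 0.0, 0.1], 2)
print(np.linalg.norm(v))  # ~1.0 after rescaling
```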
Similarity Search

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Create embeddings
query = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is machine learning?"
).data[0].embedding

documents = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "Machine learning is a subset of AI.",
        "The weather is nice today.",
        "Neural networks learn from data."
    ]
).data

# Find the most similar document
similarities = [
    cosine_similarity(query, doc.embedding)
    for doc in documents
]

most_similar_idx = np.argmax(similarities)
print(f"Most similar: document {most_similar_idx}")
```

GateFlow Extensions

Skip Caching

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
    extra_body={
        "gateflow": {
            "cache": "skip"
        }
    }
)
```

Fallback Models

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
    extra_body={
        "gateflow": {
            "fallbacks": ["text-embedding-004", "embed-english-v3.0"]
        }
    }
)
```

Use Cases

Semantic Search

A typical semantic search pipeline:

  1. Embed all documents at indexing time
  2. Store the embeddings in a vector database
  3. Embed each search query at query time
  4. Return the nearest neighbors as results
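The steps above can be sketched with a tiny in-memory index; the `VectorIndex` class is purely illustrative (a real deployment would use a vector database), with synthetic 2-D vectors standing in for embeddings:

```python
import numpy as np

class VectorIndex:
    """Tiny in-memory index over unit-normalized vectors (illustrative only)."""
    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, doc_id, embedding):
        v = np.asarray(embedding, dtype=np.float32)
        self.ids.append(doc_id)
        self.vectors.append(v / np.linalg.norm(v))  # store unit vectors

    def search(self, query_embedding, k=3):
        q = np.asarray(query_embedding, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q  # dot of unit vectors = cosine
        top = np.argsort(scores)[::-1][:k]
        return [(self.ids[i], float(scores[i])) for i in top]

index = VectorIndex()
index.add("doc-a", [1.0, 0.0])
index.add("doc-b", [0.0, 1.0])
print(index.search([0.9, 0.1], k=1))  # doc-a ranks first
```

Because the stored vectors are normalized at insert time, each query costs one matrix-vector product rather than a per-document norm computation.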

Semantic Caching

GateFlow uses embeddings internally for its semantic cache, so prompts with similar meaning can be served from cache:

```python
# These similar queries may hit the semantic cache
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is Python?"}]
)
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain Python"}]
)
```

Clustering

```python
from sklearn.cluster import KMeans

# Use the embeddings from a batch request (see Batch Embeddings above)
embeddings = [e.embedding for e in response.data]

# Group the vectors into three clusters
kmeans = KMeans(n_clusters=3)
clusters = kmeans.fit_predict(embeddings)
```

Pricing

| Model | Price per 1M tokens |
|-------|---------------------|
| `text-embedding-3-small` | $0.02 |
| `text-embedding-3-large` | $0.13 |
| `text-embedding-ada-002` | $0.10 |
| `text-embedding-004` | $0.025 |
| `embed-english-v3.0` | $0.10 |

Error Codes

| Code | Description |
|------|-------------|
| `invalid_input` | Input is empty or invalid |
| `token_limit_exceeded` | Input exceeds the model's token limit |
| `model_not_found` | Embedding model not available |

See Error Handling for details.
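To avoid `token_limit_exceeded`, split long texts before embedding them. A rough character-based chunker sketch; the ~4 characters-per-token ratio is only a heuristic for English text, so use a real tokenizer (e.g. `tiktoken`) when you need exact counts:

```python
def chunk_text(text: str, max_tokens: int = 8191, chars_per_token: int = 4):
    """Split text into pieces that should fit under the model's token limit."""
    limit = max_tokens * chars_per_token
    return [text[i:i + limit] for i in range(0, len(text), limit)]

chunks = chunk_text("x" * 100_000, max_tokens=8191)
print(len(chunks), max(len(c) for c in chunks))  # 4 32764
```

Each chunk can then be sent in a single batch request, with the `index` field used to reassemble results in order.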
