
Embeddings

Create vector embeddings for text. Embeddings are numerical representations of text that capture semantic meaning, enabling similarity search, clustering, and RAG applications.

POST /v1/embeddings

Overview

The Embeddings API converts text into high-dimensional vectors that can be used for:

  • Semantic search - Find similar documents based on meaning
  • RAG pipelines - Retrieve relevant context for LLM prompts
  • Clustering - Group similar documents together
  • Classification - Categorize text based on learned patterns

GateFlow routes embedding requests to the appropriate provider based on the model name and automatically handles provider-specific formatting.

Request

```bash
curl https://api.gateflow.ai/v1/embeddings \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Hello world"
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Hello world',
});
```

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Embedding model ID |
| `input` | string / array | Yes | Text or list of texts to embed |
| `encoding_format` | string | No | `float` (default) or `base64` |
| `dimensions` | integer | No | Output dimensions (text-embedding-3 models only) |
| `user` | string | No | End-user identifier |
| `provider` | string | No | Force a specific provider |
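With `encoding_format="base64"`, the `embedding` field comes back as a base64 string instead of a JSON float array, which shrinks the response payload. A minimal decoder sketch, assuming the common packed little-endian float32 layout (verify against your provider's actual encoding):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demonstration with synthetic values (all exact in float32)
packed = struct.pack("<3f", 0.25, -0.5, 1.0)
encoded = base64.b64encode(packed).decode()
print(decode_base64_embedding(encoded))  # [0.25, -0.5, 1.0]
```

Base64 decoding trades a small amount of client-side work for noticeably smaller responses on large batches.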

Supported Models

| Model | Provider | Dimensions | Max Tokens | Use Case |
|-------|----------|------------|------------|----------|
| `text-embedding-3-small` | OpenAI | 1536 | 8191 | General purpose, cost-effective |
| `text-embedding-3-large` | OpenAI | 3072 | 8191 | Highest accuracy |
| `text-embedding-ada-002` | OpenAI | 1536 | 8191 | Legacy, widely compatible |
| `text-embedding-004` | Google | 768 | 2048 | Multilingual support |
| `embed-english-v3.0` | Cohere | 1024 | 512 | English-optimized |
| `embed-multilingual-v3.0` | Cohere | 1024 | 512 | 100+ languages |
| `mistral-embed` | Mistral | 1024 | 8192 | Long context support |

Anthropic Not Supported

Anthropic does not provide embedding models. Use OpenAI, Google, Cohere, or Mistral for embeddings.

Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  },
  "gateflow": {
    "request_id": "req_xyz789",
    "provider": "openai",
    "latency_ms": 89,
    "cost": {
      "total": 0.00000004
    }
  }
}
```

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `object` | string | Always `list` |
| `data` | array | Array of embeddings |
| `model` | string | Model used |
| `usage` | object | Token usage |

Embedding Object

| Field | Type | Description |
|-------|------|-------------|
| `object` | string | Always `embedding` |
| `index` | integer | Position in input array |
| `embedding` | array | Vector of floats |
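The standard fields map directly onto the OpenAI SDK response objects; the `gateflow` block is easiest to inspect on the raw JSON body. A quick sketch that reads the metadata from the sample response above (values copied from the Response section):

```python
import json

# Sample response body, as shown in the Response section
body = json.loads("""
{
  "object": "list",
  "data": [{"object": "embedding", "index": 0, "embedding": [0.0023, -0.0093]}],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 2, "total_tokens": 2},
  "gateflow": {"request_id": "req_xyz789", "provider": "openai",
               "latency_ms": 89, "cost": {"total": 0.00000004}}
}
""")

vector = body["data"][0]["embedding"]
meta = body["gateflow"]
print(len(vector), meta["provider"], meta["latency_ms"])  # 2 openai 89
```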

Examples

Single Text

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'The quick brown fox jumps over the lazy dog.',
});

const embedding = response.data[0].embedding;
console.log(`Dimensions: ${embedding.length}`);
```

```bash
curl https://api.gateflow.ai/v1/embeddings \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
```

Batch Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "First document",
        "Second document",
        "Third document"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")
```

Reduced Dimensions

```python
# Use fewer dimensions for efficiency
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world",
    dimensions=256  # Instead of 3072
)
```
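The `dimensions` parameter returns vectors that are already scaled for similarity comparisons. If you instead truncate a full-length embedding yourself, rescale it to unit length before computing cosine similarity; otherwise scores are distorted. A small helper sketch (the function name is illustrative, not part of any SDK):

```python
import numpy as np

def truncate_and_normalize(embedding, dims):
    """Keep the first `dims` components and rescale to unit length."""
    v = np.asarray(embedding[:dims], dtype=np.float32)
    return v / np.linalg.norm(v)

v = truncate_and_normalize([0.6, 0.8, 0.0, 0.1], 2)
print(np.linalg.norm(v))  # ~1.0 after rescaling
```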
Similarity Search

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Create embeddings
query = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is machine learning?"
).data[0].embedding

documents = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "Machine learning is a subset of AI.",
        "The weather is nice today.",
        "Neural networks learn from data."
    ]
).data

# Find the most similar document
similarities = [
    cosine_similarity(query, doc.embedding)
    for doc in documents
]

most_similar_idx = np.argmax(similarities)
print(f"Most similar: document {most_similar_idx}")
```

GateFlow Extensions

Skip Caching

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
    extra_body={
        "gateflow": {
            "cache": "skip"
        }
    }
)
```

Fallback Models

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
    extra_body={
        "gateflow": {
            "fallbacks": ["text-embedding-004", "embed-english-v3.0"]
        }
    }
)
```

Use Cases

Semantic Search

A typical semantic search pipeline:

  1. Embed all documents at indexing time
  2. Store the embeddings in a vector database
  3. Embed each search query at query time
  4. Return the nearest neighbors as results
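The steps above can be sketched with a tiny in-memory index; the `VectorIndex` class is purely illustrative (a real deployment would use a vector database), with synthetic 2-D vectors standing in for embeddings:

```python
import numpy as np

class VectorIndex:
    """Tiny in-memory index over unit-normalized vectors (illustrative only)."""
    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, doc_id, embedding):
        v = np.asarray(embedding, dtype=np.float32)
        self.ids.append(doc_id)
        self.vectors.append(v / np.linalg.norm(v))  # store unit vectors

    def search(self, query_embedding, k=3):
        q = np.asarray(query_embedding, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q  # dot of unit vectors = cosine
        top = np.argsort(scores)[::-1][:k]
        return [(self.ids[i], float(scores[i])) for i in top]

index = VectorIndex()
index.add("doc-a", [1.0, 0.0])
index.add("doc-b", [0.0, 1.0])
print(index.search([0.9, 0.1], k=1))  # doc-a ranks first
```

Because the stored vectors are normalized at insert time, each query costs one matrix-vector product rather than a per-document norm computation.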

Semantic Caching

GateFlow uses embeddings internally for its semantic cache, so prompts with similar meaning can be served from cache:

```python
# These similar queries may hit the semantic cache
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is Python?"}]
)
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain Python"}]
)
```

Clustering

```python
from sklearn.cluster import KMeans

# Use the embeddings from a batch request (see Batch Embeddings above)
embeddings = [e.embedding for e in response.data]

# Group the vectors into three clusters
kmeans = KMeans(n_clusters=3)
clusters = kmeans.fit_predict(embeddings)
```

Pricing

| Model | Price per 1M tokens |
|-------|---------------------|
| `text-embedding-3-small` | $0.02 |
| `text-embedding-3-large` | $0.13 |
| `text-embedding-ada-002` | $0.10 |
| `text-embedding-004` | $0.025 |
| `embed-english-v3.0` | $0.10 |

Error Codes

| Code | Description |
|------|-------------|
| `invalid_input` | Input is empty or invalid |
| `token_limit_exceeded` | Input exceeds the model's token limit |
| `model_not_found` | Embedding model not available |

See Error Handling for details.
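To avoid `token_limit_exceeded`, split long texts before embedding them. A rough character-based chunker sketch; the ~4 characters-per-token ratio is only a heuristic for English text, so use a real tokenizer (e.g. `tiktoken`) when you need exact counts:

```python
def chunk_text(text: str, max_tokens: int = 8191, chars_per_token: int = 4):
    """Split text into pieces that should fit under the model's token limit."""
    limit = max_tokens * chars_per_token
    return [text[i:i + limit] for i in range(0, len(text), limit)]

chunks = chunk_text("x" * 100_000, max_tokens=8191)
print(len(chunks), max(len(c) for c in chunks))  # 4 32764
```

Each chunk can then be sent in a single batch request, with the `index` field used to reassemble results in order.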
