Embeddings
Create vector embeddings for text. Embeddings are numerical representations of text that capture semantic meaning, enabling similarity search, clustering, and RAG applications.
POST /v1/embeddings
Overview
The Embeddings API converts text into high-dimensional vectors that can be used for:
- Semantic search - Find similar documents based on meaning
- RAG pipelines - Retrieve relevant context for LLM prompts
- Clustering - Group similar documents together
- Classification - Categorize text based on learned patterns
GateFlow routes embedding requests to the appropriate provider based on the model name and automatically handles provider-specific formatting.
Request
```bash
curl https://api.gateflow.ai/v1/embeddings \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Hello world"
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)
```
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Hello world',
});
```
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Embedding model ID |
| `input` | string or array | Yes | Text or list of texts to embed |
| `encoding_format` | string | No | `float` (default) or `base64` |
| `dimensions` | integer | No | Output dimensions (`text-embedding-3` models only) |
| `user` | string | No | End-user identifier |
| `provider` | string | No | Force a specific provider |
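Setting `encoding_format` to `base64` returns each vector as a base64 string instead of a JSON float array, which shrinks response payloads noticeably for large batches. Assuming the OpenAI-style convention of base64-encoded little-endian float32 bytes, decoding is a one-liner; the sketch below round-trips a toy vector rather than a real API response:

```python
import base64
import numpy as np

def decode_embedding(b64: str) -> np.ndarray:
    """Decode a base64-encoded embedding into a float32 vector."""
    return np.frombuffer(base64.b64decode(b64), dtype=np.float32)

# Round-trip demo: a small vector standing in for an API response
vec = np.array([0.1, -0.2, 0.3], dtype=np.float32)
encoded = base64.b64encode(vec.tobytes()).decode("ascii")
decoded = decode_embedding(encoded)
print(decoded.shape)  # → (3,)
```

With a real response, `b64` would come from `response.data[0].embedding` when `encoding_format="base64"` is set.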
Supported Models
| Model | Provider | Dimensions | Max Tokens | Use Case |
|---|---|---|---|---|
| `text-embedding-3-small` | OpenAI | 1536 | 8191 | General purpose, cost-effective |
| `text-embedding-3-large` | OpenAI | 3072 | 8191 | Highest accuracy |
| `text-embedding-ada-002` | OpenAI | 1536 | 8191 | Legacy, widely compatible |
| `text-embedding-004` | Google | 768 | 2048 | Multilingual support |
| `embed-english-v3.0` | Cohere | 1024 | 512 | English-optimized |
| `embed-multilingual-v3.0` | Cohere | 1024 | 512 | 100+ languages |
| `mistral-embed` | Mistral | 1024 | 8192 | Long context support |
Anthropic Not Supported
Anthropic does not provide embedding models. Use OpenAI, Google, Cohere, or Mistral for embeddings.
Response
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  },
  "gateflow": {
    "request_id": "req_xyz789",
    "provider": "openai",
    "latency_ms": 89,
    "cost": {
      "total": 0.00000004
    }
  }
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| `object` | string | Always `list` |
| `data` | array | Array of embeddings |
| `model` | string | Model used |
| `usage` | object | Token usage |
Embedding Object
| Field | Type | Description |
|---|---|---|
| `object` | string | Always `embedding` |
| `index` | integer | Position in input array |
| `embedding` | array | Vector of floats |
Examples
Single Text
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536
```
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gateflow.ai/v1',
  apiKey: 'gw_prod_...',
});

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'The quick brown fox jumps over the lazy dog.',
});

const embedding = response.data[0].embedding;
console.log(`Dimensions: ${embedding.length}`);
```
```bash
curl https://api.gateflow.ai/v1/embeddings \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
```
Batch Embeddings
```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "First document",
        "Second document",
        "Third document"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")
```
Reduced Dimensions
```python
# Use fewer dimensions for efficiency
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world",
    dimensions=256  # instead of the default 3072
)
```
Similarity Search
```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Create embeddings
query = client.embeddings.create(
    model="text-embedding-3-small",
    input="What is machine learning?"
).data[0].embedding

documents = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "Machine learning is a subset of AI.",
        "The weather is nice today.",
        "Neural networks learn from data."
    ]
).data

# Find the most similar document
similarities = [
    cosine_similarity(query, doc.embedding)
    for doc in documents
]
most_similar_idx = np.argmax(similarities)
print(f"Most similar: document {most_similar_idx}")
```
GateFlow Extensions
Skip Caching
```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
    extra_body={
        "gateflow": {
            "cache": "skip"
        }
    }
)
```
Fallback Models
```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
    extra_body={
        "gateflow": {
            "fallbacks": ["text-embedding-004", "embed-english-v3.0"]
        }
    }
)
```
Use Cases
Document Search
- Embed all documents at indexing time
- Store embeddings in a vector database
- Embed search query
- Find nearest neighbors
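The steps above can be sketched with a toy in-memory index. This is a brute-force illustration of the pattern, not a real vector database; the hand-written 3-dimensional vectors stand in for output from the Embeddings API:

```python
import numpy as np

class InMemoryIndex:
    """Toy vector store: brute-force cosine similarity over stored vectors."""

    def __init__(self):
        self.docs, self.vecs = [], []

    def add(self, doc, vec):
        v = np.asarray(vec, dtype=np.float32)
        self.docs.append(doc)
        self.vecs.append(v / np.linalg.norm(v))  # normalize once at index time

    def search(self, query_vec, k=1):
        q = np.asarray(query_vec, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vecs) @ q  # cosine similarity of unit vectors
        top = np.argsort(scores)[::-1][:k]
        return [(self.docs[i], float(scores[i])) for i in top]

# Toy vectors in place of client.embeddings.create(...) output
index = InMemoryIndex()
index.add("Machine learning is a subset of AI.", [0.9, 0.1, 0.0])
index.add("The weather is nice today.", [0.0, 0.2, 0.9])

results = index.search([0.8, 0.2, 0.1], k=1)
print(results[0][0])  # → "Machine learning is a subset of AI."
```

At scale, replace the brute-force scan with a vector database, but keep the same shape: embed once at indexing time, embed each query, rank by similarity.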
Semantic Caching
GateFlow uses embeddings internally for its semantic cache, so prompts with similar meaning can be served from cached responses:
```python
# These similar queries may hit the semantic cache
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is Python?"}]
)
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain Python"}]
)
```
Clustering
```python
from sklearn.cluster import KMeans

# Collect vectors from a batch embeddings response (see Batch Embeddings above)
embeddings = [e.embedding for e in response.data]

# Group the documents into three clusters
kmeans = KMeans(n_clusters=3, n_init="auto")
clusters = kmeans.fit_predict(embeddings)
```
Pricing
| Model | Price per 1M tokens |
|---|---|
| text-embedding-3-small | $0.02 |
| text-embedding-3-large | $0.13 |
| text-embedding-ada-002 | $0.10 |
| text-embedding-004 | $0.025 |
| embed-english-v3.0 | $0.10 |
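As a rough illustration of these rates, cost scales linearly with input tokens. The sketch below hard-codes two prices from the table above; the helper name is ours, not part of the API:

```python
# Price per 1M tokens, taken from the pricing table above
PRICE_PER_M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def embedding_cost(model, tokens):
    """Estimated cost in dollars for embedding `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICE_PER_M[model]

# Embedding a 100k-token corpus with the small model
print(f"${embedding_cost('text-embedding-3-small', 100_000):.4f}")  # → $0.0020
```

Actual billed cost is reported per request in the `gateflow.cost` field of each response.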
Error Codes
| Code | Description |
|---|---|
| `invalid_input` | Input is empty or invalid |
| `token_limit_exceeded` | Input exceeds the model's token limit |
| `model_not_found` | Embedding model not available |
See Error Handling for details.