# Transcriptions

Convert audio to text using speech-to-text models.

## Endpoint

```
POST /v1/audio/transcriptions
```

## Authentication

```
Authorization: Bearer gw_prod_...
```

## Request

### Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer token with API key |
| Content-Type | Yes | multipart/form-data |
### Body Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| model | string | Yes | Model ID: whisper-1, voxtral-mini-latest |
| language | string | No | ISO-639-1 language code (e.g., en, es, fr) |
| prompt | string | No | Optional text to guide transcription style |
| response_format | string | No | json, text, srt, verbose_json, vtt |
| temperature | number | No | Sampling temperature (0-1) |
| timestamp_granularities | array | No | word, segment (verbose_json only) |
## Examples

### Basic Transcription

```bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"
```

### Python
```python
import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcript.text)
```

### With Language Hint
```python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    language="es"  # Spanish
)
```

### With Timestamps
```python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"]
)

for segment in transcript.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")
```

### Streaming Transcription
```python
import asyncio
import httpx

# `async with` is only valid inside a coroutine, so wrap the
# request in an async function and run it with asyncio.run().
async def main():
    async with httpx.AsyncClient() as client:
        with open("audio.mp3", "rb") as f:
            async with client.stream(
                "POST",
                "https://api.gateflow.ai/v1/audio/transcriptions",
                headers={"Authorization": "Bearer gw_prod_..."},
                files={"file": f},
                data={"model": "voxtral-mini-latest", "stream": "true"}
            ) as response:
                async for chunk in response.aiter_text():
                    print(chunk)

asyncio.run(main())
```

## Response
### JSON Format (Default)
```json
{
  "text": "Hello, how can I help you today?"
}
```

### Verbose JSON Format
```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.5,
  "text": "Hello, how can I help you today?",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, how can I help you today?",
      "tokens": [50364, 2425, 11, 577, 393, 286, 854, 291, 965, 30],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    }
  ],
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "how", "start": 0.6, "end": 0.8},
    {"word": "can", "start": 0.9, "end": 1.1},
    {"word": "I", "start": 1.2, "end": 1.3},
    {"word": "help", "start": 1.4, "end": 1.6},
    {"word": "you", "start": 1.7, "end": 1.9},
    {"word": "today", "start": 2.0, "end": 2.5}
  ]
}
```

### SRT Format
```
1
00:00:00,000 --> 00:00:02,500
Hello, how can I help you today?
```

### VTT Format

```
WEBVTT

00:00:00.000 --> 00:00:02.500
Hello, how can I help you today?
```
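If you request `response_format="verbose_json"` but later need subtitles, the segment timestamps can be converted to SRT client-side. A minimal sketch — the helper names are illustrative, not part of any SDK; the sample segment mirrors the verbose JSON response above:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Build an SRT document from verbose_json segments."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(cues) + "\n"

# Using the segment from the verbose JSON example:
print(segments_to_srt(
    [{"start": 0.0, "end": 2.5, "text": "Hello, how can I help you today?"}]
))
```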
## GateFlow Extensions

### Provider Selection
```python
transcript = client.audio.transcriptions.create(
    model="auto",  # Let GateFlow choose
    file=audio_file,
    extra_body={
        "gateflow": {
            "prefer": "quality"  # or "speed" or "cost"
        }
    }
)
```

### Fallbacks
```python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    extra_body={
        "gateflow": {
            "fallbacks": ["voxtral-mini-latest"]
        }
    }
)
```

## Supported Models
| Model | Provider | Languages | Max File Size | Best For |
|---|---|---|---|---|
| whisper-1 | OpenAI | 50+ | 25MB | General transcription, translation |
| voxtral-mini-latest | Mistral | 100+ | 25MB | Multilingual, diarization |
| voxtral-mini-2602 | Mistral | 100+ | 25MB | Multilingual, diarization |
| gemini-2.5-flash | Google | 100+ | 25MB | Fast multilingual |
| gemini-2.5-pro | Google | 100+ | 25MB | High accuracy |
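For routing in application code, the table can be collapsed into a small selection helper. This is purely illustrative — the rules and the `pick_model` name are an assumption for this sketch, not part of the GateFlow API:

```python
def pick_model(needs_translation: bool = False,
               needs_diarization: bool = False,
               prefer_speed: bool = False) -> str:
    """Choose a model ID based on the capability table above."""
    if needs_translation:
        return "whisper-1"            # only Whisper supports translation
    if needs_diarization:
        return "voxtral-mini-latest"  # Voxtral supports speaker diarization
    if prefer_speed:
        return "gemini-2.5-flash"     # fast multilingual
    return "gemini-2.5-pro"           # high-accuracy default

print(pick_model(needs_diarization=True))  # voxtral-mini-latest
```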
## Mistral Voxtral Features
Mistral Voxtral models support additional features:
| Parameter | Type | Description |
|---|---|---|
| diarize | boolean | Enable speaker identification |
| context_bias | string | Comma-separated domain-specific words (max 100) |
```python
# Transcription with speaker diarization
transcript = client.audio.transcriptions.create(
    model="voxtral-mini-latest",
    file=audio_file,
    extra_body={
        "diarize": True,
        "context_bias": "GateFlow,API,authentication"
    }
)
```

### Voxtral Limitations
Mistral Voxtral does not support translation. Use OpenAI Whisper for audio translation to English.
## Errors
| Code | Description |
|---|---|
| 400 | Invalid file format or parameters |
| 401 | Invalid API key |
| 413 | File too large (max 25MB) |
| 429 | Rate limit exceeded |
| 500 | Provider error |
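Of these, 429 and 500 are generally worth retrying with backoff, while 400, 401, and 413 indicate a request that must be fixed first. A minimal retry sketch — the `TranscriptionError` type, the retryable set, and the backoff values are illustrative assumptions, not part of any SDK:

```python
import time

class TranscriptionError(Exception):
    """Illustrative error type carrying the HTTP status code."""
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

RETRYABLE = {429, 500}  # rate limits and provider errors

def with_retries(call, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry `call` on retryable status codes with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TranscriptionError as err:
            if err.status_code not in RETRYABLE or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            time.sleep(base_delay * 2 ** attempt)

# Simulated call that fails once with 429, then succeeds.
attempts = []
def fake_call():
    attempts.append(1)
    if len(attempts) < 2:
        raise TranscriptionError(429)
    return "ok"

print(with_retries(fake_call, base_delay=0.1))  # ok
```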
## See Also
- Speech Synthesis - Text to speech
- Audio Providers - Provider configuration
- Streaming Speech - Real-time audio