
Transcriptions

Convert audio to text using speech-to-text models.

Endpoint

POST /v1/audio/transcriptions

Authentication

Authorization: Bearer gw_prod_...

Request

Headers

| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer token with API key |
| Content-Type | Yes | multipart/form-data |

Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| model | string | Yes | Model ID: whisper-1, voxtral-mini-latest |
| language | string | No | ISO-639-1 language code (e.g., en, es, fr) |
| prompt | string | No | Optional text to guide transcription style |
| response_format | string | No | json, text, srt, verbose_json, vtt |
| temperature | number | No | Sampling temperature (0-1) |
| timestamp_granularities | array | No | word, segment (verbose_json only) |
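The constraints above can be checked client-side before uploading, which avoids a round trip for a guaranteed 400. A minimal illustrative helper (the `validate_request` function is not part of any SDK, just a sketch of the documented rules):

```python
# Illustrative client-side validation mirroring the parameter table above.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
ALLOWED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

def validate_request(filename, model, response_format="json", temperature=0.0):
    """Raise ValueError if a parameter violates the documented constraints."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext}")
    if not model:
        raise ValueError("model is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

validate_request("audio.mp3", "whisper-1", "verbose_json", 0.2)  # passes
```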

Examples

Basic Transcription

bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Python

python
import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcript.text)

With Language Hint

python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    language="es"  # Spanish
)

With Timestamps

python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"]
)

for segment in transcript.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")

Streaming Transcription

python
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        with open("audio.mp3", "rb") as f:
            async with client.stream(
                "POST",
                "https://api.gateflow.ai/v1/audio/transcriptions",
                headers={"Authorization": "Bearer gw_prod_..."},
                files={"file": f},
                data={"model": "voxtral-mini-latest", "stream": "true"},
            ) as response:
                # Chunks arrive as partial transcript text while audio is processed
                async for chunk in response.aiter_text():
                    print(chunk, end="", flush=True)

asyncio.run(main())

Response

JSON Format (Default)

json
{
  "text": "Hello, how can I help you today?"
}

Verbose JSON Format

json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.5,
  "text": "Hello, how can I help you today?",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, how can I help you today?",
      "tokens": [50364, 2425, 11, 577, 393, 286, 854, 291, 965, 30],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    }
  ],
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "how", "start": 0.6, "end": 0.8},
    {"word": "can", "start": 0.9, "end": 1.1},
    {"word": "I", "start": 1.2, "end": 1.3},
    {"word": "help", "start": 1.4, "end": 1.6},
    {"word": "you", "start": 1.7, "end": 1.9},
    {"word": "today", "start": 2.0, "end": 2.5}
  ]
}

SRT Format

1
00:00:00,000 --> 00:00:02,500
Hello, how can I help you today?

VTT Format

WEBVTT

00:00:00.000 --> 00:00:02.500
Hello, how can I help you today?
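If you request verbose_json (for example, to keep word timestamps) but also need subtitles, the segment list can be rendered as SRT locally. A minimal sketch; the `srt_timestamp` and `segments_to_srt` helpers are illustrative, not part of the API:

```python
# Build an SRT document locally from verbose_json segments.
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of {start, end, text} dicts as an SRT string."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"

srt = segments_to_srt(
    [{"start": 0.0, "end": 2.5, "text": "Hello, how can I help you today?"}]
)
```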

GateFlow Extensions

Provider Selection

python
transcript = client.audio.transcriptions.create(
    model="auto",  # Let GateFlow choose
    file=audio_file,
    extra_body={
        "gateflow": {
            "prefer": "quality"  # or "speed" or "cost"
        }
    }
)

Fallbacks

python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    extra_body={
        "gateflow": {
            "fallbacks": ["voxtral-mini-latest"]
        }
    }
)

Supported Models

| Model | Provider | Languages | Max File Size | Best For |
|---|---|---|---|---|
| whisper-1 | OpenAI | 50+ | 25MB | General transcription, translation |
| voxtral-mini-latest | Mistral | 100+ | 25MB | Multilingual, diarization |
| voxtral-mini-2602 | Mistral | 100+ | 25MB | Multilingual, diarization |
| gemini-2.5-flash | Google | 100+ | 25MB | Fast multilingual |
| gemini-2.5-pro | Google | 100+ | 25MB | High accuracy |
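The "Best For" column can be encoded as a simple client-side chooser. The priority order below is an assumption for this sketch, not GateFlow routing logic (for gateway-side selection, see the `model="auto"` extension above):

```python
# Illustrative model chooser based on the "Best For" column above.
def pick_model(need_translation=False, need_diarization=False, prefer_speed=False):
    if need_translation:
        return "whisper-1"           # Voxtral does not support translation
    if need_diarization:
        return "voxtral-mini-latest"
    if prefer_speed:
        return "gemini-2.5-flash"
    return "gemini-2.5-pro"          # default to the high-accuracy option
```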

Mistral Voxtral Features

Mistral Voxtral models support additional features:

| Parameter | Type | Description |
|---|---|---|
| diarize | boolean | Enable speaker identification |
| context_bias | string | Comma-separated domain-specific words (max 100) |

python
# Transcription with speaker diarization
transcript = client.audio.transcriptions.create(
    model="voxtral-mini-latest",
    file=audio_file,
    extra_body={
        "diarize": True,
        "context_bias": "GateFlow,API,authentication"
    }
)

Voxtral Limitations

Mistral Voxtral does not support translation. Use OpenAI Whisper for audio translation to English.

Errors

| Code | Description |
|---|---|
| 400 | Invalid file format or parameters |
| 401 | Invalid API key |
| 413 | File too large (max 25MB) |
| 429 | Rate limit exceeded |
| 500 | Provider error |
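429 and 500 are typically transient and worth retrying; the 4xx validation errors are not. A minimal retry wrapper as a sketch: `call` is any function that raises an exception carrying a `status_code` attribute, and the backoff schedule is an assumption, not a documented GateFlow policy:

```python
import time

# Illustrative retry wrapper for the transient status codes above (429, 500).
RETRYABLE = {429, 500}

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Invoke call(), retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```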
