
Transcriptions

Convert audio to text using speech-to-text models.

Endpoint

POST /v1/audio/transcriptions

Authentication

Authorization: Bearer gw_prod_...

Request

Headers

| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer token with API key |
| Content-Type | Yes | multipart/form-data |

Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| model | string | Yes | Model ID: whisper-1, voxtral-mini-latest |
| language | string | No | ISO-639-1 language code (e.g., en, es, fr) |
| prompt | string | No | Optional text to guide transcription style |
| response_format | string | No | json, text, srt, verbose_json, vtt |
| temperature | number | No | Sampling temperature (0-1) |
| timestamp_granularities | array | No | word, segment (verbose_json only) |
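The constraints above can be checked client-side before uploading, which avoids a round trip for a guaranteed 400. A minimal illustrative helper (the `validate_request` function is not part of any SDK, just a sketch of the documented rules):

```python
# Illustrative client-side validation mirroring the parameter table above.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
ALLOWED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

def validate_request(filename, model, response_format="json", temperature=0.0):
    """Raise ValueError if a parameter violates the documented constraints."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext}")
    if not model:
        raise ValueError("model is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

validate_request("audio.mp3", "whisper-1", "verbose_json", 0.2)  # passes
```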

Examples

Basic Transcription

bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Python

python
import openai

client = openai.OpenAI(
    base_url="https://api.gateflow.ai/v1",
    api_key="gw_prod_..."
)

with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcript.text)

With Language Hint

python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    language="es"  # Spanish
)

With Timestamps

python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"]
)

for segment in transcript.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")

Streaming Transcription

python
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        with open("audio.mp3", "rb") as f:
            async with client.stream(
                "POST",
                "https://api.gateflow.ai/v1/audio/transcriptions",
                headers={"Authorization": "Bearer gw_prod_..."},
                files={"file": f},
                data={"model": "voxtral-mini-latest", "stream": "true"},
            ) as response:
                # Chunks arrive as partial transcript text while audio is processed
                async for chunk in response.aiter_text():
                    print(chunk, end="", flush=True)

asyncio.run(main())

Response

JSON Format (Default)

json
{
  "text": "Hello, how can I help you today?"
}

Verbose JSON Format

json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.5,
  "text": "Hello, how can I help you today?",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, how can I help you today?",
      "tokens": [50364, 2425, 11, 577, 393, 286, 854, 291, 965, 30],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    }
  ],
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "how", "start": 0.6, "end": 0.8},
    {"word": "can", "start": 0.9, "end": 1.1},
    {"word": "I", "start": 1.2, "end": 1.3},
    {"word": "help", "start": 1.4, "end": 1.6},
    {"word": "you", "start": 1.7, "end": 1.9},
    {"word": "today", "start": 2.0, "end": 2.5}
  ]
}

SRT Format

1
00:00:00,000 --> 00:00:02,500
Hello, how can I help you today?

VTT Format

WEBVTT

00:00:00.000 --> 00:00:02.500
Hello, how can I help you today?
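If you request verbose_json (for example, to keep word timestamps) but also need subtitles, the segment list can be rendered as SRT locally. A minimal sketch; the `srt_timestamp` and `segments_to_srt` helpers are illustrative, not part of the API:

```python
# Build an SRT document locally from verbose_json segments.
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of {start, end, text} dicts as an SRT string."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"

srt = segments_to_srt(
    [{"start": 0.0, "end": 2.5, "text": "Hello, how can I help you today?"}]
)
```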

GateFlow Extensions

Provider Selection

python
transcript = client.audio.transcriptions.create(
    model="auto",  # Let GateFlow choose
    file=audio_file,
    extra_body={
        "gateflow": {
            "prefer": "quality"  # or "speed" or "cost"
        }
    }
)

Fallbacks

python
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    extra_body={
        "gateflow": {
            "fallbacks": ["voxtral-mini-latest"]
        }
    }
)

Supported Models

| Model | Provider | Languages | Max File Size | Best For |
|---|---|---|---|---|
| whisper-1 | OpenAI | 50+ | 25MB | General transcription, translation |
| voxtral-mini-latest | Mistral | 100+ | 25MB | Multilingual, diarization |
| voxtral-mini-2602 | Mistral | 100+ | 25MB | Multilingual, diarization |
| gemini-2.5-flash | Google | 100+ | 25MB | Fast multilingual |
| gemini-2.5-pro | Google | 100+ | 25MB | High accuracy |
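The "Best For" column can be encoded as a simple client-side chooser. The priority order below is an assumption for this sketch, not GateFlow routing logic (for gateway-side selection, see the `model="auto"` extension above):

```python
# Illustrative model chooser based on the "Best For" column above.
def pick_model(need_translation=False, need_diarization=False, prefer_speed=False):
    if need_translation:
        return "whisper-1"           # Voxtral does not support translation
    if need_diarization:
        return "voxtral-mini-latest"
    if prefer_speed:
        return "gemini-2.5-flash"
    return "gemini-2.5-pro"          # default to the high-accuracy option
```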

Mistral Voxtral Features

Mistral Voxtral models support additional features:

| Parameter | Type | Description |
|---|---|---|
| diarize | boolean | Enable speaker identification |
| context_bias | string | Comma-separated domain-specific words (max 100) |

python
# Transcription with speaker diarization
transcript = client.audio.transcriptions.create(
    model="voxtral-mini-latest",
    file=audio_file,
    extra_body={
        "diarize": True,
        "context_bias": "GateFlow,API,authentication"
    }
)

Voxtral Limitations

Mistral Voxtral does not support translation. Use OpenAI Whisper for audio translation to English.

Errors

| Code | Description |
|---|---|
| 400 | Invalid file format or parameters |
| 401 | Invalid API key |
| 413 | File too large (max 25MB) |
| 429 | Rate limit exceeded |
| 500 | Provider error |
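429 and 500 are typically transient and worth retrying; the 4xx validation errors are not. A minimal retry wrapper as a sketch: `call` is any function that raises an exception carrying a `status_code` attribute, and the backoff schedule is an assumption, not a documented GateFlow policy:

```python
import time

# Illustrative retry wrapper for the transient status codes above (429, 500).
RETRYABLE = {429, 500}

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Invoke call(), retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```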
