Audio Providers

Configure speech-to-text (STT) and text-to-speech (TTS) providers for your voice pipelines.

Supported Providers

Speech-to-Text (STT)

| Provider | Model | Languages | Best For |
| --- | --- | --- | --- |
| OpenAI | whisper-1 | 50+ | General transcription |
| Mistral | voxtral-mini-latest | 100+ | Multilingual, low latency |
| Google | gemini-2.5-flash | 100+ | Audio via chat API |

Text-to-Speech (TTS)

| Provider | Model | Voices | Best For |
| --- | --- | --- | --- |
| OpenAI | tts-1, tts-1-hd | 6 | General use |
| ElevenLabs | eleven_multilingual_v2 | 100+ | Premium quality |
| ElevenLabs | eleven_turbo_v2_5 | 100+ | Low latency |
| ElevenLabs | eleven_flash_v2_5 | 100+ | Cost-effective |

Configuring Providers

OpenAI (Whisper)

bash
curl -X POST https://api.gateflow.ai/v1/management/providers \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "credentials": {
      "api_key": "sk-..."
    }
  }'
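
The same registration call can be made from Python. The sketch below builds the request with the standard library only and does not send it. The helper name `build_provider_request` and the constant `MANAGEMENT_URL` are illustrative, not part of any GateFlow SDK; the URL, headers, and body shape are copied from the curl example above. For ElevenLabs or Mistral, only the `provider` value and the API key change.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
MANAGEMENT_URL = "https://api.gateflow.ai/v1/management/providers"


def build_provider_request(
    provider: str, provider_api_key: str, admin_key: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers a provider.

    Hypothetical helper: it mirrors the curl call, nothing more.
    """
    body = json.dumps(
        {
            "provider": provider,
            "credentials": {"api_key": provider_api_key},
        }
    ).encode("utf-8")
    return urllib.request.Request(
        MANAGEMENT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_provider_request("openai", "sk-...", "gw_prod_admin_key")
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect or log (with the key redacted) before any credentials leave your machine.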

STT Usage:

bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

TTS Usage:

bash
curl -X POST https://api.gateflow.ai/v1/audio/speech \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "Hello, how can I help you?",
    "voice": "alloy"
  }' --output speech.mp3

ElevenLabs

bash
curl -X POST https://api.gateflow.ai/v1/management/providers \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "elevenlabs",
    "credentials": {
      "api_key": "..."
    }
  }'

TTS Usage:

bash
curl -X POST https://api.gateflow.ai/v1/audio/speech \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "eleven_multilingual_v2",
    "input": "Hello, how can I help you?",
    "voice": "rachel"
  }' --output speech.mp3

Mistral (Voxtral)

bash
curl -X POST https://api.gateflow.ai/v1/management/providers \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "mistral",
    "credentials": {
      "api_key": "..."
    }
  }'

STT Usage:

bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=voxtral-mini-latest"

Provider Selection

Automatic Selection

Set the model to "auto" and pass a preference; GateFlow then chooses the provider and model for the request:

python
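from openai import OpenAI  # assumption: the OpenAI Python SDK is used as the client

# Assumption: the SDK is pointed at the GateFlow base URL with a GateFlow key.
client = OpenAI(base_url="https://api.gateflow.ai/v1", api_key="gw_prod_...")

with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        file=audio_file,
        model="auto",  # let GateFlow pick the provider and model
        extra_body={
            "gateflow": {
                "prefer": "quality"  # or "speed" or "cost"
            }
        }
    )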
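
Because the preference is a free-form string inside `extra_body`, a typo such as "fast" would only surface at request time. A small wrapper can catch that earlier. This is a minimal sketch: the helper name `auto_selection_kwargs` is hypothetical, and the three accepted values are the ones named in the example above.

```python
# Preference values named in the automatic-selection example.
VALID_PREFERENCES = ("quality", "speed", "cost")


def auto_selection_kwargs(prefer: str = "quality") -> dict:
    """Return keyword arguments for a transcription call using automatic selection.

    Hypothetical helper: validates the preference locally, then produces the
    same `model` and `extra_body` values shown in the example.
    """
    if prefer not in VALID_PREFERENCES:
        raise ValueError(
            f"prefer must be one of {VALID_PREFERENCES}, got {prefer!r}"
        )
    return {
        "model": "auto",
        "extra_body": {"gateflow": {"prefer": prefer}},
    }


if __name__ == "__main__":
    print(auto_selection_kwargs("speed"))
```

The result can be unpacked into the call, for example `client.audio.transcriptions.create(file=audio_file, **auto_selection_kwargs("speed"))`.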

Explicit Selection

To bypass automatic selection, name the model explicitly. GateFlow routes the request to the provider that serves that model:

python
response = client.audio.transcriptions.create(
    file=audio_file,
    model="whisper-1"  # Always uses OpenAI
)

Provider Comparison

STT Quality

| Provider | Accuracy | Latency | Cost |
| --- | --- | --- | --- |
| Whisper-1 | Excellent | Medium | $0.006/min |
| Voxtral Mini | Excellent | Low | $0.02/min |
| Gemini 2.5 | Good | Low | Via chat tokens |

TTS Quality

| Provider | Naturalness | Latency | Cost |
| --- | --- | --- | --- |
| ElevenLabs v2 | Excellent | Medium | $24/1M chars |
| ElevenLabs Turbo | Good | Low | $12/1M chars |
| OpenAI TTS-1-HD | Good | Medium | $30/1M chars |
| OpenAI TTS-1 | Good | Low | $15/1M chars |
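
To turn the listed rates into a budget, multiply by the expected volume. For example, a 60-minute recording costs 60 × $0.006 = $0.36 on Whisper-1 and 60 × $0.02 = $1.20 on Voxtral Mini. The sketch below does the same arithmetic for both tables. The function names are illustrative, and mapping the table labels to model IDs (for instance "ElevenLabs v2" to `eleven_multilingual_v2`) is an assumption. Gemini is left out because it is billed via chat tokens.

```python
from decimal import Decimal

# Rates in USD, copied from the comparison tables.
# Assumption: table labels map to these model IDs.
STT_USD_PER_MINUTE = {
    "whisper-1": Decimal("0.006"),
    "voxtral-mini-latest": Decimal("0.02"),
}
TTS_USD_PER_MILLION_CHARS = {
    "eleven_multilingual_v2": Decimal("24"),
    "eleven_turbo_v2_5": Decimal("12"),
    "tts-1-hd": Decimal("30"),
    "tts-1": Decimal("15"),
}


def stt_cost(model: str, minutes: float) -> Decimal:
    """Estimated transcription cost in USD for `minutes` of audio."""
    return STT_USD_PER_MINUTE[model] * Decimal(str(minutes))


def tts_cost(model: str, characters: int) -> Decimal:
    """Estimated synthesis cost in USD for `characters` of input text."""
    return TTS_USD_PER_MILLION_CHARS[model] * characters / 1_000_000


if __name__ == "__main__":
    for model in STT_USD_PER_MINUTE:
        print(model, "60 min:", stt_cost(model, 60))
    for model in TTS_USD_PER_MILLION_CHARS:
        print(model, "100k chars:", tts_cost(model, 100_000))
```

`Decimal` is used instead of `float` so that small per-minute rates do not pick up binary rounding noise when summed over many requests.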

Fallback Configuration

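Configure a primary model and a list of fallback models for each of STT and TTS, so requests can still be served when the primary is unavailable: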

bash
curl -X POST https://api.gateflow.ai/v1/management/audio-fallbacks \
  -H "Authorization: Bearer gw_prod_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "stt": {
      "primary": "whisper-1",
      "fallbacks": ["voxtral-mini-latest"]
    },
    "tts": {
      "primary": "eleven_multilingual_v2",
      "fallbacks": ["tts-1-hd", "tts-1"]
    }
  }'
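
The general pattern behind a primary-plus-fallbacks setup can be sketched in a few lines. This illustrates the idea only and is not GateFlow's implementation: the source does not state which errors trigger a fallback or whether the list is tried strictly in order, so both are assumptions here. `call_with_fallbacks` and `fake_tts` are hypothetical names.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallbacks(
    primary: str, fallbacks: Sequence[str], call: Callable[[str], T]
) -> T:
    """Try `call(model)` with the primary model, then each fallback in order.

    Returns the first successful result. Raises RuntimeError if every
    model fails, chaining the last underlying error.
    """
    last_error: Optional[Exception] = None
    for model in [primary, *fallbacks]:
        try:
            return call(model)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def fake_tts(model: str) -> str:
    """Stand-in for a TTS request; simulates an outage of the primary model."""
    if model == "eleven_multilingual_v2":
        raise ConnectionError("simulated outage")
    return f"audio from {model}"


if __name__ == "__main__":
    result = call_with_fallbacks(
        "eleven_multilingual_v2", ["tts-1-hd", "tts-1"], fake_tts
    )
    print(result)
```

With the simulated outage, the call skips the primary and returns the result from `tts-1-hd`, the first entry in the fallback list.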
