Voice & Audio Overview

GateFlow provides a unified pipeline for voice and audio processing, combining speech-to-text (STT), LLM processing, and text-to-speech (TTS) into a single API.

Pipeline Architecture

Audio input → STT → LLM → TTS → audio output

Key Features

Unified API

One endpoint for complete voice interactions:

python
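import base64

import requests

# Read the recording (placeholder file name) and base64-encode it for the JSON body
with open("question.mp3", "rb") as f:
    base64_audio = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "https://api.gateflow.ai/v1/audio/pipelines",
    headers={"Authorization": "Bearer gw_prod_..."},
    json={
        "template": "voice-agent-fast",
        "audio": base64_audio,
        "context": "You are a helpful customer service agent."
    }
)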

# Returns transcription, LLM response, and synthesized audio
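
The JSON request above carries the audio as a base64 string. The following is a minimal sketch of a helper that assembles that body, assuming the three fields shown in the example (`template`, `audio`, `context`) are what the endpoint accepts and that `context` may be omitted. The name `build_pipeline_payload` is hypothetical and not part of any GateFlow SDK.

```python
import base64
from typing import Optional


def build_pipeline_payload(audio_bytes: bytes, template: str,
                           context: Optional[str] = None) -> dict:
    """Assemble the JSON body for POST /v1/audio/pipelines (hypothetical helper)."""
    payload = {
        "template": template,
        # JSON cannot carry raw bytes, so the audio is sent as base64 text
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    }
    # Assumption: `context` is optional and can be left out entirely
    if context is not None:
        payload["context"] = context
    return payload


payload = build_pipeline_payload(
    b"\x00\x01fake-audio",
    "voice-agent-fast",
    "You are a helpful customer service agent.",
)
print(sorted(payload))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, as in the example above.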

Provider Flexibility

Mix and match providers for each stage:

Stage   Providers
STT     OpenAI Whisper, Google, Deepgram
LLM     OpenAI, Anthropic, Google, Mistral
TTS     OpenAI, ElevenLabs, Google, PlayHT
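
This overview does not show how a provider is selected in a request, so the sketch below only illustrates a client-side check of a per-stage selection against the providers listed above. The lowercase identifiers (for example `openai` for OpenAI Whisper) and the function name `validate_stage_providers` are assumptions made for illustration, not GateFlow parameter values.

```python
# Providers per stage, copied from the list above (identifiers are assumed)
SUPPORTED_PROVIDERS = {
    "stt": {"openai", "google", "deepgram"},
    "llm": {"openai", "anthropic", "google", "mistral"},
    "tts": {"openai", "elevenlabs", "google", "playht"},
}


def validate_stage_providers(selection: dict) -> dict:
    """Check a {stage: provider} selection and return it lowercased."""
    normalized = {}
    for stage, provider in selection.items():
        stage_key, provider_key = stage.lower(), provider.lower()
        if stage_key not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unknown stage: {stage!r}")
        if provider_key not in SUPPORTED_PROVIDERS[stage_key]:
            raise ValueError(f"{provider!r} is not listed for stage {stage!r}")
        normalized[stage_key] = provider_key
    return normalized


choice = validate_stage_providers(
    {"STT": "Deepgram", "LLM": "Anthropic", "TTS": "ElevenLabs"}
)
print(choice)
```

Validating before sending keeps a mistyped provider from costing a round trip, although the server remains the authority on what is supported.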

Pipeline Templates

Pre-configured pipelines for common use cases:

  • ambient-scribe: Medical transcription with PII handling
  • voice-agent-fast: Low-latency voice assistant
  • voice-agent-premium: High-quality voice assistant
  • legal-dictation: Legal document transcription

Quick Start

1. Transcribe Audio

bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

2. Synthesize Speech

bash
curl -X POST https://api.gateflow.ai/v1/audio/speech \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, how can I help you today?",
    "voice": "alloy"
  }' \
  --output speech.mp3
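
The same speech request can be issued from Python. This is a sketch using only the standard library, assuming (as the `--output speech.mp3` flag suggests) that the response body is the raw audio. The function names `build_speech_request` and `synthesize_to_file` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.gateflow.ai/v1"


def build_speech_request(api_key: str, text: str, voice: str = "alloy",
                         model: str = "tts-1") -> urllib.request.Request:
    """Build (but do not send) the POST request for /audio/speech."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, path: str = "speech.mp3") -> None:
    """Send the request and write the returned bytes to `path`."""
    request = build_speech_request(api_key, text)
    # Assumption: the response body is the audio itself, not a JSON envelope
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


request = build_speech_request("gw_prod_...", "Hello, how can I help you today?")
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes the headers and body easy to inspect without a network call.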

3. Full Pipeline

bash
curl -X POST https://api.gateflow.ai/v1/audio/pipelines \
  -H "Authorization: Bearer gw_prod_..." \
  -F "audio=@question.mp3" \
  -F "template=voice-agent-fast"

Voice Mapping

GateFlow provides six standard voice personas. Each persona maps to the closest equivalent voice offered by each provider:

GateFlow Voice   Description
professional     Clear, authoritative
friendly         Warm, approachable
calm             Soothing, measured
energetic        Upbeat, enthusiastic
serious          Formal, deliberate
casual           Relaxed, conversational
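
The persona-to-provider lookup can be pictured as a small table keyed by persona and provider. The sketch below is illustrative only: this overview does not publish the actual per-provider voice IDs, so the provider names and voice IDs in `EXAMPLE_VOICE_MAP` are placeholders, and `resolve_voice` is a hypothetical helper.

```python
# The six persona names listed above
PERSONAS = {"professional", "friendly", "calm", "energetic", "serious", "casual"}

# Placeholder data: real provider voice IDs are not given in this overview
EXAMPLE_VOICE_MAP = {
    ("friendly", "provider-a"): "voice-a-1",
    ("friendly", "provider-b"): "voice-b-7",
    ("calm", "provider-a"): "voice-a-3",
}


def resolve_voice(persona: str, provider: str, voice_map: dict) -> str:
    """Return the provider-specific voice ID for a persona."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    try:
        return voice_map[(persona, provider)]
    except KeyError:
        raise LookupError(
            f"no voice mapped for {persona!r} on {provider!r}"
        ) from None


print(resolve_voice("friendly", "provider-b", EXAMPLE_VOICE_MAP))
```

Keeping the persona name stable while the provider-specific ID varies is what lets the TTS provider change without touching application code.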

Streaming

Real-time streaming for low-latency applications:

python
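import asyncio

# Assumes `client` is an initialized async GateFlow client, `audio_bytes` holds
# the recorded audio, and `play_audio` is your own playback function.
async def main():
    async for chunk in client.audio.pipelines.stream(
        template="voice-agent-fast",
        audio=audio_bytes
    ):
        if chunk.type == "transcription":
            print(f"User said: {chunk.text}")
        elif chunk.type == "audio":
            play_audio(chunk.data)

asyncio.run(main())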
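
To try the chunk-handling logic without a live connection, the stream can be simulated with an async generator. This is a sketch under stated assumptions: the `Chunk` class and `fake_stream` are hypothetical stand-ins that mirror only the `type`, `text`, and `data` attributes used above, not the real client's objects.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed pipeline chunk."""
    type: str
    text: Optional[str] = None
    data: Optional[bytes] = None


async def fake_stream() -> AsyncIterator[Chunk]:
    """Simulated stream: one transcription chunk, then two audio chunks."""
    yield Chunk(type="transcription", text="What are your opening hours?")
    for piece in (b"\x01\x02", b"\x03\x04"):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        yield Chunk(type="audio", data=piece)


async def consume(stream: AsyncIterator[Chunk]) -> Tuple[str, bytes]:
    """Collect the transcript and concatenate audio in arrival order."""
    transcript: List[str] = []
    audio = bytearray()
    async for chunk in stream:
        if chunk.type == "transcription":
            transcript.append(chunk.text or "")
        elif chunk.type == "audio":
            audio.extend(chunk.data or b"")
        # Any other chunk type is ignored in this sketch
    return " ".join(transcript), bytes(audio)


text, audio = asyncio.run(consume(fake_stream()))
print(text, len(audio))
```

In a real application the audio branch would hand each chunk to the player as it arrives instead of buffering, since buffering gives up the latency benefit of streaming.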
