Voice & Audio Overview

GateFlow provides a unified pipeline for voice and audio processing, combining speech-to-text (STT), LLM processing, and text-to-speech (TTS) into a single API.

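Pipeline Architecture

Audio in → STT → LLM → TTS → audio out

The STT stage's transcript becomes the LLM's input, and the TTS stage speaks the LLM's reply.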
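The three-stage flow (STT, then LLM, then TTS) can be modeled as plain function composition. The sketch below runs that flow locally using hypothetical stand-in stage functions (`fake_stt`, `fake_llm`, `fake_tts`). It illustrates only the data flow. It is not GateFlow's implementation or SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcription: str
    llm_response: str
    audio: bytes


def run_pipeline(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> PipelineResult:
    """Chain the stages: audio -> transcript -> reply text -> reply audio."""
    transcription = stt(audio)   # speech-to-text
    reply = llm(transcription)   # LLM turns the transcript into a reply
    speech = tts(reply)          # text-to-speech renders the reply
    return PipelineResult(transcription, reply, speech)


# Hypothetical stand-ins so the sketch runs without any provider
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You asked: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = run_pipeline(b"What are your hours?", fake_stt, fake_llm, fake_tts)
print(result.llm_response)  # You asked: What are your hours?
```

Each stage is just a callable, so swapping a provider amounts to swapping one function. That is the idea behind choosing a provider per stage.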

Key Features

Unified API

One endpoint for complete voice interactions:

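```python
import base64

import requests

# Read a local recording and base64-encode it for the JSON body
with open("question.mp3", "rb") as f:
    base64_audio = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "https://api.gateflow.ai/v1/audio/pipelines",
    headers={"Authorization": "Bearer gw_prod_..."},
    json={
        "template": "voice-agent-fast",
        "audio": base64_audio,
        "context": "You are a helpful customer service agent.",
    },
)
response.raise_for_status()  # fail loudly on HTTP errors

# Returns transcription, LLM response, and synthesized audio
```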

Provider Flexibility

Mix and match providers for each stage:

| Stage | Providers |
| --- | --- |
| STT | OpenAI Whisper, Google, Deepgram |
| LLM | OpenAI, Anthropic, Google, Mistral |
| TTS | OpenAI, ElevenLabs, Google, PlayHT |
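A per-stage selection can be treated as a small mapping from stage to provider. The sketch below checks such a mapping against the providers listed in the table. The dictionary shape and the `validate_providers` helper are hypothetical illustrations, not GateFlow's configuration format. For the real configuration, see Audio Providers.

```python
# Providers per stage, as listed in the table above
SUPPORTED = {
    "stt": {"OpenAI Whisper", "Google", "Deepgram"},
    "llm": {"OpenAI", "Anthropic", "Google", "Mistral"},
    "tts": {"OpenAI", "ElevenLabs", "Google", "PlayHT"},
}


def validate_providers(selection: dict[str, str]) -> dict[str, str]:
    """Check a hypothetical stage -> provider mapping and return it if valid."""
    missing = SUPPORTED.keys() - selection.keys()
    if missing:
        raise ValueError(f"missing stages: {sorted(missing)}")
    for stage, provider in selection.items():
        if stage not in SUPPORTED:
            raise ValueError(f"unknown stage: {stage}")
        if provider not in SUPPORTED[stage]:
            raise ValueError(f"{provider} is not available for {stage}")
    return selection


# Mix providers across stages
config = validate_providers({"stt": "Deepgram", "llm": "Anthropic", "tts": "ElevenLabs"})
```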

Pipeline Templates

Pre-configured pipelines for common use cases:

- `ambient-scribe`: Medical transcription with PII handling
- `voice-agent-fast`: Low-latency voice assistant
- `voice-agent-premium`: High-quality voice assistant
- `legal-dictation`: Legal document transcription
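The JSON form of the pipeline request in Unified API takes a template name, base64-encoded audio, and a `context` string. The curl example in Quick Start omits `context`, which suggests it is optional. Below is a minimal sketch of building that body. It assumes only the four templates listed above, and `build_pipeline_body` is a hypothetical helper.

```python
import base64
from typing import Optional

# Templates documented on this page
TEMPLATES = {
    "ambient-scribe",
    "voice-agent-fast",
    "voice-agent-premium",
    "legal-dictation",
}


def build_pipeline_body(template: str, audio: bytes, context: Optional[str] = None) -> dict:
    """Hypothetical helper: JSON body for POST /v1/audio/pipelines."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    body = {
        "template": template,
        "audio": base64.b64encode(audio).decode("ascii"),
    }
    if context is not None:
        body["context"] = context  # optional instructions for the LLM stage
    return body


body = build_pipeline_body("legal-dictation", b"\x00\x01")
```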

Quick Start

1. Transcribe Audio

```bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"
```
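The same multipart request can be built in Python using only the standard library. This is a sketch. `build_transcription_request` is a hypothetical helper that only builds the request. To send it, pass the result to `urllib.request.urlopen`.

```python
import urllib.request
import uuid


def build_transcription_request(
    audio: bytes, filename: str, api_key: str, model: str = "whisper-1"
) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl example."""
    boundary = uuid.uuid4().hex
    parts = [
        # Text field: the model name
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field header, followed by the raw audio bytes
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        # Closing boundary
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        "https://api.gateflow.ai/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_transcription_request(b"fake-mp3-bytes", "audio.mp3", "gw_prod_...")
# To send: urllib.request.urlopen(req).read()
```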

2. Synthesize Speech

```bash
curl -X POST https://api.gateflow.ai/v1/audio/speech \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, how can I help you today?",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
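The speech request is plain JSON, so the standard library can send it directly. The sketch below uses the hypothetical helpers `build_speech_request` and `save_speech`. It assumes the response body is the raw audio, as the curl `--output speech.mp3` example implies.

```python
import json
import urllib.request


def build_speech_request(text: str, api_key: str, voice: str = "alloy",
                         model: str = "tts-1") -> urllib.request.Request:
    """Build a POST equivalent to the curl speech example."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        "https://api.gateflow.ai/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(req: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())


req = build_speech_request("Hello, how can I help you today?", "gw_prod_...")
# save_speech(req, "speech.mp3")  # performs the network call
```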

3. Full Pipeline

```bash
curl -X POST https://api.gateflow.ai/v1/audio/pipelines \
  -H "Authorization: Bearer gw_prod_..." \
  -F "audio=@question.mp3" \
  -F "template=voice-agent-fast"
```

Voice Mapping

GateFlow provides 6 standard voice personas that map to each provider's best equivalent:

| GateFlow Voice | Description |
| --- | --- |
| `professional` | Clear, authoritative |
| `friendly` | Warm, approachable |
| `calm` | Soothing, measured |
| `energetic` | Upbeat, enthusiastic |
| `serious` | Formal, deliberate |
| `casual` | Relaxed, conversational |
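The sketch below builds a speech request body that uses a persona. It assumes the speech endpoint accepts a GateFlow persona name in its `voice` field, in place of a provider-specific voice such as `alloy`. This page does not confirm that, so see Voice Mapping for how personas are actually resolved. The helper name `persona_speech_body` is hypothetical.

```python
import json

# The six GateFlow personas from the table above
PERSONAS = ("professional", "friendly", "calm", "energetic", "serious", "casual")


def persona_speech_body(text: str, persona: str, model: str = "tts-1") -> str:
    """Hypothetical helper: JSON body for /v1/audio/speech using a persona.

    Assumes `voice` accepts a persona name, which GateFlow would then map to
    the selected provider's closest voice.
    """
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona {persona!r}; expected one of {PERSONAS}")
    return json.dumps({"model": model, "input": text, "voice": persona})


body = persona_speech_body("Your appointment is confirmed.", "calm")
```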

Streaming

Real-time streaming for low-latency applications:

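```python
# Assumes `client` is an async GateFlow SDK client (setup not shown) and
# `play_audio` is your application's playback function.
async def handle_voice_turn(client, audio_bytes, play_audio):
    async for chunk in client.audio.pipelines.stream(
        template="voice-agent-fast",
        audio=audio_bytes,
    ):
        if chunk.type == "transcription":
            print(f"User said: {chunk.text}")
        elif chunk.type == "audio":
            play_audio(chunk.data)  # play each chunk as soon as it arrives
```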
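To see how the chunk-handling loop behaves without a live connection, the sketch below drives the same pattern with a hypothetical in-memory stream. The `Chunk` dataclass and `fake_stream` generator are stand-ins. They mimic only the `type`, `text`, and `data` attributes used in the streaming example and are not part of any GateFlow SDK.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class Chunk:
    type: str                   # "transcription" or "audio"
    text: Optional[str] = None  # set on transcription chunks
    data: bytes = b""           # set on audio chunks


async def fake_stream() -> AsyncIterator[Chunk]:
    """Hypothetical stand-in for client.audio.pipelines.stream(...)."""
    yield Chunk("transcription", text="What time do you open?")
    for piece in (b"\x01\x02", b"\x03"):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield Chunk("audio", data=piece)


async def consume(stream: AsyncIterator[Chunk]) -> tuple[list[str], bytes]:
    """Collect transcript text and audio bytes as chunks arrive."""
    transcript: list[str] = []
    audio = bytearray()
    async for chunk in stream:
        if chunk.type == "transcription" and chunk.text:
            transcript.append(chunk.text)
        elif chunk.type == "audio":
            audio.extend(chunk.data)  # a real app would play this immediately
    return transcript, bytes(audio)


transcript, audio = asyncio.run(consume(fake_stream()))
```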

Next Steps

- Audio Providers - Configure STT/TTS providers
- Voice Mapping - Customize voice personas
- Pipeline Templates - Pre-built configurations
- Streaming Speech - Real-time audio
