Voice & Audio Overview
GateFlow provides a unified pipeline for voice and audio processing, combining speech-to-text (STT), LLM processing, and text-to-speech (TTS) into a single API.
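Pipeline Architecture
Each request flows through three stages in order: audio input → STT → LLM → TTS → synthesized audio output.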
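Conceptually, the pipeline is a composition of the three stages. The sketch below is illustrative only: `run_pipeline`, `PipelineResult`, and the stub stage functions are hypothetical names rather than part of GateFlow's API, and the real service runs these stages server-side.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcription: str  # text produced by the STT stage
    reply_text: str     # text produced by the LLM stage
    reply_audio: bytes  # audio produced by the TTS stage


def run_pipeline(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> PipelineResult:
    """Chain STT -> LLM -> TTS, keeping each intermediate output."""
    transcription = stt(audio)
    reply_text = llm(transcription)
    reply_audio = tts(reply_text)
    return PipelineResult(transcription, reply_text, reply_audio)


# Stand-in stages so the sketch runs without any network access.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = run_pipeline(b"hello", fake_stt, fake_llm, fake_tts)
print(result.reply_text)
```

Keeping every intermediate output in the result mirrors the pipeline endpoint, which returns the transcription, the LLM response, and the synthesized audio together.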
Key Features
Unified API
One endpoint for complete voice interactions:
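```python
import base64
import requests

# Base64-encode the recording for the JSON request body
with open("question.mp3", "rb") as f:
    base64_audio = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "https://api.gateflow.ai/v1/audio/pipelines",
    headers={"Authorization": "Bearer gw_prod_..."},
    json={
        "template": "voice-agent-fast",
        "audio": base64_audio,
        "context": "You are a helpful customer service agent.",
    },
)
# Returns transcription, LLM response, and synthesized audio
```
Provider Flexibility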
Mix and match providers for each stage:
| Stage | Providers |
|---|---|
| STT | OpenAI Whisper, Google, Deepgram |
| LLM | OpenAI, Anthropic, Google, Mistral |
| TTS | OpenAI, ElevenLabs, Google, PlayHT |
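Before sending a mixed configuration, it can help to check each choice against the table above. This is a minimal client-side sketch: the `validate_stage_providers` helper and the `{"stt": ..., "llm": ..., "tts": ...}` selection shape are assumptions made for illustration, not a documented GateFlow request format.

```python
# Providers per stage, copied from the table above.
SUPPORTED_PROVIDERS = {
    "stt": {"OpenAI Whisper", "Google", "Deepgram"},
    "llm": {"OpenAI", "Anthropic", "Google", "Mistral"},
    "tts": {"OpenAI", "ElevenLabs", "Google", "PlayHT"},
}


def validate_stage_providers(selection: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the selection is valid."""
    problems = []
    for stage, allowed in SUPPORTED_PROVIDERS.items():
        provider = selection.get(stage)
        if provider is None:
            problems.append(f"missing provider for stage '{stage}'")
        elif provider not in allowed:
            problems.append(f"'{provider}' is not listed for stage '{stage}'")
    # Flag keys that are not pipeline stages at all (hypothetical check).
    for stage in selection:
        if stage not in SUPPORTED_PROVIDERS:
            problems.append(f"unknown stage '{stage}'")
    return problems


mixed = {"stt": "Deepgram", "llm": "Anthropic", "tts": "ElevenLabs"}
print(validate_stage_providers(mixed))
print(validate_stage_providers({"stt": "ElevenLabs", "llm": "OpenAI"}))
```

The first call reports no problems; the second reports that ElevenLabs is not listed for STT and that no TTS provider was chosen.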
Pipeline Templates
Pre-configured pipelines for common use cases:
- ambient-scribe: Medical transcription with PII handling
- voice-agent-fast: Low-latency voice assistant
- voice-agent-premium: High-quality voice assistant
- legal-dictation: Legal document transcription
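A template is selected by name in the request body. The sketch below builds the JSON body used in the Unified API example and rejects template names that are not in the list above. `build_pipeline_request` is a hypothetical helper, and treating `context` as optional is an assumption.

```python
import base64
from typing import Optional

# Template names listed above.
TEMPLATES = {
    "ambient-scribe",
    "voice-agent-fast",
    "voice-agent-premium",
    "legal-dictation",
}


def build_pipeline_request(
    template: str, audio: bytes, context: Optional[str] = None
) -> dict:
    """Build a JSON body for POST /v1/audio/pipelines."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    body = {
        "template": template,
        # The JSON form of the request carries audio as base64 text.
        "audio": base64.b64encode(audio).decode("ascii"),
    }
    if context is not None:  # assumption: context may be omitted
        body["context"] = context
    return body


body = build_pipeline_request("legal-dictation", b"\x00\x01")
print(sorted(body))
```

Validating the name locally catches typos such as `voice-agent-quick` before a request is sent.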
Quick Start
1. Transcribe Audio
```bash
curl -X POST https://api.gateflow.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"
```
2. Synthesize Speech
```bash
curl -X POST https://api.gateflow.ai/v1/audio/speech \
  -H "Authorization: Bearer gw_prod_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, how can I help you today?",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
3. Full Pipeline
```bash
curl -X POST https://api.gateflow.ai/v1/audio/pipelines \
  -H "Authorization: Bearer gw_prod_..." \
  -F "audio=@question.mp3" \
  -F "template=voice-agent-fast"
```
Voice Mapping
GateFlow provides six standard voice personas, each mapped to the closest equivalent voice at each provider:
| GateFlow Voice | Description |
|---|---|
| professional | Clear, authoritative |
| friendly | Warm, approachable |
| calm | Soothing, measured |
| energetic | Upbeat, enthusiastic |
| serious | Formal, deliberate |
| casual | Relaxed, conversational |
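As a rough illustration of how a persona might be used, the sketch below validates a persona name against the table above and builds a speech request body. It assumes that the `voice` field of `/v1/audio/speech` accepts a persona name in place of a provider-specific voice. That assumption, and the `build_speech_request` helper itself, are not confirmed by this page.

```python
# Persona names and descriptions, copied from the table above.
VOICE_PERSONAS = {
    "professional": "Clear, authoritative",
    "friendly": "Warm, approachable",
    "calm": "Soothing, measured",
    "energetic": "Upbeat, enthusiastic",
    "serious": "Formal, deliberate",
    "casual": "Relaxed, conversational",
}


def build_speech_request(text: str, persona: str, model: str = "tts-1") -> dict:
    """Build a JSON body for POST /v1/audio/speech using a persona name."""
    key = persona.strip().lower()
    if key not in VOICE_PERSONAS:
        known = ", ".join(sorted(VOICE_PERSONAS))
        raise ValueError(f"unknown persona {persona!r}; expected one of: {known}")
    # Assumption: the persona name is passed in the `voice` field.
    return {"model": model, "input": text, "voice": key}


request_body = build_speech_request("Your order has shipped.", "Friendly")
print(request_body["voice"])
```

Normalizing case and whitespace before the lookup keeps inputs such as `"Friendly"` from being rejected.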
Streaming
Real-time streaming for low-latency applications:
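```python
# Assumes `client` is an initialized GateFlow client and `play_audio` is
# your application's playback function; run this inside an async function.
async for chunk in client.audio.pipelines.stream(
    template="voice-agent-fast",
    audio=audio_bytes,
):
    if chunk.type == "transcription":
        print(f"User said: {chunk.text}")
    elif chunk.type == "audio":
        play_audio(chunk.data)
```
Next Steps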
- Audio Providers - Configure STT/TTS providers
- Voice Mapping - Customize voice personas
- Pipeline Templates - Pre-built configurations
- Streaming Speech - Real-time audio