# Voice Agent Fast
Low-latency voice pipeline optimized for real-time conversations.
## Overview
The voice-agent-fast template prioritizes speed over quality, achieving sub-second response times for interactive voice applications.
## Configuration

```yaml
template: voice-agent-fast

stt:
  model: voxtral-mini-latest
  streaming: true
  language: auto

llm:
  model: gpt-5-mini
  max_tokens: 150
  temperature: 0.7

tts:
  model: eleven_turbo_v2_5
  voice: friendly
  streaming: true
```

## Performance
| Stage | Latency | Notes |
|---|---|---|
| STT (first word) | 200-400ms | Streaming transcription |
| LLM (first token) | 100-200ms | Fast model |
| TTS (first chunk) | 100-150ms | Turbo model |
| Total TTFB | 400-750ms | Time to first audio byte |
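Because every stage streams into the next, time-to-first-audio-byte is approximately the sum of the per-stage first-byte latencies rather than the sum of full-stage durations. A quick sanity check of the table's bounds:

```python
# Per-stage first-byte latency ranges from the table above, in ms
stages = {
    "stt_first_word": (200, 400),
    "llm_first_token": (100, 200),
    "tts_first_chunk": (100, 150),
}

# With a fully pipelined flow, TTFB is roughly the sum of each
# stage's time-to-first-output, not each stage's total runtime.
ttfb_low = sum(low for low, _ in stages.values())
ttfb_high = sum(high for _, high in stages.values())

print(f"Estimated TTFB: {ttfb_low}-{ttfb_high}ms")  # 400-750ms, matching the table
```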
## Usage

### Basic Usage

```python
from gateflow_mcp import MCPClient
import base64

client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")

# Read audio
with open("question.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

# Process through pipeline
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-fast",
        "context": "You are a helpful customer service assistant."
    }
)

print(f"User said: {result['transcription']}")
print(f"Response: {result['response']}")

# Play response audio
response_audio = base64.b64decode(result["audio"])
```

### Streaming Response
```python
async def stream_voice_response(audio_file):
    # Uses the `client` created in Basic Usage above
    with open(audio_file, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode()

    async for event in client.stream_tool(
        name="voice/pipeline",
        arguments={
            "audio": audio_b64,
            "template": "voice-agent-fast",
            "stream": True
        }
    ):
        if event["type"] == "transcription.partial":
            print(f"Hearing: {event['text']}", end="\r")
        elif event["type"] == "transcription.final":
            print(f"User: {event['text']}")
        elif event["type"] == "llm.token":
            print(event["token"], end="", flush=True)
        elif event["type"] == "audio.chunk":
            audio = base64.b64decode(event["data"])
            play_audio_chunk(audio)  # Play immediately
        elif event["type"] == "done":
            print(f"\nTotal latency: {event['latency_ms']}ms")
```

### With Custom Context
```python
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-fast",
        "context": """You are a voice assistant for TechCorp.

Key information:
- Support hours: 9 AM - 6 PM EST
- Phone: 1-800-TECH-123
- Website: techcorp.com

Be concise and helpful.""",
        "conversation_history": [
            {"role": "user", "content": "What are your hours?"},
            {"role": "assistant", "content": "We're open 9 AM to 6 PM Eastern."}
        ]
    }
)
```

## Response Format
```json
{
  "transcription": "What time do you close today?",
  "response": "We close at 6 PM Eastern today. Is there anything else I can help you with?",
  "audio": "base64_encoded_mp3_audio...",
  "audio_format": "mp3",
  "latency": {
    "stt_ms": 320,
    "llm_ms": 180,
    "tts_ms": 150,
    "total_ms": 650
  },
  "usage": {
    "stt_seconds": 3.5,
    "llm_tokens": {"prompt": 45, "completion": 28},
    "tts_characters": 85
  },
  "cost": 0.0045
}
```

## Overrides
Customize the template per-request:

```python
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-fast",
        "overrides": {
            "llm": {
                "model": "gpt-5.2",  # Use better model
                "max_tokens": 200
            },
            "tts": {
                "voice": "professional"  # Different voice
            }
        }
    }
)
```

## Best For
- Customer service bots - Quick responses to common questions
- Voice assistants - Interactive conversations
- IVR systems - Automated phone systems
- Real-time translation - Low-latency requirements
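For multi-turn use cases such as customer service bots, the caller threads `conversation_history` between requests, as shown in the custom-context example above. A minimal sketch of that bookkeeping, with the pipeline call stubbed out (`run_pipeline` is a hypothetical stand-in for `client.call_tool`, not part of the SDK):

```python
def run_pipeline(transcription: str, history: list[dict]) -> dict:
    # Hypothetical stand-in for client.call_tool(name="voice/pipeline", ...);
    # it echoes a canned reply so the history handling is self-contained.
    return {"transcription": transcription,
            "response": f"(reply to: {transcription})"}

def take_turn(user_text: str, history: list[dict]) -> str:
    """Run one turn and append both sides to the shared history."""
    result = run_pipeline(user_text, history)
    history.append({"role": "user", "content": result["transcription"]})
    history.append({"role": "assistant", "content": result["response"]})
    return result["response"]

history: list[dict] = []
take_turn("What are your hours?", history)
take_turn("And on weekends?", history)
print(len(history))  # 4: two user turns, two assistant turns
```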
## Limitations
- Shorter responses (150 tokens default)
- Streaming STT may have lower accuracy than batch
- ElevenLabs Turbo has slightly lower quality than v2
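The 150-token default is the main reason for the short responses, and the Overrides section shows it can be raised per request. How the gateway combines a template with overrides is not documented here; a plausible model is a recursive dictionary merge (both the `deep_merge` helper and the merge semantics are an assumption, not confirmed behavior):

```python
def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` onto `base` without mutating either."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Subset of the template config from the Configuration section
template = {
    "llm": {"model": "gpt-5-mini", "max_tokens": 150, "temperature": 0.7},
    "tts": {"model": "eleven_turbo_v2_5", "voice": "friendly"},
}
overrides = {"llm": {"max_tokens": 200}}

merged = deep_merge(template, overrides)
print(merged["llm"])  # {'model': 'gpt-5-mini', 'max_tokens': 200, 'temperature': 0.7}
```

Note that only `max_tokens` changes: sibling keys like `model` and `temperature`, and untouched sections like `tts`, keep their template values.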
## Permissions Required

```yaml
permissions:
  tools:
    - voice/pipeline
  models:
    - voxtral-mini-latest
    - gpt-5-mini
    - eleven_turbo_v2_5
  pipelines:
    - voice-agent-fast
```

## Next Steps
- Voice Agent Premium - Higher quality option
- Custom Templates - Build your own
- Streaming Speech - Streaming guide