
Voice Agent Fast

Low-latency voice pipeline optimized for real-time conversations.

Overview

The voice-agent-fast template prioritizes speed over quality, achieving sub-second response times for interactive voice applications.

Configuration

yaml
template: voice-agent-fast

stt:
  model: voxtral-mini-latest
  streaming: true
  language: auto

llm:
  model: gpt-5-mini
  max_tokens: 150
  temperature: 0.7

tts:
  model: eleven_turbo_v2_5
  voice: friendly
  streaming: true
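Once parsed, this YAML is just a nested mapping. Below is a minimal client-side sanity check; the dict literal mirrors the config above, but the check itself is illustrative and not part of the SDK:

```python
# Parsed form of the voice-agent-fast config above (YAML -> dict).
config = {
    "template": "voice-agent-fast",
    "stt": {"model": "voxtral-mini-latest", "streaming": True, "language": "auto"},
    "llm": {"model": "gpt-5-mini", "max_tokens": 150, "temperature": 0.7},
    "tts": {"model": "eleven_turbo_v2_5", "voice": "friendly", "streaming": True},
}

# The template's low latency depends on streaming at both ends of the pipeline.
assert config["stt"]["streaming"] and config["tts"]["streaming"]

# Short completions keep LLM time down; flag it if someone raises the cap.
if config["llm"]["max_tokens"] > 150:
    print("warning: larger max_tokens will increase response latency")
print("config ok:", config["template"])
```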

Performance

| Stage | Latency | Notes |
|---|---|---|
| STT (first word) | 200-400ms | Streaming transcription |
| LLM (first token) | 100-200ms | Fast model |
| TTS (first chunk) | 100-150ms | Turbo model |
| Total TTFB | 400-750ms | Time to first audio byte |
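The Total TTFB row is the sum of the per-stage figures, since the stages run sequentially up to the first audio byte:

```python
# Per-stage (min_ms, max_ms) from the table above.
stages = {
    "stt_first_word": (200, 400),
    "llm_first_token": (100, 200),
    "tts_first_chunk": (100, 150),
}

ttfb_min = sum(lo for lo, _ in stages.values())
ttfb_max = sum(hi for _, hi in stages.values())
print(f"TTFB budget: {ttfb_min}-{ttfb_max}ms")  # TTFB budget: 400-750ms
```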

Usage

Basic Usage

python
from gateflow_mcp import MCPClient
import base64

client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")

# Read audio
with open("question.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

# Process through pipeline
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-fast",
        "context": "You are a helpful customer service assistant."
    }
)

print(f"User said: {result['transcription']}")
print(f"Response: {result['response']}")

# Decode the response audio and save it (or feed it to your audio player)
response_audio = base64.b64decode(result["audio"])
with open("response.mp3", "wb") as f:
    f.write(response_audio)

Streaming Response

python
async def stream_voice_response(audio_file):
    with open(audio_file, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode()

    async for event in client.stream_tool(
        name="voice/pipeline",
        arguments={
            "audio": audio_b64,
            "template": "voice-agent-fast",
            "stream": True
        }
    ):
        if event["type"] == "transcription.partial":
            print(f"Hearing: {event['text']}", end="\r")

        elif event["type"] == "transcription.final":
            print(f"User: {event['text']}")

        elif event["type"] == "llm.token":
            print(event["token"], end="", flush=True)

        elif event["type"] == "audio.chunk":
            audio = base64.b64decode(event["data"])
            play_audio_chunk(audio)  # Your playback function; play chunks as they arrive

        elif event["type"] == "done":
            print(f"\nTotal latency: {event['latency_ms']}ms")
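The dispatch loop above can be exercised offline against a stubbed stream. The event shapes mirror the ones handled above, but `fake_stream` and its payloads are made up for illustration:

```python
import asyncio
import base64

async def fake_stream():
    # Stand-in for client.stream_tool(...), one event per type we care about.
    yield {"type": "transcription.final", "text": "What time do you close?"}
    yield {"type": "llm.token", "token": "6 PM Eastern."}
    yield {"type": "audio.chunk", "data": base64.b64encode(b"\x00\x01\x02").decode()}
    yield {"type": "done", "latency_ms": 640}

async def main():
    chunks = []
    async for event in fake_stream():
        if event["type"] == "transcription.final":
            print(f"User: {event['text']}")
        elif event["type"] == "llm.token":
            print(event["token"], end="", flush=True)
        elif event["type"] == "audio.chunk":
            chunks.append(base64.b64decode(event["data"]))  # buffer instead of playing
        elif event["type"] == "done":
            print(f"\nTotal latency: {event['latency_ms']}ms")
    return chunks

chunks = asyncio.run(main())
```

Buffering chunks as shown is only for testing; in production you would hand each decoded chunk to the audio device immediately to preserve the latency benefit.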

With Custom Context

python
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-fast",
        "context": """You are a voice assistant for TechCorp.

Key information:
- Support hours: 9 AM - 6 PM EST
- Phone: 1-800-TECH-123
- Website: techcorp.com

Be concise and helpful.""",
        "conversation_history": [
            {"role": "user", "content": "What are your hours?"},
            {"role": "assistant", "content": "We're open 9 AM to 6 PM Eastern."}
        ]
    }
)
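Prompt tokens grow with `conversation_history` length, which works against this template's latency target. A hypothetical helper (not part of the SDK) that caps what you send per request:

```python
def trimmed_history(history, max_turns=6):
    """Keep only the most recent turns before sending them to the pipeline.

    Hypothetical helper: long histories inflate prompt tokens and
    therefore LLM latency, so cap the window per request.
    """
    return history[-max_turns:]

history = [
    {"role": "user", "content": "What are your hours?"},
    {"role": "assistant", "content": "We're open 9 AM to 6 PM Eastern."},
    {"role": "user", "content": "Do you ship internationally?"},
]
recent = trimmed_history(history, max_turns=2)
print(len(recent))  # 2
```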

Response Format

json
{
  "transcription": "What time do you close today?",
  "response": "We close at 6 PM Eastern today. Is there anything else I can help you with?",
  "audio": "base64_encoded_mp3_audio...",
  "audio_format": "mp3",
  "latency": {
    "stt_ms": 320,
    "llm_ms": 180,
    "tts_ms": 150,
    "total_ms": 650
  },
  "usage": {
    "stt_seconds": 3.5,
    "llm_tokens": {"prompt": 45, "completion": 28},
    "tts_characters": 85
  },
  "cost": 0.0045
}
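The `latency` block is internally consistent: the per-stage timings sum to `total_ms`. A quick client-side check on a response like the one above (the 750ms threshold comes from the Performance table; the check itself is illustrative):

```python
import json

response = json.loads("""{
  "transcription": "What time do you close today?",
  "latency": {"stt_ms": 320, "llm_ms": 180, "tts_ms": 150, "total_ms": 650},
  "cost": 0.0045
}""")

lat = response["latency"]
# Stage timings should add up to the reported total.
assert lat["stt_ms"] + lat["llm_ms"] + lat["tts_ms"] == lat["total_ms"]
if lat["total_ms"] > 750:  # upper bound from the Performance table
    print("warning: response exceeded the template's TTFB budget")
print(f"ok: {lat['total_ms']}ms, ${response['cost']:.4f}")
```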

Overrides

Customize the template per-request:

python
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-fast",
        "overrides": {
            "llm": {
                "model": "gpt-5.2",  # Use better model
                "max_tokens": 200
            },
            "tts": {
                "voice": "professional"  # Different voice
            }
        }
    }
)
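The example above changes only the fields it names while the rest of the template stays in effect, which suggests overrides are layered onto the template defaults. A sketch of that behavior as a deep merge (the merge semantics are an assumption here; the template dict is abbreviated from the Configuration section):

```python
def deep_merge(base, override):
    """Recursively layer override onto base (assumed merge semantics;
    defer to the gateway's documentation for the authoritative behavior)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

template = {
    "llm": {"model": "gpt-5-mini", "max_tokens": 150, "temperature": 0.7},
    "tts": {"model": "eleven_turbo_v2_5", "voice": "friendly"},
}
overrides = {
    "llm": {"model": "gpt-5.2", "max_tokens": 200},
    "tts": {"voice": "professional"},
}

effective = deep_merge(template, overrides)
print(effective["llm"])  # model and max_tokens replaced, temperature kept
```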

Best For

  • Customer service bots - Quick responses to common questions
  • Voice assistants - Interactive conversations
  • IVR systems - Automated phone systems
  • Real-time translation - Low-latency requirements

Limitations

  • Shorter responses (150 tokens default)
  • Streaming STT may have lower accuracy than batch
  • ElevenLabs Turbo has slightly lower quality than v2

Permissions Required

yaml
permissions:
  tools:
    - voice/pipeline
  models:
    - voxtral-mini-latest
    - gpt-5-mini
    - eleven_turbo_v2_5
  pipelines:
    - voice-agent-fast
