Voice Agent Premium

High-quality voice pipeline for premium voice experiences.

Overview

The voice-agent-premium template trades latency for quality: it pairs batch (non-streaming) transcription with a more capable LLM and a premium multilingual TTS model to produce more accurate, natural-sounding conversation.

Configuration

yaml
template: voice-agent-premium

stt:
  model: whisper-1
  streaming: false      # Batch for accuracy
  language: auto

llm:
  model: gpt-5.2
  max_tokens: 500
  temperature: 0.7

tts:
  model: eleven_multilingual_v2
  voice: professional
  streaming: true
  voice_settings:
    stability: 0.6
    similarity_boost: 0.8

Performance

| Stage | Latency | Notes |
|---|---|---|
| STT | 500-1000 ms | Batch processing |
| LLM (first token) | 200-400 ms | Advanced model |
| TTS (first chunk) | 150-250 ms | Premium model |
| Total TTFB | 850-1650 ms | Higher quality |
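
The total row matches the sum of the per-stage ranges, which suggests the stages run one after another. A quick check of that arithmetic, with the figures copied from the table (the dictionary and function names are illustrative, not part of any API):

```python
# Per-stage latency ranges in milliseconds, copied from the Performance table.
STAGE_LATENCY_MS = {
    "stt": (500, 1000),
    "llm_first_token": (200, 400),
    "tts_first_chunk": (150, 250),
}

def total_ttfb_ms(stages):
    """Sum the lower and upper bounds, assuming the stages run back to back."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high

print(total_ttfb_ms(STAGE_LATENCY_MS))  # (850, 1650)
```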

Quality Comparison

| Feature | Fast | Premium |
|---|---|---|
| STT Accuracy | Good | Excellent |
| Response Quality | Good | Excellent |
| Voice Naturalness | Good | Excellent |
| Multilingual | Limited | Full |
| Latency | 400-750 ms | 850-1650 ms |
| Cost | $0.003-0.005 | $0.008-0.015 |

Usage

Basic Usage

python
from gateflow_mcp import MCPClient
import base64

client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")

with open("question.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-premium",
        "context": "You are a knowledgeable executive assistant."
    }
)

print(f"Transcription: {result['transcription']}")
print(f"Response: {result['response']}")
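
The result also carries the synthesized speech as base64 in the `audio` field, with its container type in `audio_format` (both shown under Response Format). A minimal sketch for writing it to disk follows; the helper name `save_response_audio` and the default file stem are hypothetical, and the code assumes the result behaves like a plain dict, as the `result['response']` access above implies.

```python
import base64
from pathlib import Path

def save_response_audio(result, stem="reply"):
    """Decode the base64 audio in a pipeline result and write it to disk.

    Hypothetical helper: relies only on the "audio" and "audio_format"
    fields shown under Response Format.
    """
    extension = result.get("audio_format", "mp3")
    path = Path(f"{stem}.{extension}")
    path.write_bytes(base64.b64decode(result["audio"]))
    return path
```

Calling `save_response_audio(result)` after the request above would write `reply.mp3` to the current directory.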

Extended Conversation

python
# Reuses client and audio_b64 from Basic Usage
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-premium",
        "context": """You are an executive assistant helping schedule meetings.

Today's date: February 16, 2026
User's timezone: Eastern

Be professional and thorough in your responses.""",
        "conversation_history": [
            {"role": "user", "content": "I need to schedule a board meeting."},
            {"role": "assistant", "content": "I'd be happy to help schedule a board meeting. What date and time works best for you?"},
            {"role": "user", "content": "Next Tuesday at 2 PM."}
        ]
    }
)
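
Because `conversation_history` is passed in with each request, the caller has to carry it forward between turns. The sketch below shows one way to do that, assuming the gateway keeps no state between calls (this page does not confirm that). The helper name `extend_history` and the `max_messages` cap are illustrative choices, not part of the API.

```python
def extend_history(history, result, max_messages=20):
    """Return a new history with the latest user/assistant turn appended.

    Hypothetical helper: uses the "transcription" and "response" fields
    of a pipeline result. The input list is left unmodified.
    """
    updated = list(history)
    updated.append({"role": "user", "content": result["transcription"]})
    updated.append({"role": "assistant", "content": result["response"]})
    # Keep only the most recent messages so the prompt stays bounded.
    return updated[-max_messages:]
```

After each call, `history = extend_history(history, result)` gives the list to pass as `conversation_history` on the next turn.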

Multilingual Support

python
# Works with 100+ languages
# french_audio_b64: base64-encoded French audio, prepared like audio_b64 in Basic Usage
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": french_audio_b64,
        "template": "voice-agent-premium",
        "context": "Vous êtes un assistant utile. Répondez en français.",
        "overrides": {
            "stt": {"language": "fr"},
            "tts": {"language": "fr"}
        }
    }
)

Response Format

json
{
  "transcription": "I need to reschedule the board meeting to next Friday.",
  "response": "I'll reschedule the board meeting to Friday, February 20th. Would you like me to send updated calendar invites to all attendees? I can also prepare an updated agenda if needed.",
  "audio": "base64_encoded_mp3_audio...",
  "audio_format": "mp3",
  "latency": {
    "stt_ms": 750,
    "llm_ms": 380,
    "tts_ms": 220,
    "total_ms": 1350
  },
  "usage": {
    "stt_seconds": 5.2,
    "llm_tokens": {"prompt": 120, "completion": 65},
    "tts_characters": 180
  },
  "cost": 0.012
}
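
The `latency` object makes it easy to see where the time went. As a small illustration using the sample values above (the function name `slowest_stage` is hypothetical), the following finds the dominant stage and checks that the per-stage figures add up to `total_ms`:

```python
def slowest_stage(latency):
    """Return (field_name, ms) for the largest per-stage latency."""
    stages = {k: v for k, v in latency.items() if k != "total_ms"}
    name = max(stages, key=stages.get)
    return name, stages[name]

# Values copied from the sample response.
latency = {"stt_ms": 750, "llm_ms": 380, "tts_ms": 220, "total_ms": 1350}
stage_sum = sum(v for k, v in latency.items() if k != "total_ms")

print(slowest_stage(latency))             # ('stt_ms', 750)
print(stage_sum == latency["total_ms"])   # True for this sample
```

In the sample, batch STT accounts for more than half of the total, consistent with the Performance table.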

Voice Customization

python
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-premium",
        "overrides": {
            "tts": {
                "voice": "21m00Tcm4TlvDq8ikWAM",  # Specific ElevenLabs voice
                "voice_settings": {
                    "stability": 0.5,
                    "similarity_boost": 0.9,
                    "style": 0.3,
                    "speed": 0.95
                }
            }
        }
    }
)
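
This page does not say whether `overrides` replace a whole section or are merged key by key with the template defaults. The sketch below illustrates the key-by-key interpretation only, as an assumption: `deep_merge` and `TEMPLATE_TTS` are local, hypothetical names, and the actual merge happens on the gateway side.

```python
import copy

# TTS defaults copied from the Configuration section.
TEMPLATE_TTS = {
    "model": "eleven_multilingual_v2",
    "voice": "professional",
    "streaming": True,
    "voice_settings": {"stability": 0.6, "similarity_boost": 0.8},
}

def deep_merge(base, overrides):
    """Recursively merge overrides into a copy of base (assumed semantics)."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Overriding one nested setting keeps the other defaults under this assumption.
effective = deep_merge(TEMPLATE_TTS, {"voice_settings": {"stability": 0.5}})
print(effective["voice_settings"])  # {'stability': 0.5, 'similarity_boost': 0.8}
```

If the gateway instead replaces nested objects wholesale, every desired `voice_settings` key would need to be listed in the override, as the example above already does.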

Best For

  • Executive assistants - Complex scheduling and tasks
  • Premium customer service - High-value interactions
  • Accessibility applications - Natural voice interfaces
  • Multilingual applications - Global user bases
  • Complex conversations - Multi-turn dialogues

Use Case: Financial Advisor

python
result = client.call_tool(
    name="voice/pipeline",
    arguments={
        "audio": audio_b64,
        "template": "voice-agent-premium",
        "context": """You are a financial advisor assistant.

Guidelines:
- Never give specific investment advice
- Always recommend consulting a licensed advisor
- Be informative about general financial concepts
- Maintain a professional, reassuring tone""",
        "overrides": {
            "llm": {
                "temperature": 0.5  # More conservative
            },
            "tts": {
                "voice": "serious"
            }
        }
    }
)

Permissions Required

yaml
permissions:
  tools:
    - voice/pipeline
  models:
    - whisper-1
    - gpt-5.2
    - eleven_multilingual_v2
  pipelines:
    - voice-agent-premium

Cost Optimization

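Premium requests cost roughly two to three times as much as fast ones (see Quality Comparison), so route only the queries that need the extra quality to the premium template: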

python
# Use premium for complex queries, fast for simple ones
def select_template(query_complexity: str) -> str:
    """Return the template name; anything not labeled "simple" gets premium."""
    if query_complexity == "simple":
        return "voice-agent-fast"
    return "voice-agent-premium"
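
To get a feel for what routing saves, the cost ranges from the Quality Comparison table can be blended by the share of traffic sent to each template. This is a rough estimate under the assumption that cost depends only on the template; `blended_cost` and `COST_RANGE` are illustrative names.

```python
# Cost ranges copied from the Quality Comparison table (units as given there).
COST_RANGE = {
    "voice-agent-fast": (0.003, 0.005),
    "voice-agent-premium": (0.008, 0.015),
}

def blended_cost(simple_fraction):
    """Estimate the (low, high) average cost when the given fraction of
    queries is routed to the fast template and the rest to premium."""
    if not 0.0 <= simple_fraction <= 1.0:
        raise ValueError("simple_fraction must be between 0 and 1")
    fast_lo, fast_hi = COST_RANGE["voice-agent-fast"]
    prem_lo, prem_hi = COST_RANGE["voice-agent-premium"]
    low = simple_fraction * fast_lo + (1 - simple_fraction) * prem_lo
    high = simple_fraction * fast_hi + (1 - simple_fraction) * prem_hi
    return round(low, 4), round(high, 4)

print(blended_cost(0.6))  # (0.005, 0.009)
```

With 60% of queries classified as simple, the estimated average falls to $0.005-0.009, compared with $0.008-0.015 when everything goes through premium.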
