Voice APIs

Enterprise Compliance & Security

SOC 2 Type II
Audited controls for security, availability, and confidentiality
HIPAA Eligible
BAA available for healthcare applications handling PHI
GDPR Compliant
Data processing agreements and EU data residency options
Data Residency
Regional processing for compliance requirements
High Availability
Multi-region infrastructure with custom SLAs for enterprise workloads
SSO & RBAC
SAML SSO, role-based access, and audit logging

Quick Start: Voice Agent

Build a real-time voice assistant in minutes:

import asyncio
import base64
import json
import os
import websockets

async def voice_agent():
    # Open an authenticated WebSocket to the realtime endpoint.
    # additional_headers is the keyword in recent websockets releases;
    # older releases used extra_headers instead.
    async with websockets.connect(
        "wss://api.x.ai/v1/realtime",
        additional_headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}
    ) as ws:
        # Configure voice and enable tools
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": "Eve",
                "instructions": "You are a helpful customer support agent.",
                "turn_detection": {"type": "server_vad"},
                "tools": [{"type": "web_search"}]
            }
        }))

        # Receive server events; audio arrives as base64-encoded deltas
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.output_audio.delta":
                audio_chunk = base64.b64decode(event["delta"])
                # Hand audio_chunk to your audio output here

asyncio.run(voice_agent())
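
The receive loop above handles one audio delta at a time. If you would rather buffer a whole response before playing or saving it, the following is a minimal sketch that concatenates the decoded deltas. It runs on simulated messages rather than a live socket. The helper name `collect_audio` and the non-audio event type in the sample data are illustrative assumptions; only `response.output_audio.delta` and its base64 `delta` field come from the example above.

```python
import base64
import json

def collect_audio(raw_messages):
    """Decode and concatenate audio from raw JSON event strings."""
    audio = bytearray()
    for raw in raw_messages:
        event = json.loads(raw)
        # Only audio delta events carry a base64 payload we care about
        if event.get("type") == "response.output_audio.delta":
            audio.extend(base64.b64decode(event["delta"]))
    return bytes(audio)

def _delta(chunk: bytes) -> str:
    """Build a simulated audio delta message (test data only)."""
    return json.dumps({
        "type": "response.output_audio.delta",
        "delta": base64.b64encode(chunk).decode("ascii"),
    })

# Simulated frames standing in for messages read from the WebSocket
messages = [
    _delta(b"\x01\x02"),
    json.dumps({"type": "example.other_event"}),  # hypothetical non-audio event
    _delta(b"\x03"),
]
audio_bytes = collect_audio(messages)
```

Buffering trades latency for simplicity: a real-time agent would normally play each chunk as it arrives, while a buffered copy is handy for logging or offline review.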

Quick Start: Text to Speech

Generate speech from text with a single API call:

import os
import requests

response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": "Welcome to xAI. How can I help you today?",
        "voice_id": "eve",
        "language": "en",
    },
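)
response.raise_for_status()  # Stop here on an HTTP error instead of saving an error body

# The response body is the generated audio
with open("welcome.mp3", "wb") as f:
    f.write(response.content)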

Built for Enterprise Voice

Telephony Integration
Connect via SIP, WebSocket, or LiveKit. Native G.711 μ-law/A-law codec support — no transcoding overhead.
Tool Calling
Reach CRMs, calendars, databases, and any REST or GraphQL endpoint via function calling during live conversations.
20+ Languages
Natural pronunciation, accent handling, and seamless code-switching between languages in the same conversation.
Domain Expertise
Precise transcription of medical, legal, financial, and technical terminology — names, codes, and addresses.
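
Function calling during a live conversation generally follows one pattern: the model names a function and supplies JSON arguments, your code runs it, and the result goes back to the model. The sketch below shows only the local dispatch step. The shape of the call (`name` plus a JSON string in `arguments`), the `lookup_order` tool, and the `dispatch_tool_call` helper are all hypothetical illustrations, not the documented event schema.

```python
import json

def lookup_order(order_id: str) -> dict:
    """Hypothetical CRM lookup; a real version would query your own backend."""
    return {"order_id": order_id, "status": "shipped"}

# Registry mapping tool names to local callables
TOOLS = {"lookup_order": lookup_order}

def dispatch_tool_call(call: dict) -> str:
    """Run the requested tool and return its result as a JSON string."""
    handler = TOOLS.get(call["name"])
    if handler is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    try:
        arguments = json.loads(call["arguments"])
        return json.dumps(handler(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature
        return json.dumps({"error": str(exc)})

result = dispatch_tool_call(
    {"name": "lookup_order", "arguments": '{"order_id": "A123"}'}
)
```

Returning errors as JSON rather than raising keeps the conversation alive: the model can apologize or retry instead of the session dropping.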

Voices

Choose from 5 distinct voices, each with unique characteristics suited to different applications:

Voice | Type    | Tone                  | Description
Eve   | Female  | Energetic, upbeat     | Default voice, engaging and enthusiastic
Ara   | Female  | Warm, friendly        | Balanced and conversational
Rex   | Male    | Confident, clear      | Professional and articulate, ideal for business
Sal   | Neutral | Smooth, balanced      | Versatile voice suitable for various contexts
Leo   | Male    | Authoritative, strong | Decisive and commanding, suitable for instructional content
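
To catch a mistyped voice name before making a request, you can validate it locally. This is a minimal sketch that builds a Text to Speech request body using the field names from the quick start (`text`, `voice_id`, `language`). It assumes that `voice_id` is the lowercase voice name; the quick start confirms that only for `eve`, so treat it as an assumption for the other voices. `build_tts_payload` is an illustrative helper, not part of any SDK.

```python
VOICES = ("Eve", "Ara", "Rex", "Sal", "Leo")

def build_tts_payload(text: str, voice: str = "Eve", language: str = "en") -> dict:
    """Return a request body for the TTS endpoint, validating the voice name."""
    if not text.strip():
        raise ValueError("text must not be empty")
    known = {name.lower() for name in VOICES}
    voice_id = voice.strip().lower()  # assumption: IDs are lowercase voice names
    if voice_id not in known:
        raise ValueError(f"unknown voice {voice!r}; choose one of {', '.join(VOICES)}")
    return {"text": text, "voice_id": voice_id, "language": language}

payload = build_tts_payload("Welcome to xAI.", voice="Rex")
```

The resulting dictionary can be passed as the `json=` argument of the `requests.post` call shown in the Text to Speech quick start.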

Expressive Speech Tags

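Go beyond flat text: add laughter, whispers, pauses, and more with inline tags. As the example below shows, bracketed tags such as [pause] mark a single point in the speech, while wrapping tags such as <whisper>...</whisper> apply to the text they enclose. Tags work in both the Voice Agent API and the Text to Speech API.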

So I walked in and [pause] there it was. [laugh] I honestly could not believe it! <whisper>It was a secret the whole time.</whisper> Pretty cool, right?
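
If you also need a plain-text version of tagged input, for captions or logs, the tags can be located and removed with regular expressions. The sketch below assumes only the two syntaxes visible in the sample above: bracketed point tags and paired wrapping tags. The patterns accept any lowercase tag name, so they do not check that a tag is actually supported by the API. `find_tags` and `strip_tags` are illustrative helpers.

```python
import re

POINT_TAG = re.compile(r"\[([a-z_]+)\]")                    # e.g. [pause]
WRAP_TAG = re.compile(r"<([a-z_]+)>(.*?)</\1>", re.DOTALL)  # e.g. <whisper>...</whisper>

def find_tags(text: str):
    """Return (point tag names, [(wrapping tag name, enclosed text), ...])."""
    return POINT_TAG.findall(text), WRAP_TAG.findall(text)

def strip_tags(text: str) -> str:
    """Remove tags, keep the enclosed text, and collapse leftover whitespace."""
    text = POINT_TAG.sub("", text)
    text = WRAP_TAG.sub(r"\2", text)
    return re.sub(r"\s{2,}", " ", text).strip()

sample = (
    "So I walked in and [pause] there it was. [laugh] I honestly could not "
    "believe it! <whisper>It was a secret the whole time.</whisper> Pretty cool, right?"
)
point_tags, wrapped = find_tags(sample)
plain = strip_tags(sample)
```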

Example Applications & Integrations

Production-ready examples and third-party framework integrations:

