Voice APIs
Enterprise Compliance & Security
✓ SOC 2 Type II: Audited controls for security, availability, and confidentiality
✓ HIPAA Eligible: BAA available for healthcare applications handling PHI
✓ GDPR Compliant: Data processing agreements and EU data residency options
✓ Data Residency: Regional processing for compliance requirements
✓ High Availability: Multi-region infrastructure with custom SLAs for enterprise workloads
✓ SSO & RBAC: SAML SSO, role-based access, and audit logging
Quick Start: Voice Agent
Build a real-time voice assistant in minutes:
import asyncio
import base64
import json
import os

import websockets


async def voice_agent():
    # Newer websockets releases take `additional_headers`;
    # the legacy client uses `extra_headers` instead.
    async with websockets.connect(
        "wss://api.x.ai/v1/realtime",
        additional_headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    ) as ws:
        # Configure voice and enable tools
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": "Eve",
                "instructions": "You are a helpful customer support agent.",
                "turn_detection": {"type": "server_vad"},
                "tools": [{"type": "web_search"}],
            },
        }))

        # Stream audio and receive responses
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.output_audio.delta":
                # Decode the base64 audio chunk; hand it to your audio player here
                audio = base64.b64decode(event["delta"])


asyncio.run(voice_agent())
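The receive loop above can be factored into a small helper that is easy to test without a live connection. This is a minimal sketch, assuming each `response.output_audio.delta` event carries base64-encoded audio in its `delta` field, as the quick start suggests. The helper name `collect_audio` and the simulated frames are hypothetical and not part of any SDK.

```python
import base64
import json


def collect_audio(messages):
    """Concatenate the decoded audio from all audio-delta events.

    `messages` is any iterable of raw JSON strings, such as frames read
    from the WebSocket. Events of other types are ignored.
    """
    chunks = []
    for raw in messages:
        event = json.loads(raw)
        if event.get("type") == "response.output_audio.delta":
            chunks.append(base64.b64decode(event["delta"]))
    return b"".join(chunks)


# Simulated frames with made-up payloads (not real API output).
frames = [
    json.dumps({"type": "some.other.event"}),
    json.dumps({"type": "response.output_audio.delta",
                "delta": base64.b64encode(b"\x01\x02").decode()}),
    json.dumps({"type": "response.output_audio.delta",
                "delta": base64.b64encode(b"\x03").decode()}),
]
audio = collect_audio(frames)
```

In a real agent the chunks would more likely be fed to an audio output stream as they arrive rather than buffered in memory.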
Quick Start: Text to Speech
Generate speech from text with a single API call:
import os

import requests

response = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": "Welcome to xAI. How can I help you today?",
        "voice_id": "eve",
        "language": "en",
    },
)
# Fail loudly on an HTTP error instead of writing an error body to disk
response.raise_for_status()

# Save the returned audio bytes
with open("welcome.mp3", "wb") as f:
    f.write(response.content)
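When the same call is made from several places, it can help to build the request separately from sending it. The sketch below assumes only the endpoint and the `text`, `voice_id`, and `language` fields shown in the quick start; the function name `build_tts_request` and its empty-text check are illustrative additions, not part of the API.

```python
import os

TTS_URL = "https://api.x.ai/v1/tts"  # endpoint taken from the quick start


def build_tts_request(text, voice_id="eve", language="en", api_key=None):
    """Return keyword arguments suitable for requests.post(**kwargs)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Fall back to the environment variable used in the quick start.
    key = api_key or os.environ.get("XAI_API_KEY", "")
    return {
        "url": TTS_URL,
        "headers": {
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        "json": {"text": text, "voice_id": voice_id, "language": language},
    }


req = build_tts_request("Hello there.", api_key="test-key")
```

The result can then be sent with `requests.post(**req, timeout=30)`, followed by `raise_for_status()` before the audio is written to disk.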
Built for Enterprise Voice
Telephony Integration
Connect via SIP, WebSocket, or LiveKit. Native G.711 μ-law/A-law codec support — no transcoding overhead.
Tool Calling
Call CRMs, calendars, databases, and any REST or GraphQL endpoint through function calling during live conversations.
20+ Languages
Natural pronunciation, accent handling, and seamless code-switching between languages in the same conversation.
Domain Expertise
Precise transcription of medical, legal, financial, and technical terminology, as well as names, codes, and addresses.
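On the application side, tool calling usually comes down to mapping a function name from the model onto local code. The following is a rough sketch of such a dispatcher. The call shape (`{"name": ..., "arguments": "<JSON string>"}`), the `TOOLS` registry, and the `lookup_order` function are all assumptions made for illustration; the actual event format is not described here and may differ.

```python
import json

# Hypothetical registry mapping tool names to local Python callables.
TOOLS = {}


def tool(fn):
    """Register a function so it can be invoked by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_order(order_id):
    # Stand-in for a real CRM or database query.
    return {"order_id": order_id, "status": "shipped"}


def dispatch(call):
    """Run the tool named in `call` and return its result as a JSON string.

    `call` is assumed to look like {"name": ..., "arguments": "<JSON>"}.
    """
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    args = json.loads(call["arguments"] or "{}")
    return json.dumps(fn(**args))


result = dispatch({"name": "lookup_order", "arguments": '{"order_id": "A17"}'})
```

Returning an error object for unknown tool names, rather than raising, keeps a live conversation going when the model requests something the application does not implement.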
Voices
Choose from 5 distinct voices, each with unique characteristics suited to different applications:
| Voice | Type | Tone | Description |
|---|---|---|---|
| Eve | Female | Energetic, upbeat | Default voice, engaging and enthusiastic |
| Ara | Female | Warm, friendly | Balanced and conversational |
| Rex | Male | Confident, clear | Professional and articulate, ideal for business |
| Sal | Neutral | Smooth, balanced | Versatile voice suitable for various contexts |
| Leo | Male | Authoritative, strong | Decisive and commanding, suitable for instructional content |
Expressive Speech Tags
Go beyond flat text — add laughter, whispers, pauses, and more with inline tags. Works in both the Voice Agent API and Text to Speech API.
Example:
So I walked in and [pause] there it was. [laugh] I honestly could not believe it! <whisper>It was a secret the whole time.</whisper> Pretty cool, right?
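Before sending tagged text, it can be useful to check which tags it contains, or to produce a plain transcript for logging. The sketch below assumes only the two tag forms visible in the example: standalone tags in square brackets and wrapping tags in angle brackets. The function names are hypothetical, and the patterns do not validate tag names against the full tag reference.

```python
import re

# Standalone tags like [pause]; wrapping tags like <whisper>...</whisper>.
INLINE_TAG = re.compile(r"\[(\w+)\]")
WRAP_TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def list_tags(text):
    """Return the sorted, de-duplicated tag names used in `text`."""
    names = set(INLINE_TAG.findall(text))
    names.update(m.group(1) for m in WRAP_TAG.finditer(text))
    return sorted(names)


def strip_tags(text):
    """Remove tags, keeping the words inside wrapping tags."""
    text = WRAP_TAG.sub(r"\2", text)
    text = INLINE_TAG.sub("", text)
    # Collapse the double spaces left behind by removed tags.
    return re.sub(r"\s{2,}", " ", text).strip()


sample = ("So I walked in and [pause] there it was. [laugh] "
          "<whisper>It was a secret.</whisper>")
tags = list_tags(sample)
plain = strip_tags(sample)
```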
Example Applications & Integrations
Production-ready examples and third-party framework integrations: