Inference API
Voice
Create client secret
/v1/realtime/client_secrets
Create an ephemeral client secret for authenticating browser-side Realtime API connections.
Request Body
Response Body
value
string
The ephemeral token value. Use it as a Bearer token in the WebSocket Authorization header, or pass it in the sec-websocket-protocol header with the prefix `xai-client-secret.` (the trailing dot is part of the prefix).
expires_at
integer
Unix timestamp (seconds) when this client secret expires.
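As an illustration of how the two response fields might be consumed, the sketch below builds both authentication forms described for `value` and checks `expires_at` against the current time. The helper names and the `leeway` parameter are hypothetical, not part of the API; the sketch assumes the response has already been parsed into a dict.

```python
import time
from typing import Optional

# Prefix named in the description of the `value` field.
SUBPROTOCOL_PREFIX = "xai-client-secret."


def is_expired(secret: dict, now: Optional[float] = None, leeway: int = 0) -> bool:
    """Return True if the secret has expired (or will within `leeway` seconds).

    `expires_at` is a Unix timestamp in seconds, so it compares directly
    against time.time().
    """
    current = time.time() if now is None else now
    return current >= secret["expires_at"] - leeway


def authorization_header(secret: dict) -> dict:
    """Header form: the token is sent as a Bearer token."""
    return {"Authorization": "Bearer " + secret["value"]}


def websocket_subprotocol(secret: dict) -> str:
    """Subprotocol form: the token is prefixed for sec-websocket-protocol."""
    return SUBPROTOCOL_PREFIX + secret["value"]
```

Browser WebSocket clients generally cannot set an Authorization header, which is presumably why the sec-websocket-protocol form exists; a server-side client can use either.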
Realtime
Text to speech
/v1/tts
Convert text into speech audio.
Request Body
text
string
required
The text to convert to speech. Maximum 15,000 characters. Supports inline speech tags for expressive output: [pause], [long-pause], [hum-tune], [laugh], [chuckle], [giggle], [cry], [tsk], [tongue-click], [lip-smack], [breath], [inhale], [exhale], [sigh]. Also supports wrapping tags for style control: <soft>, <whisper>, <loud>, <build-intensity>, <decrease-intensity>, <higher-pitch>, <lower-pitch>, <slow>, <fast>, <sing-song>, <singing>, <laugh-speak>, <emphasis>.
language
string
required
BCP-47 language code (e.g. en, zh, pt-BR) or auto for automatic language detection. Case-insensitive. Supported values: auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi. Additional languages may work with varying accuracy.
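A minimal sketch of client-side validation for the two required fields follows. The function names are hypothetical, and the sketch assumes the 15,000-character limit is counted the way Python's `len()` counts characters and that speech tags count toward it; neither detail is stated above. Unlisted language codes are not rejected, since additional languages may work with varying accuracy.

```python
MAX_TEXT_CHARS = 15_000

# Values listed as supported for the `language` field, stored lowercase
# because the field is case-insensitive.
LISTED_LANGUAGES = {
    code.lower()
    for code in (
        "auto", "en", "ar-EG", "ar-SA", "ar-AE", "bn", "zh", "fr", "de",
        "hi", "id", "it", "ja", "ko", "pt-BR", "pt-PT", "ru", "es-MX",
        "es-ES", "tr", "vi",
    )
}


def is_listed_language(language: str) -> bool:
    """Case-insensitive check against the documented language list."""
    return language.strip().lower() in LISTED_LANGUAGES


def build_tts_body(text: str, language: str) -> dict:
    """Build the JSON body for /v1/tts, enforcing the documented limits."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if not language.strip():
        raise ValueError("language is required")
    return {"text": text, "language": language.strip()}


# Example with an inline speech tag from the list above.
example = build_tts_body("Well [pause] that was unexpected. [laugh]", "en")
```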
Streaming text to speech
List voices
/v1/tts/voices
List all available TTS voices.
Response Body
voices
array
List of available voices.
Get voice
/v1/tts/voices/{voice_id}
Get details for a specific voice.
Path parameters
voice_id
string
required
The unique identifier of the voice (e.g. `eve`, `ara`). Case-insensitive.
Response Body
voice_id
string
Unique identifier for the voice (lowercase). Pass this value as voice_id in TTS requests or as the voice parameter in Realtime API session configuration.
name
string
Human-readable display name for the voice.
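The sketch below shows one way to work with the voice endpoints: building the path for a single voice and extracting identifiers from a list response. The helper names are hypothetical, and the sketch assumes each entry in the `voices` array has the same `voice_id` and `name` fields as the Get voice response, which the listing above does not state explicitly. The display names in the sample data are illustrative only.

```python
from urllib.parse import quote


def voice_path(voice_id: str) -> str:
    """Build the Get voice path for a given identifier.

    The path parameter is case-insensitive, so lowercasing is optional;
    it is done here to match the lowercase form the API returns.
    """
    normalized = voice_id.strip().lower()
    if not normalized:
        raise ValueError("voice_id is required")
    return "/v1/tts/voices/" + quote(normalized, safe="")


def voice_ids(list_response: dict) -> list:
    """Collect voice_id values from a parsed List voices response."""
    return [voice["voice_id"] for voice in list_response.get("voices", [])]


# Sample data in the assumed shape of a List voices response.
sample = {
    "voices": [
        {"voice_id": "eve", "name": "Eve"},
        {"voice_id": "ara", "name": "Ara"},
    ]
}
```

Any value returned by `voice_ids` can then be passed as voice_id in TTS requests or as the voice parameter in Realtime API session configuration.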