Inference API

Voice


Create client secret

/v1/realtime/client_secrets

Create an ephemeral client secret for authenticating browser-side Realtime API connections.

Request Body

Response Body

value

string

The ephemeral token value. Send it as a Bearer token in the WebSocket Authorization header, or in the sec-websocket-protocol header with the prefix `xai-client-secret.` placed before the token.

expires_at

integer

Unix timestamp (seconds) when this client secret expires.
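The two response fields can be turned into connection credentials as in the minimal sketch below. It takes an already-parsed response body and makes no network call, because the HTTP method, host, and request body of this endpoint are not stated here. The helper name `build_realtime_auth`, the `skew_seconds` margin, and the sample token are hypothetical, and joining the prefix directly to the token is an assumption about the subprotocol format.

```python
import time
from typing import Optional


def build_realtime_auth(secret: dict, now: Optional[float] = None,
                        skew_seconds: int = 5) -> dict:
    """Build both documented ways of presenting an ephemeral client secret.

    `secret` is the parsed response body with `value` and `expires_at`.
    """
    now = time.time() if now is None else now
    # expires_at is a Unix timestamp in seconds; refuse tokens that are
    # expired or will expire within the (hypothetical) safety margin.
    if secret["expires_at"] <= now + skew_seconds:
        raise ValueError("client secret has expired or is about to expire")
    token = secret["value"]
    return {
        # Option 1: Bearer token in the WebSocket Authorization header.
        "authorization_header": {"Authorization": f"Bearer {token}"},
        # Option 2: sec-websocket-protocol value (assumed: prefix + token).
        "websocket_subprotocol": f"xai-client-secret.{token}",
    }


# Illustrative values only; a real response comes from the endpoint above.
example = {"value": "example-token", "expires_at": 2_000_000_000}
auth = build_realtime_auth(example, now=1_900_000_000)
print(auth["authorization_header"])
print(auth["websocket_subprotocol"])
```

Browsers generally cannot set an Authorization header on a WebSocket handshake, which is presumably why the subprotocol form exists for browser-side connections.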


Realtime



Text to speech

/v1/tts

Convert text into speech audio.

Request Body

text

string

required

The text to convert to speech. Maximum 15,000 characters. Supports inline speech tags for expressive output: [pause], [long-pause], [hum-tune], [laugh], [chuckle], [giggle], [cry], [tsk], [tongue-click], [lip-smack], [breath], [inhale], [exhale], [sigh]. Also supports wrapping tags for style control: <soft>, <whisper>, <loud>, <build-intensity>, <decrease-intensity>, <higher-pitch>, <lower-pitch>, <slow>, <fast>, <sing-song>, <singing>, <laugh-speak>, <emphasis>.
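A client-side check of the `text` field before sending it might look like the sketch below. It is an illustration, not part of the API: the function name `check_tts_text` is hypothetical, the length is measured with Python's `len` (how the service counts characters is an assumption), and any lowercase word in square brackets is treated as an intended inline tag, which can flag ordinary bracketed text.

```python
import re
from typing import List

# Inline speech tags listed for the `text` field.
INLINE_TAGS = {
    "pause", "long-pause", "hum-tune", "laugh", "chuckle", "giggle", "cry",
    "tsk", "tongue-click", "lip-smack", "breath", "inhale", "exhale", "sigh",
}
MAX_TEXT_CHARS = 15_000  # documented maximum length


def check_tts_text(text: str) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if len(text) > MAX_TEXT_CHARS:
        problems.append(
            f"text is {len(text)} characters; maximum is {MAX_TEXT_CHARS}"
        )
    # Find every [lowercase-word] occurrence and compare it to the known tags.
    for tag in re.findall(r"\[([a-z-]+)\]", text):
        if tag not in INLINE_TAGS:
            problems.append(f"unknown inline tag: [{tag}]")
    return problems


print(check_tts_text("Well [pause] that was unexpected [sigh]"))
print(check_tts_text("Oh [yawn] sorry"))
```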

language

string

required

BCP-47 language code (e.g. en, zh, pt-BR) or auto for automatic language detection. Case-insensitive. Supported values: auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi. Additional languages may work with varying accuracy.
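Because `language` is case-insensitive and unlisted codes may still work, a client could normalize the value instead of rejecting it. The sketch below builds only the two required fields of the request body and reports whether the code is on the supported list. The function name `build_tts_body` and the returned flag are hypothetical, and no HTTP call is made since the method and host are not given here.

```python
from typing import Dict, Tuple

# Supported values as listed for the `language` field.
SUPPORTED_LANGUAGES = (
    "auto", "en", "ar-EG", "ar-SA", "ar-AE", "bn", "zh", "fr", "de", "hi",
    "id", "it", "ja", "ko", "pt-BR", "pt-PT", "ru", "es-MX", "es-ES", "tr",
    "vi",
)
# Map lowercase forms to the canonical spelling for case-insensitive lookup.
_CANONICAL = {code.lower(): code for code in SUPPORTED_LANGUAGES}


def build_tts_body(text: str, language: str) -> Tuple[Dict[str, str], bool]:
    """Return (request body, whether the language is on the supported list)."""
    if not text:
        raise ValueError("text is required")
    if not language:
        raise ValueError("language is required")
    canonical = _CANONICAL.get(language.lower())
    # Unlisted codes are passed through unchanged: they may still work,
    # with varying accuracy.
    body = {"text": text, "language": canonical or language}
    return body, canonical is not None


print(build_tts_body("Bom dia", "PT-br"))
print(build_tts_body("God morgon", "sv"))
```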


Streaming text to speech



List voices

/v1/tts/voices

List all available TTS voices.

Response Body

voices

array

List of available voices.


Get voice

/v1/tts/voices/{voice_id}

Get details for a specific voice.

Path parameters

voice_id

string

required

The unique identifier of the voice (e.g. `eve`, `ara`). Case-insensitive.

Response Body

voice_id

string

Unique identifier for the voice (lowercase). Pass this value as voice_id in TTS requests or as the voice parameter in Realtime API session configuration.

name

string

Human-readable display name for the voice.
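The voice endpoints can be combined as in the sketch below: one helper builds the path for a single voice, the other looks a voice up in an already-fetched list. Both helper names are hypothetical. The sample payload is invented for illustration; it assumes each entry of `voices` has the same `voice_id` and `name` fields as the Get voice response, and the display names are made up.

```python
from typing import List, Optional
from urllib.parse import quote


def voice_path(voice_id: str) -> str:
    """Build the Get voice path; lowercasing is optional but harmless,
    since the identifier is case-insensitive."""
    return "/v1/tts/voices/" + quote(voice_id.strip().lower(), safe="")


def find_voice(voices: List[dict], voice_id: str) -> Optional[dict]:
    """Look up a voice by identifier, ignoring case."""
    wanted = voice_id.strip().lower()
    for voice in voices:
        # Returned voice_id values are documented as lowercase.
        if voice["voice_id"] == wanted:
            return voice
    return None


# Invented sample shaped like an assumed List voices response.
sample = {"voices": [{"voice_id": "eve", "name": "Eve"},
                     {"voice_id": "ara", "name": "Ara"}]}

print(voice_path("Eve"))
print(find_voice(sample["voices"], "ARA"))
```

The `voice_id` found this way is the value to pass as `voice_id` in TTS requests or as `voice` in Realtime session configuration.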

