Inference API
Voice
Create client secret
/v1/realtime/client_secrets
Create an ephemeral client secret for authenticating browser-side Realtime API connections.
Request Body
Response Body
value
string
The ephemeral token value. Use it as a Bearer token in the WebSocket Authorization header, or pass it in the Sec-WebSocket-Protocol header prefixed with `xai-client-secret.` (trailing dot included).
expires_at
integer
Unix timestamp (seconds) when this client secret expires.
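Because the secret is short-lived, a client will usually want to check `expires_at` before opening a connection. The following is a minimal sketch of that check, assuming the response has already been decoded into a dict. The `ClientSecret` class, its method names, and the `leeway` margin are hypothetical helpers, not part of the API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientSecret:
    value: str       # the ephemeral token value
    expires_at: int  # Unix timestamp in seconds

    @classmethod
    def from_response(cls, body: dict) -> "ClientSecret":
        # Reads only the two documented response fields.
        return cls(value=body["value"], expires_at=int(body["expires_at"]))

    def is_expired(self, now: Optional[float] = None, leeway: int = 5) -> bool:
        # `leeway` is a hypothetical safety margin (seconds), not an API concept.
        current = time.time() if now is None else now
        return current >= self.expires_at - leeway


# Sample body with made-up values, shaped like the documented response.
secret = ClientSecret.from_response(
    {"value": "example-token", "expires_at": 1_800_000_000}
)
print(secret.is_expired(now=1_799_999_000))  # False: 1000 s before expiry
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it falls back to the system clock.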
Realtime
wss://api.x.ai/v1/realtime
Real-time voice conversations with Grok models via WebSocket. The connection begins with an HTTP GET that is upgraded to WebSocket (status 101). Once connected, the client and server exchange JSON messages to configure the session, stream audio, and receive responses.
Handshake
URL
wss://api.x.ai/v1/realtime
Method
GET
Status
101 Switching Protocols
Headers
Authorization
string
required
Bearer token for authentication. Use your xAI API key (server-side only) or an ephemeral client secret from the Create client secret endpoint.
Bearer $XAI_API_KEY
Sec-WebSocket-Protocol
string
Alternative authentication for browser clients. Pass the ephemeral token prefixed with `xai-client-secret.` (trailing dot included). When this header is provided, the Authorization header is not required.
xai-client-secret.<EPHEMERAL_TOKEN>
Example Message Flow
Client → Server
Server → Client
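Returning to the Realtime handshake headers: the two authentication options can be sketched as follows. This is an illustration only; the function names are hypothetical, and the code builds header values without opening a connection.

```python
REALTIME_URL = "wss://api.x.ai/v1/realtime"
SUBPROTOCOL_PREFIX = "xai-client-secret."  # documented prefix, dot included


def bearer_headers(token: str) -> dict:
    """Authorization header carrying an API key (server-side only)
    or an ephemeral client secret."""
    return {"Authorization": f"Bearer {token}"}


def browser_subprotocol(ephemeral_token: str) -> str:
    """Value for the Sec-WebSocket-Protocol header. When this is sent,
    the Authorization header is not required."""
    return SUBPROTOCOL_PREFIX + ephemeral_token


print(bearer_headers("example-token"))
print(browser_subprotocol("example-token"))
```

In a browser, the subprotocol value is what would be passed as the second argument to the `WebSocket` constructor, since browser code cannot set an Authorization header on a WebSocket handshake.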
Text to speech
/v1/tts
Convert text into speech audio.
Request Body
text
string
required
The text to convert to speech. Maximum 15,000 characters. Supports inline speech tags for expressive output: [pause], [long-pause], [hum-tune], [laugh], [chuckle], [giggle], [cry], [tsk], [tongue-click], [lip-smack], [breath], [inhale], [exhale], [sigh]. Also supports wrapping tags for style control: <soft>, <whisper>, <loud>, <build-intensity>, <decrease-intensity>, <higher-pitch>, <lower-pitch>, <slow>, <fast>, <sing-song>, <singing>, <laugh-speak>, <emphasis>.
language
string
required
BCP-47 language code (e.g. en, zh, pt-BR) or auto for automatic language detection. Case-insensitive. Supported values: auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi. Additional languages may work with varying accuracy.
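The request body constraints above can be checked client-side before sending. The sketch below builds, but does not send, a request using only the Python standard library. Several details are assumptions: the `https://api.x.ai` host (inferred from the WebSocket URLs), the Bearer authorization and JSON content type for this endpoint, and the helper names themselves.

```python
import json
import urllib.request
from typing import Optional

TTS_URL = "https://api.x.ai/v1/tts"  # assumed host, inferred from wss://api.x.ai
MAX_TEXT_CHARS = 15_000              # documented maximum for `text`
LISTED_LANGUAGES = {
    "auto", "en", "ar-eg", "ar-sa", "ar-ae", "bn", "zh", "fr", "de", "hi", "id",
    "it", "ja", "ko", "pt-br", "pt-pt", "ru", "es-mx", "es-es", "tr", "vi",
}


def is_listed_language(language: str) -> bool:
    # Language codes are case-insensitive. Unlisted codes may still work
    # with varying accuracy, so this is informational, not a hard gate.
    return language.lower() in LISTED_LANGUAGES


def build_tts_request(text: str, language: str, api_key: str,
                      voice_id: Optional[str] = None) -> urllib.request.Request:
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    payload = {"text": text, "language": language}
    if voice_id is not None:
        payload["voice_id"] = voice_id
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",    # assumed content type
        },
    )


# Inline speech tags are plain text inside the `text` field.
req = build_tts_request("Hello there. [pause] How are you?", "en", "EXAMPLE_KEY")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) and handling the audio response are left out, since the response format is not described here.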
Streaming text to speech
wss://api.x.ai/v1/tts
Bidirectional streaming text-to-speech via WebSocket. Send text incrementally and receive audio chunks in real time. This endpoint shares the /v1/tts path with the batch POST endpoint; a GET with Upgrade: websocket activates streaming mode. Configuration is done via query parameters at connection time. Multi-utterance is supported: after audio.done, send another stream of text.delta messages on the same connection.
Handshake
URL
wss://api.x.ai/v1/tts
Method
GET
Status
101 Switching Protocols
Headers
Authorization
string
required
Bearer token for authentication. Use your xAI API key.
Bearer $XAI_API_KEY
Query Parameters
voice
string
optional
Voice identifier. Case-insensitive.
Supported values: eve, ara, rex, sal, leo
language
string
required
BCP-47 language code (e.g. `en`, `zh`, `pt-BR`) or `auto` for automatic language detection. Case-insensitive.
Supported values: auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi
codec
string
optional
Audio codec for the output.
Supported values: mp3, wav, pcm, mulaw, alaw
sample_rate
integer
optional
Sample rate in Hz.
Supported values: 8000, 16000, 22050, 24000, 44100, 48000
bit_rate
integer
optional
Bit rate in bps. Only applies when `codec` is `mp3`.
Supported values: 32000, 64000, 96000, 128000, 192000
Example Message Flow
Client → Server
Server → Client
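Since the streaming endpoint is configured entirely through query parameters at connection time, the connection URL can be assembled and validated up front. The sketch below does that with the standard library; the function and constant names are hypothetical, and the value lists are the ones given for each parameter above.

```python
from typing import Optional
from urllib.parse import urlencode

TTS_WS_URL = "wss://api.x.ai/v1/tts"
VOICES = ("eve", "ara", "rex", "sal", "leo")
CODECS = ("mp3", "wav", "pcm", "mulaw", "alaw")
SAMPLE_RATES = (8000, 16000, 22050, 24000, 44100, 48000)
BIT_RATES = (32000, 64000, 96000, 128000, 192000)


def _check(name: str, value, allowed) -> None:
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {allowed}")


def build_streaming_tts_url(language: str,
                            voice: Optional[str] = None,
                            codec: Optional[str] = None,
                            sample_rate: Optional[int] = None,
                            bit_rate: Optional[int] = None) -> str:
    params = {"language": language}  # the only required parameter
    if voice is not None:
        _check("voice", voice.lower(), VOICES)  # voice is case-insensitive
        params["voice"] = voice
    if codec is not None:
        _check("codec", codec, CODECS)
        params["codec"] = codec
    if sample_rate is not None:
        _check("sample_rate", sample_rate, SAMPLE_RATES)
        params["sample_rate"] = sample_rate
    if bit_rate is not None:
        _check("bit_rate", bit_rate, BIT_RATES)
        # bit_rate only applies to mp3. The default codec is not stated here,
        # so reject only when a different codec is chosen explicitly.
        if codec is not None and codec != "mp3":
            raise ValueError("bit_rate only applies when codec is mp3")
        params["bit_rate"] = bit_rate
    return f"{TTS_WS_URL}?{urlencode(params)}"


print(build_streaming_tts_url("en", voice="eve", codec="mp3", bit_rate=128000))
```

The resulting URL would be handed to a WebSocket client together with the Authorization header; the JSON shape of the `text.delta` and `audio.done` messages is not specified in this section, so it is not sketched here.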
List voices
/v1/tts/voices
List all available TTS voices.
Response Body
voices
array
List of available voices.
Get voice
/v1/tts/voices/{voice_id}
Get details for a specific voice.
Path parameters
voice_id
string
required
The unique identifier of the voice (e.g. `eve`, `ara`). Case-insensitive.
Response Body
voice_id
string
Unique identifier for the voice (lowercase). Pass this value as voice_id in TTS requests or as the voice parameter in Realtime API session configuration.
name
string
Human-readable display name for the voice.
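The voice endpoints can be wrapped in a small typed helper. This is a sketch under stated assumptions: the `https://api.x.ai` host is inferred from the WebSocket URLs, the items in the `voices` array are assumed to have the same `voice_id` and `name` fields as the Get voice response, and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import quote

API_BASE = "https://api.x.ai"  # assumed host, inferred from wss://api.x.ai


@dataclass(frozen=True)
class Voice:
    voice_id: str  # lowercase unique identifier
    name: str      # human-readable display name


def voice_url(voice_id: str) -> str:
    # Lookup is case-insensitive, so normalizing to lowercase is harmless.
    return f"{API_BASE}/v1/tts/voices/{quote(voice_id.lower(), safe='')}"


def parse_voice(body: dict) -> Voice:
    return Voice(voice_id=body["voice_id"], name=body["name"])


def parse_voice_list(body: dict) -> List[Voice]:
    # Assumes each array item has the same shape as a Get voice response.
    return [parse_voice(item) for item in body["voices"]]


# Sample payload with made-up display names, for illustration only.
sample = {"voices": [{"voice_id": "eve", "name": "Eve"},
                     {"voice_id": "ara", "name": "Ara"}]}
print(voice_url("Eve"))
print([v.voice_id for v in parse_voice_list(sample)])
```

The `voice_id` values returned this way are what would be passed as `voice_id` in TTS requests or as the `voice` parameter in Realtime session configuration.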
Last updated: April 8, 2026