Inference API

Voice

Create client secret

/v1/realtime/client_secrets

Create an ephemeral client secret for authenticating browser-side Realtime API connections.

Request Body

Response Body

value

string

The ephemeral token value. Use it as a Bearer token in the Authorization header of the WebSocket handshake, or pass it in the Sec-WebSocket-Protocol header with the prefix `xai-client-secret.` (the trailing dot is part of the prefix).

expires_at

integer

Unix timestamp (seconds) when this client secret expires.
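Because `expires_at` is a Unix timestamp in seconds, a client can compare it with its own clock to decide when to request a fresh secret. The following is a minimal sketch, not part of the API: the helper names and the 30-second safety margin are hypothetical, and it assumes the response has already been parsed into a dict with the `value` and `expires_at` fields described above.

```python
import time
from typing import Optional

# Hypothetical safety margin: refresh slightly before the real expiry.
REFRESH_MARGIN_SECONDS = 30


def seconds_remaining(secret: dict, now: Optional[float] = None) -> float:
    """Seconds until the secret expires (negative once it has expired)."""
    current = time.time() if now is None else now
    return secret["expires_at"] - current


def needs_refresh(secret: dict, now: Optional[float] = None,
                  margin: float = REFRESH_MARGIN_SECONDS) -> bool:
    """True when the secret is expired or within `margin` seconds of expiring."""
    return seconds_remaining(secret, now) <= margin
```

Passing `now` explicitly keeps the helpers easy to test; in normal use it can be omitted so the system clock is used.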


Realtime

WSS wss://api.x.ai/v1/realtime

Real-time voice conversations with Grok models via WebSocket. The connection begins with an HTTP GET that is upgraded to WebSocket (status 101). Once connected, the client and server exchange JSON messages to configure the session, stream audio, and receive responses.

Handshake

URL

wss://api.x.ai/v1/realtime

Method

GET

Status

101 Switching Protocols

Headers

Authorization

string

required

Bearer token for authentication. Use your xAI API key (server-side only) or an ephemeral client secret from the Create client secret endpoint.

Bearer $XAI_API_KEY

Sec-WebSocket-Protocol

string

Alternative authentication for browser clients. Pass the ephemeral token with the prefix `xai-client-secret.` (the trailing dot is part of the prefix). When this header is provided, the Authorization header is not required.

xai-client-secret.<EPHEMERAL_TOKEN>
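The two authentication options can be sketched as a small helper that produces either the Authorization header or the subprotocol value for the handshake. This is an illustrative sketch only: `handshake_auth` and its `browser` flag are hypothetical names, and passing the returned values to a WebSocket client is left to whichever library is in use.

```python
from typing import Dict, List, Tuple

REALTIME_URL = "wss://api.x.ai/v1/realtime"
SUBPROTOCOL_PREFIX = "xai-client-secret."  # the trailing dot is part of the prefix


def handshake_auth(token: str, browser: bool = False) -> Tuple[Dict[str, str], List[str]]:
    """Return (headers, subprotocols) for the WebSocket handshake.

    browser=False: send the token as a Bearer token in the Authorization header.
    browser=True:  send an ephemeral token via Sec-WebSocket-Protocol instead,
                   for clients that cannot set custom handshake headers.
    """
    if not token:
        raise ValueError("token must be non-empty")
    if browser:
        return {}, [SUBPROTOCOL_PREFIX + token]
    return {"Authorization": "Bearer " + token}, []
```

The API key should only ever be used with `browser=False` on a server; browser-side code should use an ephemeral client secret.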

Server → Client


Text to speech

/v1/tts

Convert text into speech audio.

Request Body

text

string

required

The text to convert to speech. Maximum 15,000 characters. Supports inline speech tags for expressive output: [pause], [long-pause], [hum-tune], [laugh], [chuckle], [giggle], [cry], [tsk], [tongue-click], [lip-smack], [breath], [inhale], [exhale], [sigh]. Also supports wrapping tags for style control: <soft>, <whisper>, <loud>, <build-intensity>, <decrease-intensity>, <higher-pitch>, <lower-pitch>, <slow>, <fast>, <sing-song>, <singing>, <laugh-speak>, <emphasis>.

language

string

required

BCP-47 language code (e.g. en, zh, pt-BR) or auto for automatic language detection. Case-insensitive. Supported values: auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi. Additional languages may work with varying accuracy.
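The constraints on `text` and `language` can be checked client-side before sending a request. The sketch below is an assumption-laden illustration: the function names are hypothetical, the 15,000-character limit is counted with Python's `len` (Unicode code points), which may not match how the server counts, and unlisted languages are reported rather than rejected because the API notes they may still work.

```python
import json

MAX_TEXT_CHARS = 15_000

# Supported values from the field description, lowercased (matching is case-insensitive).
LISTED_LANGUAGES = {
    "auto", "en", "ar-eg", "ar-sa", "ar-ae", "bn", "zh", "fr", "de", "hi", "id",
    "it", "ja", "ko", "pt-br", "pt-pt", "ru", "es-mx", "es-es", "tr", "vi",
}


def is_listed_language(language: str) -> bool:
    """True if the code is in the documented list (other codes may still work)."""
    return language.lower() in LISTED_LANGUAGES


def build_tts_body(text: str, language: str) -> str:
    """Return a JSON request body containing the two required fields."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError("text exceeds 15,000 characters")
    if not language:
        raise ValueError("language is required")
    return json.dumps({"text": text, "language": language}, ensure_ascii=False)
```

For example, `build_tts_body("Hello. [pause] Welcome back.", "en")` yields a body with an inline speech tag embedded directly in the text.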


Streaming text to speech

WSS wss://api.x.ai/v1/tts

Bidirectional streaming text-to-speech via WebSocket. Send text incrementally and receive audio chunks in real time. This endpoint shares the /v1/tts path with the batch POST endpoint; a GET request with an `Upgrade: websocket` header activates streaming mode. Configuration is set through query parameters at connection time. Multiple utterances are supported on one connection: after receiving audio.done, send another stream of text.delta messages.

Handshake

URL

wss://api.x.ai/v1/tts

Method

GET

Status

101 Switching Protocols

Headers

Authorization

string

required

Bearer token for authentication. Use your xAI API key.

Bearer $XAI_API_KEY

Query Parameters

voice

string

optional

Voice identifier. Case-insensitive.

eve, ara, rex, sal, leo

language

string

required

BCP-47 language code (e.g. `en`, `zh`, `pt-BR`) or `auto` for automatic language detection. Case-insensitive.

auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi

codec

string

optional

Audio codec for the output.

mp3, wav, pcm, mulaw, alaw

sample_rate

integer

optional

Sample rate in Hz.

8000, 16000, 22050, 24000, 44100, 48000

bit_rate

integer

optional

Bit rate in bps. Only applies when `codec` is `mp3`.

32000, 64000, 96000, 128000, 192000
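Since streaming configuration is passed entirely in the query string, the connection URL can be assembled and validated before opening the socket. The following is a sketch under stated assumptions: `build_tts_stream_url` is a hypothetical helper, the allowed values are taken from the parameter descriptions above, and because the default codec is not stated here, the sketch requires `codec="mp3"` to be set explicitly whenever `bit_rate` is given.

```python
from typing import Optional
from urllib.parse import urlencode

TTS_WS_URL = "wss://api.x.ai/v1/tts"
CODECS = {"mp3", "wav", "pcm", "mulaw", "alaw"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
BIT_RATES = {32000, 64000, 96000, 128000, 192000}


def build_tts_stream_url(language: str, voice: Optional[str] = None,
                         codec: Optional[str] = None,
                         sample_rate: Optional[int] = None,
                         bit_rate: Optional[int] = None) -> str:
    """Build the WebSocket URL; only `language` is required."""
    if not language:
        raise ValueError("language is required")
    params = {"language": language}
    if voice is not None:
        params["voice"] = voice  # not validated: the voice list may change
    if codec is not None:
        if codec not in CODECS:
            raise ValueError("unsupported codec: " + codec)
        params["codec"] = codec
    if sample_rate is not None:
        if sample_rate not in SAMPLE_RATES:
            raise ValueError("unsupported sample_rate")
        params["sample_rate"] = str(sample_rate)
    if bit_rate is not None:
        # bit_rate only applies to mp3; require the codec to be explicit.
        if codec != "mp3":
            raise ValueError("bit_rate only applies when codec is mp3")
        if bit_rate not in BIT_RATES:
            raise ValueError("unsupported bit_rate")
        params["bit_rate"] = str(bit_rate)
    return TTS_WS_URL + "?" + urlencode(params)
```

Rejecting a `bit_rate` without an mp3 codec is a design choice of this sketch; a client could equally drop the parameter silently.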


List voices

/v1/tts/voices

List all available TTS voices.

Response Body

voices

array

List of available voices.


Get voice

/v1/tts/voices/{voice_id}

Get details for a specific voice.

Path parameters

voice_id

string

required

The unique identifier of the voice (e.g. `eve`, `ara`). Case-insensitive.

Response Body

voice_id

string

Unique identifier for the voice (lowercase). Pass this value as voice_id in TTS requests or as the voice parameter in Realtime API session configuration.

name

string

Human-readable display name for the voice.
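Because voice identifiers are case-insensitive on input but returned in lowercase, it can help to normalize them client-side. The sketch below is illustrative only: `voice_path` and `find_voice` are hypothetical helpers, and `find_voice` assumes each entry in the `voices` array has the same `voice_id` and `name` fields as the Get voice response, which is not confirmed above.

```python
from typing import List, Optional
from urllib.parse import quote


def voice_path(voice_id: str) -> str:
    """Request path for the Get voice endpoint, with the ID lowercased."""
    return "/v1/tts/voices/" + quote(voice_id.lower(), safe="")


def find_voice(voices: List[dict], voice_id: str) -> Optional[dict]:
    """Case-insensitive lookup in an already-parsed `voices` array."""
    wanted = voice_id.lower()
    for voice in voices:
        # Assumed entry shape: {"voice_id": ..., "name": ...}
        if voice.get("voice_id", "").lower() == wanted:
            return voice
    return None
```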


Last updated: April 8, 2026