
Custom Voices

Clone a voice from a short reference clip and use it anywhere a built-in voice works. Upload an audio sample and immediately start using it in our TTS and Voice Agent APIs.

Custom Voices is currently available only in the United States, excluding Illinois.

How to Use Custom Voices

After creating a voice in the console, click the three-dot menu on the voice card and select Copy Voice ID. If you created a custom voice via the API (Enterprise only), the voice_id is returned in the response.

Custom voices are interchangeable with built-in voices across all voice APIs. Pass your voice_id to any of:

  • POST /v1/tts
  • wss://api.x.ai/v1/tts
  • wss://api.x.ai/v1/realtime

Built-in voices remain available through GET /v1/tts/voices. Custom voices are returned by GET /v1/custom-voices only — they will not appear in the built-in voice list. Your custom voices are scoped to your team and are never available to other users.


Recording Your Reference Audio

Create a custom voice by cloning a reference clip up to 120 seconds long. For best results:

  • Record in a quiet setting, ideally with a high-quality microphone.
  • Read naturally. If the recording sounds like someone reading a script, the resulting voice will sound scripted too.
  • Longer is better, up to the 120-second limit. Clips under 30 seconds may lack detail. Aim for 90–120 seconds for the best results.
  • Speak expressively. The resulting voice will match the expressiveness of your recording.

What to record

The model picks up not just the timbre but the delivery patterns of the reference clip. For best results, match the recording to the content you intend to generate:

  • Customer support — Record realistic support exchanges including greetings, holds, troubleshooting steps, and sign-offs.
  • Audiobook narration — Read a few paragraphs of prose with the pacing and inflection intended for the final output.
  • Conversational assistant — Record natural, unscripted speech such as explaining a topic to a friend.
  • News or documentary — Read a short article in a natural broadcast voice.

A recording that reflects your intended use case will produce better results than a polished but unrelated sample.

Recording setup

  • Microphone. A studio condenser or quality USB microphone is recommended. Phone earbuds are usable but introduce noticeable noise.
  • Pop filter. Recommended. Plosive sounds (p, b) are reproduced as audible thumps without one.
  • Room treatment. Record in a small, soft-furnished room. Hard-walled rooms produce echo and reverb that will be reproduced in the resulting voice.
  • Single speaker. The recording should contain only one voice with no background music or sound effects.
  • Background noise. Silence the room. Turn off HVAC, fans, and notifications. Background noise will be cloned along with the voice.

Create a Custom Voice

Get started in the console — create up to 30 custom voices for free and use them immediately across all voice APIs.

API Quick Start

The POST /v1/custom-voices endpoint is gated to teams on an Enterprise plan. Contact our team to enable API access.

Create a custom voice from a reference audio file, then synthesize speech with it:

import os
import requests

# 1. Create the voice.
with open("reference.wav", "rb") as f:
    create = requests.post(
        "https://api.x.ai/v1/custom-voices",
        headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
        files={"file": ("reference.wav", f, "audio/wav")},
        data={
            "name": "Friendly Narrator",
            "language": "en",
            "gender": "female",
            "tone": "warm",
            "use_case": "narration",
        },
    )
create.raise_for_status()
voice_id = create.json()["voice_id"]

# 2. Synthesize speech with it.
speech = requests.post(
    "https://api.x.ai/v1/tts",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "text": "Hello! This audio was synthesized using my custom voice.",
        "voice_id": voice_id,
        "language": "en",
    },
)
speech.raise_for_status()
with open("hello.mp3", "wb") as f:
    f.write(speech.content)

Endpoints

All endpoints sit under https://api.x.ai/v1/custom-voices and authenticate with a Bearer API key.

Create a custom voice

POST /v1/custom-voices with multipart/form-data. Only file is required.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | binary | yes | Reference audio. Max 120 s. |
| name | string | no | Display name. |
| description | string | no | Free-text description. |
| gender | string | no | male, female, or neutral. |
| accent | string | no | Free text (e.g. British, American). |
| age | string | no | young, middle-aged, or old. |
| language | string | no | ISO 639 code (en) or BCP-47-style tag (en-US, zh-CN). The region must be uppercase. |
| use_case | string | no | conversational, narration, characters, educational, advertisement, social_media, entertainment. |
| tone | string | no | warm, casual, professional, friendly, authoritative, expressive, calm. |
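
Because out-of-range label values are rejected with 400, it can help to check them locally before uploading. The following is a minimal sketch, not part of the API: the helper name validate_voice_metadata is hypothetical, the allowed values are copied from the field table, and the language pattern is an assumption (a 2-3 letter lowercase code with an optional 2-letter uppercase region) that may be stricter or looser than the server's own check.

```python
import re

# Allowed values copied from the field table above.
ALLOWED = {
    "gender": {"male", "female", "neutral"},
    "age": {"young", "middle-aged", "old"},
    "use_case": {
        "conversational", "narration", "characters", "educational",
        "advertisement", "social_media", "entertainment",
    },
    "tone": {
        "warm", "casual", "professional", "friendly",
        "authoritative", "expressive", "calm",
    },
}

# Assumption: lowercase language code, optionally followed by an uppercase
# region (en, en-US, zh-CN). The server may accept other forms.
LANGUAGE_RE = re.compile(r"[a-z]{2,3}(-[A-Z]{2})?")


def validate_voice_metadata(fields: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = fields.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}: {value!r} is not one of {sorted(allowed)}")
    language = fields.get("language")
    if language is not None and not LANGUAGE_RE.fullmatch(language):
        problems.append(f"language: {language!r} does not look like 'en' or 'en-US'")
    return problems
```

Calling validate_voice_metadata on the same dictionary you pass as form data lets you catch a mistyped label before the audio is uploaded.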

The following formats and settings are recommended for the uploaded reference file:

| Setting | Recommendation |
| --- | --- |
| Codec | .wav (uncompressed PCM) is recommended. MP3, FLAC, OGG, Opus, M4A, AAC, MKV, and MP4 are also accepted, but lossy formats may introduce compression artifacts that are reproduced in the resulting voice. |
| Sample rate | 24 kHz recommended. Higher rates (44.1 kHz, 48 kHz) are downsampled server-side. Lower rates result in reduced fidelity. |
| Bit depth | 16-bit PCM is sufficient. 24-bit is also supported. |
| Channels | Mono recommended. Stereo files are downmixed automatically, but recording in mono avoids potential phase artifacts. |
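
For PCM .wav files, the recommendations above can be checked locally with Python's standard wave module. This is an illustrative sketch only: check_wav_format is a hypothetical helper, its notes are advisory rather than errors the API returns, and the wave module reads PCM WAV only, so other accepted formats need a different tool.

```python
import wave


def check_wav_format(source) -> list[str]:
    """Compare a PCM WAV file with the recommended settings.

    `source` is a path or a binary file object, as accepted by wave.open.
    Returns advisory notes; an empty list means the file matches them.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes

    notes = []
    if rate < 24_000:
        notes.append(f"sample rate {rate} Hz is below 24 kHz; fidelity may be reduced")
    elif rate > 24_000:
        notes.append(f"sample rate {rate} Hz will be downsampled server-side")
    if channels != 1:
        notes.append(f"{channels} channels will be downmixed to mono")
    if bits < 16:
        notes.append(f"{bits}-bit audio is below the recommended 16-bit PCM")
    return notes
```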

Length

  • No minimum, 120-second maximum. Clips of any length up to 120 seconds are accepted; longer clips are rejected with 400.
  • 90+ seconds recommended. Longer clips capture more prosody and intonation variety, producing a more natural and expressive voice.
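
The length rules can also be checked before upload. The sketch below measures a PCM WAV file with the standard wave module and sorts the result into bands taken from this page (over 120 s rejected, under 30 s likely to lack detail, 90 s or more recommended). The function names and band labels are hypothetical, chosen for illustration.

```python
import wave

MAX_SECONDS = 120         # longer clips are rejected with 400
RECOMMENDED_SECONDS = 90  # 90+ seconds recommended
SHORT_SECONDS = 30        # clips under 30 seconds may lack detail


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds: float) -> str:
    """Map a clip length to one of four illustrative bands."""
    if seconds > MAX_SECONDS:
        return "too_long"
    if seconds < SHORT_SECONDS:
        return "short"
    if seconds < RECOMMENDED_SECONDS:
        return "acceptable"
    return "recommended"
```

A caller might refuse to upload a "too_long" clip and only warn on "short".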

A successful create returns 201 with the new voice object:

JSON

{
  "voice_id": "nlbqfwie",
  "name": "Friendly Narrator",
  "description": "Warm, conversational tone for narration.",
  "gender": "female",
  "accent": "American",
  "age": "young",
  "language": "en",
  "use_case": "narration",
  "tone": "warm",
  "created_at": "2026-04-26T18:56:34.872993+00:00"
}

voice_id is an 8-character lowercase alphanumeric identifier.
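
If you store or accept voice IDs from user input, a quick format check can catch copy-paste mistakes before a request is sent. This sketch encodes only the rule stated above. The helper name is hypothetical, and the check applies to custom voices only, since this page does not describe the format of built-in voice IDs.

```python
import re

# Eight characters, each a lowercase letter or a digit.
CUSTOM_VOICE_ID_RE = re.compile(r"[a-z0-9]{8}")


def is_plausible_custom_voice_id(value: str) -> bool:
    """Return True if value has the shape of a custom voice_id."""
    return CUSTOM_VOICE_ID_RE.fullmatch(value) is not None
```

A value that passes is only well-formed; whether the voice exists is still decided by the API.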

List custom voices

GET /v1/custom-voices returns all voices owned by your team, paginated.

| Query parameter | Default | Description |
| --- | --- | --- |
| limit | 100 | Page size, 1–1000. |
| pagination_token | | Token from the previous response. Omit on the first page. |

import os
import requests

response = requests.get(
    "https://api.x.ai/v1/custom-voices",
    headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    params={"limit": 50},
)
response.raise_for_status()  # stop here on a 4xx/5xx instead of parsing an error body
for voice in response.json()["voices"]:
    print(f"{voice['voice_id']:10s}  {voice.get('name')}")

Response:

JSON

{
  "voices": [
    {
      "voice_id": "nlbqfwie",
      "name": "Friendly Narrator",
      "description": "Warm, conversational tone for narration.",
      "gender": "female",
      "accent": "American",
      "age": "young",
      "language": "en",
      "use_case": "narration",
      "tone": "warm",
      "created_at": "2026-04-26T18:56:34.872993+00:00"
    }
  ],
  "pagination_token": null
}
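
To read every page, keep requesting with the pagination_token from the previous response. The sketch below assumes that a null token marks the last page, as in the example response above. fetch_page is a hypothetical callable that you supply: it receives the current token (None on the first page) and returns the parsed JSON body.

```python
from typing import Callable, Iterator, Optional


def iter_custom_voices(
    fetch_page: Callable[[Optional[str]], dict],
) -> Iterator[dict]:
    """Yield every voice, following pagination_token until it is empty."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("voices", [])
        token = page.get("pagination_token")
        if not token:  # assumption: null or missing token means no more pages
            break
```

With requests, fetch_page can be a small function that calls GET /v1/custom-voices and adds pagination_token to the query parameters only when the token is not None.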

Get a custom voice

GET /v1/custom-voices/{voice_id} returns the metadata for a single voice. Returns 404 for unknown ids or for voices owned by another team.

Response body matches the voice object format shown in Create.

Update metadata

PATCH /v1/custom-voices/{voice_id} with a JSON body. All fields are optional and follow these rules:

  • Field omitted — no change.
  • Field set to null — clears the value.
  • Field set to a non-empty string — updates the value.
  • Field set to "" — rejected with 400.

This endpoint never changes the underlying audio. To re-record, delete the voice and create a new one.
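
The four rules above map naturally onto Python keyword arguments: leave a keyword out to keep the field, pass None to clear it, and pass a string to update it. The helper below is a hypothetical sketch that builds the request body and rejects empty strings locally, so the request never reaches the 400 case.

```python
def build_patch_body(**fields) -> dict:
    """Build a JSON body for PATCH /v1/custom-voices/{voice_id}.

    Omitted keyword -> field is left out of the body (no change).
    None            -> serialized as null (clears the value).
    Non-empty str   -> updates the value.
    Empty string    -> raises ValueError, since the API rejects it with 400.
    """
    body = {}
    for key, value in fields.items():
        if value == "":
            raise ValueError(f"{key}: empty strings are rejected; pass None to clear")
        if value is not None and not isinstance(value, str):
            raise TypeError(f"{key}: expected a string or None")
        body[key] = value
    return body
```

For example, build_patch_body(tone="calm", description=None) produces a body that sets tone and clears description. When that dictionary is passed through the json= argument of requests, None is serialized as null.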

import os
import requests

response = requests.patch(
    "https://api.x.ai/v1/custom-voices/nlbqfwie",
    headers={
        "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"description": "Updated after a tuning pass.", "tone": "calm"},
)
response.raise_for_status()  # a 400 here usually means an empty string or an invalid label
print(response.json())

Returns the full updated voice object:

JSON

{
  "voice_id": "nlbqfwie",
  "name": "Friendly Narrator",
  "description": "Updated after a tuning pass.",
  "gender": "female",
  "accent": "American",
  "age": "young",
  "language": "en",
  "use_case": "narration",
  "tone": "calm",
  "created_at": "2026-04-26T18:56:34.872993+00:00"
}

Download the reference audio

GET /v1/custom-voices/{voice_id}/audio streams the original reference file with the appropriate Content-Type header (e.g. audio/wav, audio/mpeg).
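
Because the file type is only known from the Content-Type header, a download script has to choose the file extension after the response arrives. The sketch below covers the two media types named above and falls back to .bin for anything else. The helper name and the fallback are assumptions, and the media types returned for other upload formats are not listed on this page.

```python
# Only the two media types named in the text are mapped.
EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
}


def reference_audio_filename(voice_id: str, content_type: str) -> str:
    """Choose a local filename from the response's Content-Type header."""
    # Drop any parameters such as "; charset=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return voice_id + EXTENSIONS.get(media_type, ".bin")
```

With requests, pass response.headers.get("Content-Type", "") as the second argument and write response.content to the returned path.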

Delete a custom voice

DELETE /v1/custom-voices/{voice_id} removes the voice and its underlying audio.

import os
import requests

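response = requests.delete(
    "https://api.x.ai/v1/custom-voices/nlbqfwie",
    headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
)
response.raise_for_status()  # raises on 404 if the voice is already gone
print(response.json())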

The response is {"deleted": true}. After deletion, subsequent requests for the same voice_id return 404 and any TTS / Voice Agent calls referencing it will fail with an unknown-voice error.


Using a Custom Voice

Once created, a custom voice_id works wherever a built-in voice_id works.

REST TTS

Bash

curl -X POST https://api.x.ai/v1/tts \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome back. How can I help today?",
    "voice_id": "nlbqfwie",
    "language": "en"
  }' \
  --output welcome.mp3

Streaming TTS WebSocket

Pass the custom voice as the voice query parameter when opening the connection. See Text to Speech - Streaming for the full event protocol.

Python

import asyncio
import base64
import json
import os
import websockets

async def stream_with_custom_voice(voice_id: str):
    uri = f"wss://api.x.ai/v1/tts?language=en&voice={voice_id}&codec=mp3"
    async with websockets.connect(
        uri,
        additional_headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    ) as ws:
        await ws.send(json.dumps({"type": "text.delta", "delta": "Streaming with my custom voice."}))
        await ws.send(json.dumps({"type": "text.done"}))
        audio = bytearray()
        async for msg in ws:
            event = json.loads(msg)
            if event["type"] == "audio.delta":
                audio.extend(base64.b64decode(event["delta"]))
            elif event["type"] == "audio.done":
                break
        with open("stream.mp3", "wb") as f:
            f.write(audio)

asyncio.run(stream_with_custom_voice("nlbqfwie"))

Voice Agent API

Set voice in the session.update message. See the Voice Agent API docs for the full session lifecycle.

Python

import asyncio
import json
import os
import websockets

async def realtime_with_custom_voice(voice_id: str):
    async with websockets.connect(
        "wss://api.x.ai/v1/realtime",
        additional_headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    ) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": voice_id,
                "instructions": "You are a helpful assistant.",
                "turn_detection": {"type": "server_vad"},
            },
        }))
        # ... continue with the standard realtime event loop ...

asyncio.run(realtime_with_custom_voice("nlbqfwie"))

Limits

| Limit | Value |
| --- | --- |
| Reference audio max duration | 120 seconds |
| Custom voices per team | 30 |
| Voice ID length | 8 characters, lowercase alphanumeric |

Need more than 30 voices?

The default limit is 30 custom voices per team. If you need more, contact us to discuss higher limits.


Error Handling

| Status | Meaning | Action |
| --- | --- | --- |
| 201 | Voice created | Save voice_id and start using it. |
| 200 | Successful read / update / delete | - |
| 400 | Bad request | Check that the audio is no longer than 120 s, label values are within the allowed enums, and the PATCH body contains no empty strings. Also returned when the team's 30-voice limit is reached; delete an existing voice or request a higher limit. |
| 401 | Unauthorized | API key is missing or invalid. |
| 403 | Custom voices not enabled for this team, or POST /v1/custom-voices was called without an Enterprise contract | Create voices in the console playground, or contact sales to enable the create API. |
| 404 | Voice not found | The ID does not exist or is owned by another team. |
| 500 | Server error | Retry with exponential backoff. |
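
The retry advice for 500 can be sketched as a small wrapper. This is one possible approach rather than a prescribed client: call_with_backoff is a hypothetical helper, send is any zero-argument callable that performs the request and returns an object with a status_code attribute (as requests responses do), and the attempt count, base delay, and jitter range are arbitrary illustrative choices.

```python
import random
import time
from typing import Any, Callable

RETRYABLE_STATUSES = {500}  # the only status the table marks for retry


def call_with_backoff(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until the status is not retryable or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(max_attempts - 1):
        if response.status_code not in RETRYABLE_STATUSES:
            break
        # Exponential backoff with jitter: the delay doubles on each retry
        # and is scaled by a random factor between 0.5 and 1.0.
        sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
        response = send()
    return response
```

Errors such as 400, 401, 403, and 404 are returned immediately, because repeating the same request will not change the result. The sleep parameter exists so the delay can be replaced in tests.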

Last updated: April 26, 2026