Voice Agent Builder

The Voice Agent API lets you build real-time voice applications over WebSocket. Every session requires configuration—instructions, tools, voice settings—sent via session.update. The Agents API adds a persistence layer: define your agent once, then reference it by ID in any session or phone call.

Overview

The workflow has three steps:

  1. Create an agent — store its name, instructions, tools, voice, and knowledge base.
  2. Assign a phone number — purchase a number and link it to the agent.
  3. Use the agent — connect to the Voice Agent API with the stored configuration, or have the agent call a phone number.

Step 1: Create an agent

import os, requests

resp = requests.post(
    "https://api.x.ai/v1/agents",
    headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    json={
        "name": "Order Support",
        "instructions": "You are a friendly order support agent for Acme Corp.",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "lookup_order",
                    "description": "Look up the status of a customer order",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                },
            },
            {"type": "web_search"},
        ],
        "voice": {"voice_id": "eve", "vad_threshold": 0.5, "vad_silence_duration_ms": 300},
    },
)
resp.raise_for_status()  # Fail on 4xx/5xx here rather than with a KeyError below
agent = resp.json()["agent"]
print(f"Created agent: {agent['agent_id']}")

The response includes the agent_id (e.g., agent_abc123def456) which you'll use in subsequent calls.

Configuring voice

The voice object controls how the agent sounds and when it detects user speech:

| Field | Description |
| --- | --- |
| voice_id | Which voice to use. Options: eve, ara, rex, sal, leo. |
| vad_threshold | How sensitive turn detection is. Lower values (e.g., 0.3) pick up quieter speech; higher values (e.g., 0.8) require louder input. Range: 0.0–1.0. |
| vad_silence_duration_ms | How long the agent waits after silence before responding. Lower values (e.g., 200) make the agent more responsive; higher values (e.g., 500) give the caller more time to pause mid-thought. Range: 0–10,000. |
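
The ranges in the table can be checked locally before sending a request. The following is a minimal sketch, not part of the API: `validate_voice` is a hypothetical helper, and it assumes that `voice_id` is required while the two VAD fields are optional.

```python
# Hypothetical client-side check; the API performs its own validation.
VOICE_IDS = {"eve", "ara", "rex", "sal", "leo"}


def validate_voice(voice):
    """Return a list of problems found in a voice config (empty if none)."""
    problems = []
    if voice.get("voice_id") not in VOICE_IDS:
        problems.append(f"voice_id must be one of {sorted(VOICE_IDS)}")
    threshold = voice.get("vad_threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        problems.append("vad_threshold must be between 0.0 and 1.0")
    silence = voice.get("vad_silence_duration_ms")
    if silence is not None and not 0 <= silence <= 10_000:
        problems.append("vad_silence_duration_ms must be between 0 and 10000")
    return problems


# A valid config yields no problems; an invalid one lists each violation.
print(validate_voice({"voice_id": "eve", "vad_threshold": 0.5, "vad_silence_duration_ms": 300}))
print(validate_voice({"voice_id": "zoe", "vad_threshold": 1.5}))
```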

Adding tools

Agents support the same tool types as the Voice Agent API:

  • function — custom functions your application executes. The agent generates the arguments; you return the result.
  • web_search — search the web for current information.
  • x_search — search posts on X.
  • file_search — search documents in knowledge base collections (set collection_ids on the agent).
  • mcp — connect to an MCP server for external tool access.
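
For function tools, your application runs the function and hands the result back. The sketch below shows one way to route a call to a local handler. It assumes the arguments arrive as a JSON-encoded string; `ORDERS`, `HANDLERS`, and `run_tool` are hypothetical names, and how the call reaches your code and how the result is returned over the session are not covered here.

```python
import json

# Hypothetical stand-in for a real order database.
ORDERS = {"ORD-42": "shipped"}


def lookup_order(order_id):
    """Local implementation of the lookup_order tool declared on the agent."""
    status = ORDERS.get(order_id)
    if status is None:
        return {"error": f"unknown order {order_id}"}
    return {"order_id": order_id, "status": status}


# Map tool names (as declared on the agent) to local handlers.
HANDLERS = {"lookup_order": lookup_order}


def run_tool(name, arguments_json):
    """Execute a function tool call and return the result as a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"no handler for tool {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(handler(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the handler signature.
        return json.dumps({"error": str(exc)})


print(run_tool("lookup_order", '{"order_id": "ORD-42"}'))
```

Returning errors as data rather than raising lets the agent explain the failure to the caller instead of ending the turn.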

Step 2: Purchase and assign a phone number

Purchase a number in a US area code, then assign it to your agent:

import os, requests

headers = {"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}

# Purchase a number
phone = requests.post(
    "https://api.x.ai/v1/phone-numbers",
    headers=headers,
    json={"area_code": "415", "name": "Support Line"},
).json()["phone_number"]
print(f"Purchased: {phone['phone_number']}")

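# Assign to agent. Resend the current name unchanged: a field listed in
# field_mask but missing from "agent" is cleared (see "Updating an agent").
requests.patch(
    f"https://api.x.ai/v1/agents/{agent['agent_id']}",
    headers=headers,
    json={
        "agent": {"name": agent["name"]},
        "field_mask": {"paths": ["name"]},
        "phone_number_id": phone["phone_number_id"],
    },
)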
print(f"Assigned {phone['phone_number']} to {agent['agent_id']}")

Once assigned, the agent uses this number as its caller ID for outbound calls. To reassign a number, update the agent with a different phone_number_id. To unassign, pass an empty string ("") as phone_number_id.
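
Reassigning and unassigning differ only in the value of phone_number_id, so the request body can come from one small function. This is a sketch that mirrors the request shape used in this step. `phone_assignment_body` and the ID `pn_hypothetical_456` are illustrative names, and it resends the agent's name on the assumption that a masked field left out of the agent object would be cleared.

```python
def phone_assignment_body(agent_name, phone_number_id):
    """Build a PATCH body that sets the agent's phone number.

    Pass "" as phone_number_id to unassign. The name is resent unchanged
    because a masked field that is missing from "agent" would be cleared.
    """
    return {
        "agent": {"name": agent_name},
        "field_mask": {"paths": ["name"]},
        "phone_number_id": phone_number_id,
    }


# Hypothetical phone number ID, for illustration only.
reassign = phone_assignment_body("Order Support", "pn_hypothetical_456")
unassign = phone_assignment_body("Order Support", "")
print(reassign["phone_number_id"], repr(unassign["phone_number_id"]))
```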

Step 3: Use the agent

In a realtime session

Pass the agent's stored configuration to the Voice Agent API via session.update:

Python

import asyncio, json, os, requests, websockets

# Fetch agent config
headers = {"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}
agent = requests.get(
    "https://api.x.ai/v1/agents/agent_abc123def456",
    headers=headers,
).json()["agent"]

async def start_session():
    async with websockets.connect(
        "wss://api.x.ai/v1/realtime",
        additional_headers=headers,
    ) as ws:
        # Configure session from agent
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": agent["voice"]["voice_id"],
                "instructions": agent["instructions"],
                "tools": agent["tools"],
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": agent["voice"].get("vad_threshold", 0.5),
                    "silence_duration_ms": agent["voice"].get("vad_silence_duration_ms", 300),
                },
            },
        }))

        # Now stream audio...
        print("Session configured. Ready to stream audio.")

asyncio.run(start_session())

This pattern separates configuration (Agents API) from session lifecycle (Voice Agent API). Update the agent once; every new session picks up the latest config.
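
The mapping from agent record to session.update message can be factored into a pure function and tested without a network connection. This is a minimal sketch under the field names shown in this guide. `session_update_from_agent` is a hypothetical helper, and the fallback values 0.5 and 300 are the ones used in the example above, not documented API defaults.

```python
import json


def session_update_from_agent(agent):
    """Translate a stored agent record into a session.update message (JSON text)."""
    voice = agent.get("voice") or {}
    session = {
        "instructions": agent.get("instructions", ""),
        "tools": agent.get("tools", []),
        "turn_detection": {
            "type": "server_vad",
            "threshold": voice.get("vad_threshold", 0.5),
            "silence_duration_ms": voice.get("vad_silence_duration_ms", 300),
        },
    }
    # Only send a voice when the agent has one configured.
    if "voice_id" in voice:
        session["voice"] = voice["voice_id"]
    return json.dumps({"type": "session.update", "session": session})


example_agent = {
    "instructions": "You are a friendly order support agent for Acme Corp.",
    "tools": [{"type": "web_search"}],
    "voice": {"voice_id": "eve", "vad_threshold": 0.6},
}
print(session_update_from_agent(example_agent))
```

Inside the WebSocket block, the message would then be sent with `await ws.send(session_update_from_agent(agent))`.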

Outbound phone call

Use the console or the API to place a call. The agent dials the target number using its assigned phone number as caller ID, with its stored instructions, voice, and tools.

Using the console

Everything above can also be done through the xAI Console at console.x.ai without writing code.

Creating an agent

  1. Navigate to Voice > Voice Agents in the console sidebar.
  2. Click Create Agent.
  3. Choose a template (Healthcare, Restaurant, Customer Support, Real Estate, Appointment Booking, Concierge) or select Create Custom to start from scratch.
  4. Enter a name and instructions, then click Create.

The console opens the agent detail view with four tabs:

| Tab | What it does |
| --- | --- |
| Configuration | Edit instructions, select a voice (Eve, Ara, Rex, Sal, Leo), tune VAD threshold and silence duration, and assign a phone number. |
| Tools | Add function tools with JSON Schema parameters, or enable built-in tools (web search, X search). |
| Knowledge Base | Upload documents or create text entries for the agent to search via file_search. |
| Testing | Talk to the agent in your browser: type a message or click Talk for a live voice session. The testing playground loads the agent's current configuration automatically. |

Assigning a phone number

  1. Go to the Configuration tab of your agent.
  2. Under Telephony, select a phone number from the dropdown (or click Manage Phone Numbers to purchase one first).
  3. Save changes.

To purchase a new number, navigate to Voice > Voice Agents > Manage Phone Numbers, then click Add Phone Number and choose a US area code.

Testing in the browser

The Testing tab provides a full voice playground:

  • Click Talk to start a conversation using your microphone with server-side VAD (the agent detects when you stop speaking).
  • Type a message in the text box to send text input instead.
  • Click Call Me to have the agent call your phone—useful for testing the telephony experience end to end.
  • Click Clear Chat to reset the conversation.

The playground uses the same Voice Agent API WebSocket (wss://api.x.ai/v1/realtime) as your production integration, so what you hear in testing is what your users will experience.

Testing with the CLI

The agent-mgmt-cli (agents) is a command-line tool for managing and testing agents without the console UI. It is particularly useful for quick iteration, scripting, and CI workflows.

Setup

Bash

export XAI_API_KEY=xai-...
cargo install --path prod/mc/agent-mgmt-cli

Agent CRUD

Bash

# Create from flags
agents create --name "My Agent" --instructions "You are helpful"

# Create from a JSON file
agents create agent.json

# Create from piped stdin
echo '{"name": "Bot", "instructions": "Be helpful"}' | agents create

# List all agents
agents list

# Get a specific agent
agents get agent_abc123

# Update an agent
agents update agent_abc123 --name "New Name"
agents update agent_abc123 --instructions "Updated prompt"
agents update agent_abc123 --clear-instructions

# Update from a JSON file (must contain agent.agentId)
agents update update.json

# Delete
agents delete agent_abc123

Running an agent (text)

The run command fetches the agent's config and sends a one-shot chat completion request to /v1/chat/completions, using the agent's instructions as the system prompt and passing along its tools:

Bash

# Quick test with default model
agents run agent_abc123 --message "What's the status of order ORD-42?"

# Override the model
agents run agent_abc123 --message "Hello" --model grok-3-fast

Realtime voice session (text I/O)

The voice command opens a WebSocket to wss://api.x.ai/v1/realtime, configures the session with the agent's instructions, tools, and voice, then drops you into a text-based REPL. Type a message, press Enter, and the agent responds with streamed text (audio transcripts in text form):

Bash

agents voice agent_abc123
# Connected. Type a message and press Enter. Ctrl+C to quit.

This is the fastest way to test an agent's conversational behavior from the terminal without a browser or microphone.

JSON output

Add --json before any subcommand for machine-readable output, useful for scripting:

Bash

agents --json list
agents --json run agent_abc123 --message "Hello"
agents --json get agent_abc123
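
In a script or CI job, the --json output can be consumed from Python. The wrapper below is a sketch that assumes the CLI prints a single JSON document to stdout and exits non-zero on failure. `agents_json` is a hypothetical helper, and the shape of the returned data is not specified here, so the code only parses it.

```python
import json
import subprocess


def agents_json(*args, runner=subprocess.run):
    """Run `agents --json <args...>` and return the parsed JSON output.

    `runner` is injectable so the function can be tested without the CLI.
    """
    result = runner(
        ["agents", "--json", *args],
        capture_output=True,
        text=True,
        check=True,  # Raise CalledProcessError on a non-zero exit code.
    )
    return json.loads(result.stdout)


# Example calls (require the agents CLI on PATH and XAI_API_KEY set):
#   agents_json("list")
#   agents_json("get", "agent_abc123")
```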

Pointing at local dev

Override the base URL to test against a local or staging instance:

Bash

agents --base-url http://localhost:9978 list

Updating an agent

Use PATCH with a field_mask to modify specific fields without overwriting the rest:

Bash

# Change only the voice — everything else stays the same
curl -X PATCH https://api.x.ai/v1/agents/agent_abc123def456 \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent": {
      "voice": {"voice_id": "leo", "vad_threshold": 0.7}
    },
    "field_mask": {"paths": ["voice"]}
  }'

Only fields listed in field_mask.paths are modified. Valid paths: name, instructions, tools, voice, collection_ids.

To clear a field, include it in the mask but omit it from the agent object. For example, "field_mask": {"paths": ["instructions"]} with an empty agent object clears the instructions.
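
Because the mask decides both what is written and what is cleared, it helps to build the request body in one place. The following is a minimal sketch: `build_update` is a hypothetical helper that derives field_mask.paths from the fields being set plus the fields being cleared, and rejects paths outside the valid list above.

```python
VALID_PATHS = {"name", "instructions", "tools", "voice", "collection_ids"}


def build_update(changes=None, clear=()):
    """Build a PATCH body: `changes` are fields to set, `clear` are fields to clear."""
    changes = dict(changes or {})
    overlap = [p for p in clear if p in changes]
    if overlap:
        raise ValueError(f"cannot both set and clear: {overlap}")
    paths = list(changes) + list(clear)
    unknown = [p for p in paths if p not in VALID_PATHS]
    if unknown:
        raise ValueError(f"unsupported field_mask paths: {unknown}")
    # Cleared fields appear in the mask but not in the agent object.
    return {"agent": changes, "field_mask": {"paths": paths}}


print(build_update({"voice": {"voice_id": "leo", "vad_threshold": 0.7}}))
print(build_update(clear=["instructions"]))
```

The returned dictionary can be passed as the `json=` argument of `requests.patch` against the agent URL.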

API reference

For full endpoint details, request/response schemas, and error codes, see the Agents API reference.

