Model Capabilities
Voice Agent Builder
The Voice Agent API lets you build real-time voice applications over WebSocket. Every session requires configuration—instructions, tools, voice settings—sent via session.update. The Agents API adds a persistence layer: define your agent once, then reference it by ID in any session or phone call.
Overview
The workflow has three steps:
- Create an agent — store its name, instructions, tools, voice, and knowledge base.
- Assign a phone number — purchase a number and link it to the agent.
- Use the agent — connect to the Voice Agent API with the stored configuration, or have the agent call a phone number.
Step 1: Create an agent
import os, requests
resp = requests.post(
"https://api.x.ai/v1/agents",
headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
json={
"name": "Order Support",
"instructions": "You are a friendly order support agent for Acme Corp.",
"tools": [
{
"type": "function",
"function": {
"name": "lookup_order",
"description": "Look up the status of a customer order",
"parameters": {
"type": "object",
"properties": {"order_id": {"type": "string"}},
"required": ["order_id"],
},
},
},
{"type": "web_search"},
],
"voice": {"voice_id": "eve", "vad_threshold": 0.5, "vad_silence_duration_ms": 300},
},
)
agent = resp.json()["agent"]
print(f"Created agent: {agent['agent_id']}")
The response includes the agent_id (e.g., agent_abc123def456) which you'll use in subsequent calls.
Configuring voice
The voice object controls how the agent sounds and when it detects user speech:
| Field | Description |
|---|---|
voice_id | Which voice to use. Options: eve, ara, rex, sal, leo. |
vad_threshold | How sensitive turn detection is. Lower values (e.g., 0.3) pick up quieter speech; higher values (e.g., 0.8) require louder input. Range: 0.0–1.0. |
vad_silence_duration_ms | How long the agent waits after silence before responding. Lower values (e.g., 200) make the agent more responsive; higher values (e.g., 500) give the caller more time to pause mid-thought. Range: 0–10,000. |
Adding tools
Agents support the same tool types as the Voice Agent API:
function— custom functions your application executes. The agent generates the arguments; you return the result.web_search— search the web for current information.x_search— search posts on X.file_search— search documents in knowledge base collections (setcollection_idson the agent).mcp— connect to an MCP server for external tool access.
Step 2: Purchase and assign a phone number
Purchase a number in a US area code, then assign it to your agent:
import os, requests
headers = {"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}
# Purchase a number
phone = requests.post(
"https://api.x.ai/v1/phone-numbers",
headers=headers,
json={"area_code": "415", "name": "Support Line"},
).json()["phone_number"]
print(f"Purchased: {phone['phone_number']}")
# Assign to agent
requests.patch(
f"https://api.x.ai/v1/agents/{agent['agent_id']}",
headers=headers,
json={
"agent": {},
"field_mask": {"paths": ["name"]},
"phone_number_id": phone["phone_number_id"],
},
)
print(f"Assigned {phone['phone_number']} to {agent['agent_id']}")
Once assigned, the agent uses this number as its caller ID for outbound calls. To reassign a number, update the agent with a different phone_number_id. To unassign, pass "".
Step 3: Use the agent
In a realtime session
Pass the agent's stored configuration to the Voice Agent API via session.update:
Python
import asyncio, json, os, requests, websockets
# Fetch agent config
headers = {"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}
agent = requests.get(
"https://api.x.ai/v1/agents/agent_abc123def456",
headers=headers,
).json()["agent"]
async def start_session():
async with websockets.connect(
"wss://api.x.ai/v1/realtime",
additional_headers=headers,
) as ws:
# Configure session from agent
await ws.send(json.dumps({
"type": "session.update",
"session": {
"voice": agent["voice"]["voice_id"],
"instructions": agent["instructions"],
"tools": agent["tools"],
"turn_detection": {
"type": "server_vad",
"threshold": agent["voice"].get("vad_threshold", 0.5),
"silence_duration_ms": agent["voice"].get("vad_silence_duration_ms", 300),
},
},
}))
# Now stream audio...
print("Session configured. Ready to stream audio.")
asyncio.run(start_session())
This pattern separates configuration (Agents API) from session lifecycle (Voice Agent API). Update the agent once; every new session picks up the latest config.
Outbound phone call
Use the console or the API to place a call. The agent dials the target number using its assigned phone number as caller ID, with its stored instructions, voice, and tools.
Using the console
Everything above can also be done through the xAI Console at console.x.ai without writing code.
Creating an agent
- Navigate to Voice → Voice Agents in the console sidebar.
- Click Create Agent.
- Choose a template (Healthcare, Restaurant, Customer Support, Real Estate, Appointment Booking, Concierge) or select Create Custom to start from scratch.
- Enter a name and instructions, then click Create.
The console opens the agent detail view with four tabs:
| Tab | What it does |
|---|---|
| Configuration | Edit instructions, select a voice (Eve, Ara, Rex, Sal, Leo), tune VAD threshold and silence duration, and assign a phone number. |
| Tools | Add function tools with JSON Schema parameters, or enable built-in tools (web search, X search). |
| Knowledge Base | Upload documents or create text entries for the agent to search via file_search. |
| Testing | Talk to the agent in your browser—type a message or click Talk for a live voice session. The testing playground loads the agent's current configuration automatically. |
Assigning a phone number
- Go to the Configuration tab of your agent.
- Under Telephony, select a phone number from the dropdown (or click Manage Phone Numbers to purchase one first).
- Save changes.
To purchase a new number, navigate to Voice → Voice Agents → Manage Phone Numbers, then click Add Phone Number and choose a US area code.
Testing in the browser
The Testing tab provides a full voice playground:
- Click Talk to start a conversation using your microphone with server-side VAD (the agent detects when you stop speaking).
- Type a message in the text box to send text input instead.
- Click Call Me to have the agent call your phone—useful for testing the telephony experience end to end.
- Click Clear Chat to reset the conversation.
The playground uses the same Voice Agent API WebSocket (wss://api.x.ai/v1/realtime) as your production integration, so what you hear in testing is what your users will experience.
Testing with the CLI
The agent-mgmt-cli (agents) is a command-line tool for managing and testing agents without the console UI. It is particularly useful for quick iteration, scripting, and CI workflows.
Setup
Bash
export XAI_API_KEY=xai-...
cargo install --path prod/mc/agent-mgmt-cli
Agent CRUD
Bash
# Create from flags
agents create --name "My Agent" --instructions "You are helpful"
# Create from a JSON file
agents create agent.json
# Create from piped stdin
echo '{"name": "Bot", "instructions": "Be helpful"}' | agents create
# List all agents
agents list
# Get a specific agent
agents get agent_abc123
# Update an agent
agents update agent_abc123 --name "New Name"
agents update agent_abc123 --instructions "Updated prompt"
agents update agent_abc123 --clear-instructions
# Update from a JSON file (must contain agent.agentId)
agents update update.json
# Delete
agents delete agent_abc123
Running an agent (text)
The run command fetches the agent's config and sends a one-shot chat completion to /v1/chat/completions, using the agent's instructions as the system prompt and its tools:
Bash
# Quick test with default model
agents run agent_abc123 --message "What's the status of order ORD-42?"
# Override the model
agents run agent_abc123 --message "Hello" --model grok-3-fast
Realtime voice session (text I/O)
The voice command opens a WebSocket to wss://api.x.ai/v1/realtime, configures the session with the agent's instructions, tools, and voice, then drops you into a text-based REPL. Type a message, press Enter, and the agent responds with streamed text (audio transcripts in text form):
Bash
agents voice agent_abc123
# Connected. Type a message and press Enter. Ctrl+C to quit.
This is the fastest way to test an agent's conversational behavior from the terminal without a browser or microphone.
JSON output
Add --json before any subcommand for machine-readable output, useful for scripting:
Bash
agents --json list
agents --json run agent_abc123 --message "Hello"
agents --json get agent_abc123
Pointing at local dev
Override the base URL to test against a local or staging instance:
Bash
agents --base-url http://localhost:9978 list
Updating an agent
Use PATCH with a field_mask to modify specific fields without overwriting the rest:
Bash
# Change only the voice — everything else stays the same
curl -X PATCH https://api.x.ai/v1/agents/agent_abc123def456 \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"agent": {
"voice": {"voice_id": "leo", "vad_threshold": 0.7}
},
"field_mask": {"paths": ["voice"]}
}'
Only fields listed in field_mask.paths are modified. Valid paths: name, instructions, tools, voice, collection_ids.
To clear a field, include it in the mask but omit it from the agent object. For example, "field_mask": {"paths": ["instructions"]} with an empty agent object clears the instructions.
API reference
For full endpoint details, request/response schemas, and error codes, see the Agents API reference.
Did you find this page helpful?