Voice Agent API
The Voice Agent API enables real-time voice conversations over WebSocket. It is billed per minute of audio plus a flat fee per text input message, and it supports function calling with web search, X search, collections, MCP, and custom functions.
At a glance
| Details | |
|---|---|
| Modalities | Text, Audio → Text, Audio |
| Audio pricing | $0.05 / min ($3.00 / hr) |
| Text Input pricing | $0.004 / message |
| Region | us-east-1 |
Pricing
The Voice Agent API bills on two meters: the duration of audio sent or received, and text input events sent without audio.
| Details | |
|---|---|
| Audio | $0.05 / min of audio sent or received ($3.00 / hr) |
| Text Input | $0.004 per conversation.item.create event |
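The two rates in the table can be combined into a rough per-session estimate. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes audio is billed in proportion to its duration with no rounding, which this page does not specify.

```python
from decimal import Decimal

# Rates taken from the pricing table above.
AUDIO_RATE_PER_MIN = Decimal("0.05")       # USD per minute of audio sent or received
TEXT_RATE_PER_MESSAGE = Decimal("0.004")   # USD per billable conversation.item.create event


def estimate_cost(audio_minutes, text_messages):
    """Estimate a session's cost in USD.

    Assumption: audio is billed proportionally to duration, with no
    rounding to whole minutes. Actual metering granularity may differ.
    """
    audio_cost = Decimal(str(audio_minutes)) * AUDIO_RATE_PER_MIN
    text_cost = Decimal(text_messages) * TEXT_RATE_PER_MESSAGE
    return audio_cost + text_cost


# Example: 10 minutes of audio plus 5 text input messages.
total = estimate_cost(10, 5)
print(f"Estimated cost: ${total}")
```

`Decimal` is used instead of `float` so that small per-message amounts add up without binary rounding error.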
What counts as a text input message
Every conversation.item.create event you send from the client is billed at $0.004, with two exceptions:
- function_call_output items (server-requested tool results) are not billed.
- Items whose content is input_audio or audio are billed by the audio meter instead.
response.create is not a billable event. It only asks the model to produce the next turn; any audio the model generates in that turn is billed under the audio meter above.
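The rules above can be sketched as a small client-side check for counting billable text input messages before sending. The event shape used here (a `type` field, an `item` object with its own `type`, and a `content` list of typed parts) is an assumption for illustration, as is the helper name `is_billable_text_event`; consult the API Reference for the actual schema. The sketch also assumes that an item mixing text and audio parts is billed by the audio meter only, which this page does not state.

```python
AUDIO_CONTENT_TYPES = ("input_audio", "audio")


def is_billable_text_event(event):
    """Return True if a client event would count as a text input message.

    Assumed event shape (hypothetical, for illustration):
      {"type": "...", "item": {"type": "...", "content": [{"type": "..."}]}}
    """
    # Only conversation.item.create is billed; response.create is not.
    if event.get("type") != "conversation.item.create":
        return False

    item = event.get("item") or {}

    # Exception 1: server-requested tool results are not billed.
    if item.get("type") == "function_call_output":
        return False

    # Exception 2: audio content is billed by the audio meter instead.
    content = item.get("content") or []
    if any(part.get("type") in AUDIO_CONTENT_TYPES for part in content):
        return False

    return True


events = [
    {"type": "conversation.item.create",
     "item": {"type": "message", "content": [{"type": "input_text", "text": "Hi"}]}},
    {"type": "conversation.item.create",
     "item": {"type": "function_call_output", "output": "{}"}},
    {"type": "conversation.item.create",
     "item": {"type": "message", "content": [{"type": "input_audio"}]}},
    {"type": "response.create"},
]

billable = sum(is_billable_text_event(e) for e in events)
print(f"Billable text input messages: {billable}")
```

Only the first event in the sample list is counted; the other three fall under the exceptions or are not `conversation.item.create` events.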
Rate Limits
| Details | |
|---|---|
| Concurrent sessions | 100 per team |
| Max session duration | 120 minutes |
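A client can guard against both limits locally rather than relying on the server to reject or close sessions. The following is a minimal sketch using only the standard library; `SessionGate` and the `session_fn` callable are hypothetical names, and the sketch assumes a single process accounts for all of the team's sessions. How the server behaves when a limit is reached is not described on this page.

```python
import asyncio

# Limits taken from the table above.
MAX_CONCURRENT_SESSIONS = 100
MAX_SESSION_SECONDS = 120 * 60


class SessionGate:
    """Client-side guard for concurrent sessions and session duration."""

    def __init__(self, limit=MAX_CONCURRENT_SESSIONS, max_seconds=MAX_SESSION_SECONDS):
        self._semaphore = asyncio.Semaphore(limit)
        self._max_seconds = max_seconds

    async def run(self, session_fn):
        """Run one session; session_fn is a zero-argument coroutine function.

        Waits for a free slot, then cancels the session with
        asyncio.TimeoutError if it exceeds the duration limit.
        """
        async with self._semaphore:
            return await asyncio.wait_for(session_fn(), timeout=self._max_seconds)


async def demo():
    gate = SessionGate(limit=2, max_seconds=1)

    async def fake_session():
        # Stand-in for opening a WebSocket and holding a conversation.
        await asyncio.sleep(0.01)
        return "done"

    return await asyncio.gather(*(gate.run(fake_session) for _ in range(5)))


print(asyncio.run(demo()))
```

The gate is constructed inside a running coroutine so the semaphore is bound to the active event loop on older Python versions.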
Capabilities
- Function calling
- Web search
- X search
- Collections search
- Remote MCP tools
Availability
| Details | |
|---|---|
| Cluster | us-east-1 |
Documentation
- Voice Agent Guide — Getting started with real-time voice conversations
- API Reference — WebSocket endpoint reference
- Pricing — Full pricing overview
Last updated: May 25, 2026