Voice Agent API

The Voice Agent API enables real-time voice conversations over WebSocket. It is billed per minute of audio, plus a flat fee per text input message. It supports function calling with web search, X search, collections search, remote MCP tools, and custom functions.


At a glance

Details
Modalities: Text, Audio → Text, Audio
Audio pricing: $0.05 / min ($3.00 / hr)
Text input pricing: $0.004 / message
Region: us-east-1

Pricing

The Voice Agent API charges for two things: the duration of audio sent or received, and text input events that carry no audio.

Details
Audio: $0.05 / min of audio sent or received ($3.00 / hr)
Text input: $0.004 per conversation.item.create event
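
As a rough illustration of how the two meters combine, the sketch below estimates the cost of a session from the rates listed above. The function name `estimate_cost` is hypothetical, and the per-second proration of audio is an assumption; this page does not state how partial minutes are rounded.

```python
from decimal import Decimal

# Rates taken from the pricing table above.
AUDIO_USD_PER_MINUTE = Decimal("0.05")
TEXT_USD_PER_MESSAGE = Decimal("0.004")


def estimate_cost(audio_seconds: float, text_messages: int) -> Decimal:
    """Estimate session cost in USD.

    Assumption: audio is prorated by the second. Actual rounding of
    partial minutes is not documented on this page.
    """
    audio_minutes = Decimal(str(audio_seconds)) / Decimal(60)
    audio_cost = audio_minutes * AUDIO_USD_PER_MINUTE
    text_cost = TEXT_USD_PER_MESSAGE * text_messages
    return audio_cost + text_cost


# Ten minutes of audio plus five billable text messages.
print(estimate_cost(600, 5))
```

Under these assumptions, ten minutes of audio costs $0.50 and five text messages cost $0.02, for a total of $0.52.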

What counts as a text input message

Every conversation.item.create event you send from the client is billed at $0.004, with two exceptions:

  • function_call_output items (results for tool calls the server requested) are not billed.
  • Items whose content is input_audio or audio are billed by the audio meter instead of the per-message fee.

response.create is not a billable event. It only asks the model to produce the next turn; any audio the model generates in that turn is billed under the audio meter above.
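
The rules above can be sketched as a small client-side check that decides whether an outgoing event would incur the $0.004 text input fee. The helper name `is_billable_text_input` is hypothetical, and the event layout (an `item` object with a `type` and a `content` list of typed parts) is an assumption about the JSON shape, not something this page specifies.

```python
def is_billable_text_input(event: dict) -> bool:
    """Return True if a client event would incur the per-message text fee.

    Assumed event shape (not confirmed by this page):
    {"type": ..., "item": {"type": ..., "content": [{"type": ...}, ...]}}
    """
    # Only conversation.item.create is billed per message;
    # response.create and other events are not.
    if event.get("type") != "conversation.item.create":
        return False

    item = event.get("item") or {}

    # Tool results requested by the server are exempt.
    if item.get("type") == "function_call_output":
        return False

    # Items carrying audio are billed by the audio meter instead.
    content = item.get("content") or []
    if any(part.get("type") in ("input_audio", "audio") for part in content):
        return False

    return True
```

A check like this could feed a running counter on the client to approximate the text input portion of a bill. It says nothing about audio charges, which are metered separately.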


Rate Limits

Details
Concurrent sessions: 100 per team
Max session duration: 120 minutes
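
One way a client might stay inside these limits is to track open sessions locally and plan a reconnect before the duration cap. The sketch below is a minimal illustration: `SessionBudget` and `seconds_remaining` are hypothetical names, and the counter only sees sessions opened by a single process, whereas the real limit applies across the whole team.

```python
# Limits taken from the table above.
MAX_CONCURRENT_SESSIONS = 100
MAX_SESSION_SECONDS = 120 * 60


class SessionBudget:
    """Local counter of open sessions (single process only)."""

    def __init__(self, limit: int = MAX_CONCURRENT_SESSIONS) -> None:
        self._limit = limit
        self._active = 0

    def try_acquire(self) -> bool:
        """Reserve a slot; return False if the local limit is reached."""
        if self._active >= self._limit:
            return False
        self._active += 1
        return True

    def release(self) -> None:
        """Free a slot when a session closes."""
        if self._active > 0:
            self._active -= 1


def seconds_remaining(started_at: float, now: float) -> float:
    """Seconds left before a session reaches the 120-minute cap."""
    return max(0.0, MAX_SESSION_SECONDS - (now - started_at))
```

A caller would check `try_acquire()` before opening a WebSocket, call `release()` when it closes, and use `seconds_remaining()` with timestamps from `time.monotonic()` to decide when to start a fresh session.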

Capabilities

  • Function calling
  • Web search
  • X search
  • Collections search
  • Remote MCP tools

Availability

Details
Cluster: us-east-1

Last updated: May 25, 2026