# Voice Agent API

The Voice Agent API enables real-time voice conversations over WebSocket, billed at a flat rate per minute of audio duration. It supports function calling with web search, X search, collections, MCP, and custom functions.

## At a glance

| | Details |
|---|---|
| **Modalities** | Text, Audio → Text, Audio |
| **Pricing** | $0.05 / min ($3.00 / hr) |
| **Region** | us-east-1 |

## Pricing

| | Details |
|---|---|
| **Per minute** | $0.05 / min ($3.00 / hr) |

> [!NOTE]
>
> When using the Voice Agent API with tools such as function calling, web search, X search, collections, or MCP, you will be charged for tool invocations in addition to the per-minute voice session cost. See [Tool Invocation Costs](/developers/pricing#tool-invocation-costs) for tool pricing details.

> [!NOTE]
>
> Usage is billed by audio duration. If you send 1 hour of audio data to the API, it will be billed as 1 hour of usage, even if the WebSocket connection time is less than 1 hour.
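
As a rough illustration of the billing rule above, the following sketch estimates the voice portion of a bill from the amount of audio sent. The function name `estimate_voice_cost` is hypothetical. The sketch assumes that partial minutes are prorated by the second, which this page does not state, and it excludes tool invocation costs.

```python
from decimal import Decimal, ROUND_HALF_UP

# Flat rate taken from the pricing table on this page.
RATE_PER_MINUTE = Decimal("0.05")


def estimate_voice_cost(audio_seconds: float) -> Decimal:
    """Estimate the voice cost in USD for a given audio duration.

    Assumption: partial minutes are prorated; the actual rounding
    granularity used for billing is not documented here.
    Tool invocation charges are not included.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    cost = minutes * RATE_PER_MINUTE
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # One hour of audio is billed as one hour, regardless of how long
    # the WebSocket connection was open.
    print(estimate_voice_cost(3600))  # 3.0000
    print(estimate_voice_cost(90))    # 0.0750
```

The input is the duration of the audio itself, not the wall-clock connection time, which matches the note above.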

## Rate Limits

| | Details |
|---|---:|
| **Concurrent sessions** | 100 per team |
| **Max session duration** | 120 minutes |
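
One possible way to stay within these limits on the client side is sketched below. It is not part of the API: `SessionLimiter` and the `session_factory` argument are hypothetical names, and the sketch assumes each voice session is run as a single asyncio coroutine within one process. It caps concurrency with a semaphore and cancels any session that exceeds the maximum duration.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Values taken from the rate limits table on this page.
MAX_CONCURRENT_SESSIONS = 100
MAX_SESSION_SECONDS = 120 * 60


class SessionLimiter:
    """Client-side guard for the per-team session limits (hypothetical helper).

    Create it inside a running event loop so the semaphore is bound correctly
    on older Python versions.
    """

    def __init__(
        self,
        max_sessions: int = MAX_CONCURRENT_SESSIONS,
        max_seconds: float = MAX_SESSION_SECONDS,
    ) -> None:
        self._slots = asyncio.Semaphore(max_sessions)
        self._max_seconds = max_seconds

    async def run(self, session_factory: Callable[[], Awaitable[T]]) -> T:
        # Wait for a free slot, then run the session with a hard time limit.
        # asyncio.TimeoutError is raised if the session runs too long.
        async with self._slots:
            return await asyncio.wait_for(
                session_factory(), timeout=self._max_seconds
            )
```

A caller would pass a zero-argument function that opens and drives one voice session. Because the limit applies per team, a guard like this only helps when all sessions for the team go through the same process; multiple services sharing a team would need shared coordination.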

## Capabilities

* Function calling
* Web search
* X search
* Collections search
* Remote MCP tools

## Availability

| | Details |
|---|---|
| **Cluster** | us-east-1 |

## Documentation

* [Voice Agent Guide](/developers/model-capabilities/audio/voice-agent) — Getting started with real-time voice conversations
* [API Reference](/developers/rest-api-reference/inference/voice) — WebSocket endpoint reference
* [Pricing](/developers/pricing#voice-api-pricing) — Full pricing overview
