Speech to Text
The Speech to Text API transcribes audio into text. Use the REST endpoint for file-based batch transcription, or the streaming endpoint for real-time low-latency transcription.
At a glance
| Detail | Value |
|---|---|
| Modalities | Audio → Text |
| REST pricing | $0.10 / hr |
| Streaming pricing | $0.20 / hr |
| Region | us-east-1 |
Pricing
| Endpoint | Price |
|---|---|
| REST | $0.10 / hr |
| Streaming | $0.20 / hr |
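
As a rough illustration of how the hourly rates translate into a bill, the sketch below estimates cost from audio duration. It assumes usage is prorated linearly by the second with no minimum charge or rounding increment; this page does not state the billing granularity, so treat the result as an estimate. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Rates taken from the pricing table on this page (USD per hour).
RATES_PER_HOUR = {
    "rest": Decimal("0.10"),
    "streaming": Decimal("0.20"),
}


def estimate_cost(audio_seconds: float, mode: str) -> Decimal:
    """Estimate the cost in USD for a given audio duration.

    Assumption: linear per-second proration with no minimum charge.
    The actual billing increments are not documented here.
    """
    if mode not in RATES_PER_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    # Round to four decimal places for display purposes only.
    return (RATES_PER_HOUR[mode] * hours).quantize(Decimal("0.0001"))
```

For example, `estimate_cost(90 * 60, "streaming")` covers 1.5 hours at the streaming rate. `Decimal` is used instead of `float` so that currency arithmetic does not pick up binary rounding error.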
Rate Limits
| Limit | REST | Streaming |
|---|---|---|
| RPM (Requests per minute) | 600 | 600 |
| RPS (Requests per second) | 10 | 10 |
| Concurrent sessions | — | 100 per team |
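
One way to stay under both the per-second and per-minute limits is to throttle on the client before sending each request. The sketch below is a minimal sliding-window limiter. It assumes the service counts requests over rolling windows, which this page does not specify, and the names `SlidingWindowLimiter` and `acquire` are hypothetical helpers, not part of any SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` recorded calls in any `period_s` window."""

    def __init__(self, max_calls, period_s, clock=time.monotonic):
        self._max_calls = max_calls
        self._period_s = period_s
        self._clock = clock
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Return seconds to wait before another call fits (0.0 if none)."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self._period_s:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            return 0.0
        return self._period_s - (now - self._calls[0])

    def record(self):
        """Record that a call is being made now."""
        self._calls.append(self._clock())


def acquire(limiters, sleep=time.sleep):
    """Block until every limiter allows a call, then record it in each."""
    while True:
        delay = max(limiter.wait_time() for limiter in limiters)
        if delay <= 0:
            break
        sleep(delay)
    for limiter in limiters:
        limiter.record()


# Limits from the table above: 10 requests per second, 600 per minute.
per_second = SlidingWindowLimiter(max_calls=10, period_s=1.0)
per_minute = SlidingWindowLimiter(max_calls=600, period_s=60.0)
```

Calling `acquire([per_second, per_minute])` immediately before each request keeps a single-process client within both limits. This sketch is not thread-safe, and it does not cover the streaming limit of 100 concurrent sessions per team, which would need a separate counter shared across the team's clients.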
Capabilities
- REST and streaming transcription
- Multiple audio formats (WAV, MP3, WebM, OGG, M4A)
- Multiple languages
- Real-time interim results (streaming)
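
Before uploading a file, a client can cheaply reject formats that are not in the list above. The sketch below checks the file extension only. It assumes the extension matches the actual container, and the helper name `is_supported_audio` is hypothetical; the service may validate the file contents differently.

```python
from pathlib import Path

# Extensions for the formats listed above: WAV, MP3, WebM, OGG, M4A.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".webm", ".ogg", ".m4a"}


def is_supported_audio(path) -> bool:
    """Return True if the file extension matches a listed audio format.

    This is a client-side pre-check on the file name only; it does not
    inspect the file contents.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `is_supported_audio("call.WAV")` passes while a file with no extension or an unlisted one such as `.flac` does not.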
Availability
| Detail | Value |
|---|---|
| Cluster | us-east-1 |
Documentation
- Speech to Text Guide — Getting started with speech to text
- Voice APIs Guide — Overview of all voice capabilities
- Models and Pricing — Full pricing overview
Last updated: April 15, 2026