Speech to Text

The Speech to Text API transcribes audio into text. Use the REST endpoint for file-based batch transcription, or the streaming endpoint for real-time low-latency transcription.

At a glance

Property	Details
Modalities	Audio → Text
REST pricing	$0.10 / hr
Streaming pricing	$0.20 / hr
Region	us-east-1

Pricing

Mode	Price
REST	$0.10 / hr
Streaming	$0.20 / hr
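
As a rough illustration of the hourly rates above, the following sketch estimates the cost of a transcription job from its audio duration. It assumes billing is strictly proportional to audio length, with no minimum charge or rounding; this page does not state the billing granularity, so treat the result as an approximation. The names `RATES_PER_HOUR` and `estimate_cost` are hypothetical helpers, not part of the API.

```python
from decimal import Decimal

# Hourly rates copied from the pricing table above (USD per hour).
RATES_PER_HOUR = {
    "rest": Decimal("0.10"),
    "streaming": Decimal("0.20"),
}


def estimate_cost(audio_seconds: float, mode: str) -> Decimal:
    """Estimate the USD cost of transcribing `audio_seconds` of audio.

    Assumption: cost scales linearly with duration (no minimum, no rounding).
    """
    if mode not in RATES_PER_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    return RATES_PER_HOUR[mode] * hours


if __name__ == "__main__":
    # Compare both modes for a 90-minute recording.
    for mode in RATES_PER_HOUR:
        print(mode, estimate_cost(90 * 60, mode))
```

`Decimal` is used instead of `float` so that small currency amounts are not distorted by binary floating-point rounding.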

Rate Limits

Limit	REST	Streaming
RPS (Requests per second)	10	10
Concurrent sessions	—	100 per team
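
One way to stay under these limits on the client side is to space out requests and cap open streaming sessions. The sketch below is a minimal, single-process example: `RequestThrottle` and `session_slots` are hypothetical names, and the code does not call the API itself. Because the session limit applies per team, a throttle inside one process cannot account for traffic from other processes or teammates, so the server may still reject requests.

```python
import threading
import time

MAX_RPS = 10                   # from the rate limits table above
MAX_STREAMING_SESSIONS = 100   # per team, from the rate limits table above


class RequestThrottle:
    """Spaces calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=MAX_RPS, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request slot is free; return the seconds slept."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the following slot for the next caller.
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if delay > 0:
            self._sleep(delay)
        return delay


# Caps the number of streaming sessions this process holds open at once.
session_slots = threading.BoundedSemaphore(MAX_STREAMING_SESSIONS)

if __name__ == "__main__":
    throttle = RequestThrottle()
    for i in range(3):
        throttle.wait()
        with session_slots:
            print("request", i)  # placeholder for the actual API call
```

The clock and sleep functions are injectable so the throttle can be tested without real delays.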

Capabilities

REST and streaming transcription
Multiple audio formats (WAV, MP3, WebM, OGG, M4A)
Multiple languages
Real-time interim results (streaming)
Keyterm prompting for domain-specific vocabulary
Smart Turn end-of-turn detection (streaming): ML-based prediction of whether the speaker has finished their thought
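
Before uploading a file for REST transcription, a client might check that it looks like one of the listed audio formats. This is a minimal sketch that only inspects the file extension; the mapping from extension to format is an assumption, the list on this page may not be exhaustive, and `has_supported_extension` is a hypothetical helper rather than part of the API.

```python
from pathlib import Path

# Extensions assumed to correspond to the formats listed above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".webm", ".ogg", ".m4a"}


def has_supported_extension(filename: str) -> bool:
    """Return True if the file's extension matches a listed audio format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("meeting.WAV", "interview.m4a", "notes.txt"):
        print(name, has_supported_extension(name))
```

An extension check cannot confirm the actual encoding of the file, so the service remains the final authority on whether a file is accepted.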

Availability

Property	Details
Cluster	us-east-1

Documentation

Speech to Text Guide — Getting started with speech to text
Voice Overview — Overview of all voice capabilities
Pricing — Full pricing overview

Last updated: July 23, 2026