Key Information
Rate Limits
Every xAI API team has per-model rate limits on two dimensions: requests per minute (RPM) and tokens per minute (TPM). These limits scale with your team's tier, which is determined by cumulative spend on the API.
You can view your team's current tier and per-model limits on the Rate Limits page in the xAI Console.
Rate limit tiers
Your tier is based on cumulative spend on the xAI API since January 1, 2026. Tiers unlock automatically as your spend increases.
| Tier | Spend threshold |
|---|---|
| Tier 0 | $0 (default) |
| Tier 1 | $50 |
| Tier 2 | $250 |
| Tier 3 | $1,000 |
| Tier 4 | $5,000 |
| Enterprise | Available on request |
Qualification is based on total revenue received through prepaid credit purchases or successfully fulfilled invoices. Once you qualify for a tier, you stay there permanently; tiers never downgrade.
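The threshold table above can be expressed as a simple lookup. This is a hypothetical sketch for illustration; `tier_for_spend` is not part of the xAI SDK:

```python
# Thresholds from the tier table, highest first. Tiers unlock at the
# stated cumulative spend and never downgrade.
TIER_THRESHOLDS = [
    (5_000, "Tier 4"),
    (1_000, "Tier 3"),
    (250, "Tier 2"),
    (50, "Tier 1"),
    (0, "Tier 0"),
]

def tier_for_spend(cumulative_spend: float) -> str:
    """Return the rate limit tier unlocked by a cumulative spend in USD."""
    for threshold, tier in TIER_THRESHOLDS:
        if cumulative_spend >= threshold:
            return tier
    return "Tier 0"
```

Because tiers never downgrade, the input should be lifetime spend since January 1, 2026, not a rolling window.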
Rate limit tiers apply to text and embedding models. For increases to Voice and Imagine API limits, contact sales@x.ai.
Per-model limits
Each tier sets hard RPM and TPM caps per model. Limits scale exponentially with tier. Exceeding either limit returns a 429 Too Many Requests error.
The table below shows limits at each tier for every model. You can also view your team's personalized limits on the Rate Limits page in the xAI Console.
What counts toward TPM
All tokens consumed by a request count toward the TPM limit for that model:
- Prompt tokens (text, image, and audio)
- Completion tokens
- Reasoning tokens (on reasoning models)
- Cached prompt tokens (still count toward TPM, though they are billed at a reduced rate)
For details on how tokens are counted and priced, see Models and Pricing. For per-request cost tracking, see Cost Tracking.
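As a sketch of the accounting above, the following sums a request's tokens against the TPM budget. The field names are illustrative, not the SDK's exact usage schema; the key point is that cached prompt tokens are counted within prompt tokens, discounted in billing but not exempt from TPM:

```python
def tpm_tokens(usage: dict) -> int:
    """Tokens a single request consumes against the per-model TPM limit.

    Assumes cached prompt tokens are already included in prompt_tokens
    (they are billed at a reduced rate but still count toward TPM).
    """
    return (
        usage.get("prompt_tokens", 0)       # text, image, and audio input
        + usage.get("completion_tokens", 0)  # generated output
        + usage.get("reasoning_tokens", 0)   # reasoning models only
    )
```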
Handling rate limit errors
When you exceed your rate limit, the API returns HTTP 429. Implement exponential backoff to handle this gracefully:
```python
import os
import time

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.exceptions import RateLimitError

client = Client(api_key=os.getenv("XAI_API_KEY"))

def request_with_backoff(prompt, max_retries=5):
    chat = client.chat.create(model="grok-4.3")
    chat.append(user(prompt))
    for attempt in range(max_retries):
        try:
            return chat.sample()
        except RateLimitError:
            # Wait 1, 2, 4, 8, ... seconds between attempts
            wait = 2 ** attempt
            time.sleep(wait)
    raise RateLimitError("Max retries exceeded")
```
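A fixed 2^attempt schedule can cause retry storms when many clients back off in lockstep. A common refinement is capped exponential backoff with full jitter; here is a minimal sketch, where the base and cap values are illustrative choices rather than xAI recommendations:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt`, using "full jitter".

    Picks a uniform delay between 0 and the capped exponential bound,
    so concurrent clients spread their retries apart instead of
    hammering the API at the same instant.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Swap `time.sleep(wait)` in the example above for `time.sleep(backoff_delay(attempt))` to apply it.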
Increasing your limits
- Spend more. Tiers upgrade automatically based on cumulative spend. No action required on your part.
- Request an increase. Submit a request through the xAI Console if you need higher limits without additional spend, or limits beyond Tier 4.
- Contact sales. For enterprise-grade capacity or Provisioned Throughput, email sales@x.ai.
Last updated: May 11, 2026