Key Information

Rate Limits

Every xAI API team has per-model rate limits on two dimensions: requests per second (RPS) and tokens per minute (TPM). Your per-second limit is derived from your per-minute request budget (RPM / 60): you cannot spend a full minute's requests in a single second, which protects the API from sudden bursts. These limits scale with your team's tier, which is determined by cumulative spend on the API.
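The RPM / 60 relationship described above can be sketched in a few lines. This is only an illustration of the arithmetic: the function names are hypothetical and the 1,800 RPM figure is an example value, not a limit quoted from the Console.

```python
def per_second_limit(requests_per_minute: float) -> float:
    """Per-second request limit derived from a per-minute budget (RPM / 60)."""
    return requests_per_minute / 60


def per_minute_budget(requests_per_second: float) -> float:
    """Inverse conversion: the per-minute budget implied by a per-second limit."""
    return requests_per_second * 60


# Illustrative value: a budget of 1,800 requests per minute allows 30 per second.
print(per_second_limit(1800))  # 30.0
```

Under this model, sending all 1,800 requests in the first second of a minute would be rejected, even though the per-minute budget has not been exceeded.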

You can view your team's current tier and per-model limits on the Rate Limits page in the xAI Console.


Rate limit tiers

Your tier is based on cumulative spend on the xAI API since January 1, 2026. Tiers unlock automatically as your spend increases.

Tier          Spend threshold
Tier 0        $0 (default)
Tier 1        $50
Tier 2        $250
Tier 3        $1,000
Tier 4        $5,000
Enterprise    Available on request

Qualification is based on total revenue received through prepaid credit purchases or successfully fulfilled invoices. Once you qualify for a tier, you stay there permanently; tiers never downgrade.
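As a rough sketch, the threshold table can be expressed as a lookup from cumulative spend to tier. The function name is hypothetical, and the sketch assumes that reaching a threshold exactly is enough to qualify, which the text does not state. The Enterprise tier is omitted because it is granted on request rather than by spend.

```python
# Spend thresholds in USD, copied from the tier table above, highest first.
TIER_THRESHOLDS = [
    ("Tier 4", 5000),
    ("Tier 3", 1000),
    ("Tier 2", 250),
    ("Tier 1", 50),
    ("Tier 0", 0),
]


def tier_for_spend(cumulative_spend_usd: float) -> str:
    """Return the highest tier whose threshold the cumulative spend has reached.

    Assumption: thresholds are inclusive (spending exactly $50 gives Tier 1).
    """
    if cumulative_spend_usd < 0:
        raise ValueError("cumulative spend cannot be negative")
    for name, threshold in TIER_THRESHOLDS:
        if cumulative_spend_usd >= threshold:
            return name
    raise AssertionError("unreachable: Tier 0 has a threshold of 0")


print(tier_for_spend(300))  # Tier 2
```

Because tiers never downgrade, the input should be lifetime cumulative spend, not spend in the current billing period.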

Rate limit tiers apply to text and embedding models. For increases to Voice and Imagine API limits, contact sales@x.ai.


Per-model limits

Each tier sets hard RPS and TPM caps per model. Limits rise with each tier, and the step between tiers widens as you move up. A request that exceeds either limit receives a 429 Too Many Requests error.

The table below lists RPS and TPM limits at each tier for every model. You can also view your team's personalized limits on the Rate Limits page in the xAI Console.

Language Models

Model                           Limit    T0      T1      T2      T3      T4
grok-4.20-0309-non-reasoning    RPS      30      40      60      100     166
                                TPM      10M     15M     25M     45M     85M
grok-4.20-0309-reasoning        RPS      30      40      60      100     166
                                TPM      10M     15M     25M     45M     85M
grok-4.20-multi-agent-0309      RPS      7       10      15      25      45
                                TPM      2.5M    3.7M    6.2M    11M     21M
grok-4.3                        RPS      30      40      60      100     166
                                TPM      10M     15M     25M     45M     85M
grok-build-0.1                  RPS      30      40      60      100     166
                                TPM      10M     15M     25M     45M     85M

Image Generation

Model                           Tier         Limit
grok-imagine-image              All tiers    5
grok-imagine-image-quality      All tiers    5

Video Generation

Model                           Tier         Limit
grok-imagine-video              All tiers    1
grok-imagine-video-1.5          All tiers    1
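One way to stay under an RPS cap is to pace requests on the client before they are sent. The sketch below is a minimal illustration and not part of the xAI SDK: `RequestPacer` is a hypothetical helper that spaces calls evenly at `1 / rps` seconds apart. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable


class RequestPacer:
    """Spaces out calls so that at most `rps` requests start per second."""

    def __init__(
        self,
        rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps  # minimum spacing between request starts
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()   # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request slot and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        if delay > 0:
            self._sleep(delay)
        # Book the following slot one interval after the slot just used.
        self._next_slot = max(now, self._next_slot) + self._interval
        return delay


# Example: 30 is the Tier 0 RPS figure listed above for grok-4.3.
pacer = RequestPacer(rps=30)
pacer.wait()  # call this before each API request
```

This paces request starts only. It does not track tokens, so a TPM limit can still be hit, and retry handling for 429 responses is still needed.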

What counts toward TPM

All tokens consumed by a request count toward the TPM limit for that model:

  • Prompt tokens (text, image, and audio)
  • Completion tokens
  • Reasoning tokens (on reasoning models)
  • Cached prompt tokens (still count toward TPM, though they are billed at a reduced rate)
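The list above amounts to a simple sum. The sketch below assumes the four counts are reported separately and do not overlap (for example, that cached prompt tokens are not already included in the prompt count). Check how your usage data is broken down before relying on it. The function and parameter names are hypothetical.

```python
def tokens_toward_tpm(
    uncached_prompt: int,
    cached_prompt: int,
    completion: int,
    reasoning: int = 0,
) -> int:
    """Total tokens a request contributes to the model's TPM limit.

    Cached prompt tokens are billed at a reduced rate but still count in full.
    Assumption: the four inputs are non-overlapping counts.
    """
    return uncached_prompt + cached_prompt + completion + reasoning


def fits_in_budget(request_tokens: int, used_this_minute: int, tpm_limit: int) -> bool:
    """True if the request would keep the running total within the TPM limit."""
    return used_this_minute + request_tokens <= tpm_limit


total = tokens_toward_tpm(uncached_prompt=1200, cached_prompt=800,
                          completion=300, reasoning=500)
print(total)  # 2800
```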

For details on how tokens are counted and priced, see Models and Pricing. For per-request cost tracking, see Cost Tracking.


Handling rate limit errors

When you exceed your rate limit, the API returns HTTP 429. Implement exponential backoff to handle this gracefully:

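import os
import random
import time
from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.exceptions import RateLimitError

client = Client(api_key=os.getenv("XAI_API_KEY"))

def request_with_backoff(prompt, max_retries=5):
    """Send a prompt, retrying with exponential backoff on rate limit errors."""
    chat = client.chat.create(model="grok-4.3")
    chat.append(user(prompt))
    for attempt in range(max_retries):
        try:
            return chat.sample()
        except RateLimitError:
            # Out of attempts: re-raise the original error instead of sleeping again.
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter so that
            # concurrent clients do not all retry at the same moment.
            time.sleep(2 ** attempt + random.random())
    raise ValueError("max_retries must be at least 1")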

Increasing your limits

  • Spend more. Tiers upgrade automatically based on cumulative spend. No action required on your part.
  • Request an increase. Submit a request through the xAI Console if you need higher limits without additional spend, or limits beyond Tier 4.
  • Contact sales. For enterprise-grade capacity, please email sales@x.ai.

Last updated: June 20, 2026