Advanced API Usage

Priority Processing

Priority Processing gives your xAI API requests higher scheduling priority, which typically results in lower time-to-first-token (TTFT) and lower inter-token latency (ITL), especially during periods of high demand. To opt in, add service_tier: "priority" to the request body; no capacity reservations or advance provisioning are required. The parameter is supported on the text inference endpoints: Chat Completions and Responses.

When priority capacity is available, priority requests are scheduled ahead of standard traffic; otherwise they are served at the default tier. The response always includes a service_tier field indicating which tier was used, so check it to confirm whether priority was granted.


How it works

Add the service_tier field to any supported request. The API returns the tier that was actually used in the response, so you can confirm the upgrade took effect.

The service_tier field accepts the following values:

Value         Meaning
"default"     Standard processing. This is the same as omitting the field entirely.
"priority"    Request higher scheduling priority at a premium token price.
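
As a minimal sketch of how a client might attach the field, the helper below adds service_tier to an existing request body and rejects any value other than the two listed above. The helper name with_service_tier and the ALLOWED_TIERS constant are hypothetical and not part of the xAI API or any SDK; only the field name and its two values come from this page.

```python
# Hypothetical client-side helper; not part of the xAI API or an official SDK.
ALLOWED_TIERS = {"default", "priority"}  # the two values documented on this page


def with_service_tier(body: dict, tier: str = "priority") -> dict:
    """Return a copy of a request body with the service_tier field set."""
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"unsupported service_tier: {tier!r}")
    updated = dict(body)  # shallow copy so the caller's dict is not mutated
    updated["service_tier"] = tier
    return updated


body = with_service_tier({"model": "grok-4.3", "input": "Hello"})
print(body)
```

Validating on the client side is a design choice rather than a requirement; it simply surfaces a typo in the tier name before the request is sent.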

Priority requests are billed at a premium per-token rate. Cached input tokens still receive their cache discount, and the priority multiplier is then applied to the discounted rate. For current per-model rates and the exact priority premium, see the Pricing page.
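
The order of operations can be illustrated with a short calculation. Every number below is a made-up placeholder (the real rates and the real priority premium are on the Pricing page), and the function estimate_cost is hypothetical. The sketch also assumes that the input token count includes the cached tokens and that the multiplier applies uniformly to input and output tokens, which is one reading of the billing description above.

```python
from decimal import Decimal

# Placeholder rates in USD per million tokens; NOT real xAI prices.
INPUT_RATE = Decimal("2.00")
CACHED_INPUT_RATE = Decimal("0.50")  # discounted rate for cached input tokens
OUTPUT_RATE = Decimal("10.00")
PRIORITY_MULTIPLIER = Decimal("2")   # placeholder; see the Pricing page
PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, cached_tokens: int,
                  output_tokens: int, tier: str) -> Decimal:
    """Estimate request cost: apply the cache discount first, then the multiplier."""
    uncached = input_tokens - cached_tokens  # assumes input_tokens includes cached
    base = (uncached * INPUT_RATE
            + cached_tokens * CACHED_INPUT_RATE
            + output_tokens * OUTPUT_RATE) / PER_MILLION
    return base * PRIORITY_MULTIPLIER if tier == "priority" else base


default_cost = estimate_cost(1_000_000, 400_000, 100_000, "default")
priority_cost = estimate_cost(1_000_000, 400_000, 100_000, "priority")
print(default_cost, priority_cost)
```

Under these placeholder numbers the priority cost is exactly the default cost times the multiplier, and the cached tokens remain cheaper than uncached ones at either tier.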


Quick start

Pass service_tier: "priority" in your request body. The response includes a service_tier field confirming which tier was used.

curl https://api.x.ai/v1/responses \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4.3",
    "input": "Explain the Riemann hypothesis in one paragraph.",
    "service_tier": "priority"
  }'

The response includes "service_tier": "priority" when the request was served at the priority tier, or "service_tier": "default" if it was served at the default tier instead. You are billed at the priority rate only when the response confirms "priority".

{
  "id": "resp_abc123",
  "model": "grok-4.3",
  "service_tier": "priority",
  "usage": {
    "input_tokens": 42,
    "output_tokens": 156,
    "cost_in_usd_ticks": 37756000
  }
}
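
The same request and check can be sketched in Python using only the standard library. The endpoint, headers, and body fields mirror the curl example above; the function names build_request, send, and served_at_priority are hypothetical, and error handling is omitted. The sketch only defines the network call and then runs the check against the sample response shown above.

```python
import json
import urllib.request

API_URL = "https://api.x.ai/v1/responses"  # same endpoint as the curl example


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(body: dict, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(body, api_key), timeout=timeout) as resp:
        return json.load(resp)


def served_at_priority(response: dict) -> bool:
    """True only when the response confirms the priority tier was used."""
    return response.get("service_tier") == "priority"


# Check against the sample response shown above; no network call is made here.
sample = {"id": "resp_abc123", "model": "grok-4.3", "service_tier": "priority"}
print(served_at_priority(sample))
```

To make a real call, pass a body containing "service_tier": "priority" to send along with the key from the XAI_API_KEY environment variable, then branch on served_at_priority.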

Best practices

  • Latency-sensitive paths first — Priority Processing is most valuable for user-facing requests where response time directly affects experience. Background jobs, evaluations, and bulk processing are better served by the Batch API.
  • Monitor the service_tier field — Log the returned tier to track how often your requests are served at priority versus default and to correlate with your latency metrics.
  • Combine with prompt caching — Cached input tokens are discounted before the priority multiplier is applied, so prompt caching and priority processing complement each other well.
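
The monitoring practice above could be sketched as a small in-memory tally. The class TierStats and its methods are hypothetical, and the latency is assumed to be measured by the caller in seconds; a production setup would more likely emit these values to an existing metrics system.

```python
from collections import defaultdict
from statistics import mean


class TierStats:
    """Hypothetical in-memory tally of latencies grouped by returned tier."""

    def __init__(self) -> None:
        self._latencies = defaultdict(list)

    def record(self, response: dict, latency_s: float) -> None:
        """Store one latency sample under the tier reported in the response."""
        tier = response.get("service_tier", "unknown")
        self._latencies[tier].append(latency_s)

    def priority_rate(self) -> float:
        """Fraction of recorded requests that were served at the priority tier."""
        total = sum(len(samples) for samples in self._latencies.values())
        if total == 0:
            return 0.0
        return len(self._latencies.get("priority", [])) / total

    def mean_latency(self, tier: str):
        """Mean latency for a tier, or None if no samples were recorded."""
        samples = self._latencies.get(tier)
        return mean(samples) if samples else None


stats = TierStats()
for latency in (0.5, 1.0, 1.5):
    stats.record({"service_tier": "priority"}, latency)
stats.record({"service_tier": "default"}, 3.0)
print(stats.priority_rate(), stats.mean_latency("priority"), stats.mean_latency("default"))
```

Comparing mean latency per tier alongside the priority rate makes it easier to see whether slower periods coincide with requests being served at the default tier.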

Last updated: June 12, 2026