Advanced API Usage

Prompt Caching


When consecutive requests share the same starting messages, the xAI API automatically caches the computation for that shared prefix. On subsequent requests, messages at the beginning that exactly match the cached prefix are served from cache:

  • Faster time-to-first-token — the model skips re-computing cached messages
  • Lower cost — cached tokens are billed at a reduced rate

The xAI API performs prompt caching automatically. However, we recommend setting the x-grok-conv-id HTTP header to maximize your cache hit rate.
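A minimal sketch of how a client might attach this header. It assumes the OpenAI-compatible chat completions endpoint at `https://api.x.ai/v1/chat/completions` and a hypothetical `build_request` helper; the model name and payload shape shown are illustrative, not prescriptive. The key idea is to generate one conversation ID and reuse it for every request in that conversation:

```python
import os
import uuid

XAI_URL = "https://api.x.ai/v1/chat/completions"  # OpenAI-compatible endpoint

def build_request(conversation_id, messages, model="grok-4"):
    """Hypothetical helper: assemble headers and payload for a chat request.

    Reusing the same x-grok-conv-id across a conversation lets the API
    route matching requests so shared prefix messages hit the cache.
    """
    headers = {
        "Authorization": f"Bearer {os.environ.get('XAI_API_KEY', '')}",
        "Content-Type": "application/json",
        "x-grok-conv-id": conversation_id,  # same value for the whole conversation
    }
    payload = {"model": model, "messages": messages}
    return headers, payload

# Generate the ID once per conversation, then reuse it on every turn.
conv_id = str(uuid.uuid4())
history = [{"role": "user", "content": "Hello"}]
headers, payload = build_request(conv_id, history)
```

On each follow-up turn, append the new messages to the same `history` list and send it with the same `conv_id`, so the unchanged prefix can be served from cache.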
