Advanced API Usage
Prompt Caching
When consecutive requests share the same starting messages, the xAI API automatically caches that shared prefix. On the next request, messages at the beginning of the array that match the cached prefix exactly are served from cache:
- Faster time-to-first-token — the model skips re-computing cached messages
- Lower cost — cached tokens are billed at a reduced rate
The xAI API performs prompt caching automatically. However, we recommend setting the `x-grok-conv-id` HTTP header to maximize your cache hit rate.
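As a sketch of where this header fits, the hypothetical helper below builds the headers and payload for a chat completions request, sending the same `x-grok-conv-id` on every turn of a conversation so matching prefixes hit the same cache. The endpoint URL, model name, and helper itself are illustrative assumptions, not part of the documented API.

```python
# Hedged sketch: a hypothetical helper showing where the x-grok-conv-id
# header would go on a chat completions request. The endpoint URL and
# model name below are assumptions for illustration.
import json
import os

API_URL = "https://api.x.ai/v1/chat/completions"  # assumed endpoint


def build_request(conv_id: str, messages: list[dict], model: str = "grok-3"):
    """Build headers and payload; the stable message prefix is what gets cached."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('XAI_API_KEY', '')}",
        # Reuse the same conversation id on every turn so consecutive
        # requests with a matching message prefix hit the same cache.
        "x-grok-conv-id": conv_id,
    }
    payload = {"model": model, "messages": messages}
    return headers, json.dumps(payload)


headers, body = build_request(
    "conv-123",
    [
        # The system prompt and earlier turns form the cacheable prefix;
        # only the new user message at the end is fresh computation.
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
)
```

Keeping the conversation id stable across turns (rather than generating a fresh one per request) is what lets the cache recognize the shared prefix.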
In this section
- How It Works — Understand how caching works from the start of your messages array
- Maximizing Cache Hits — Set up `x-grok-conv-id` and `prompt_cache_key` for optimal caching
- What Breaks Caching — Common mistakes that cause cache misses
- Usage & Pricing — Read cached token counts and understand billing
- Best Practices & FAQ — Tips, supported models, and common questions