Advanced API Usage

Prompt Caching


When consecutive requests share the same starting messages, the xAI API automatically caches the computation for that shared prefix. On subsequent requests, messages at the beginning that exactly match the cached prefix are served from cache:

  • Faster time-to-first-token — the model skips re-computing cached messages
  • Lower cost — cached tokens are billed at a reduced rate

The xAI API performs prompt caching automatically. However, we recommend setting the x-grok-conv-id HTTP header to maximize your cache hit rate.
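A minimal sketch of how a client might attach this header. It assumes the OpenAI-compatible chat completions endpoint at `https://api.x.ai/v1/chat/completions` and a hypothetical `build_request` helper; the model name and payload shape shown are illustrative, not prescriptive. The key idea is to generate one conversation ID and reuse it for every request in that conversation:

```python
import os
import uuid

XAI_URL = "https://api.x.ai/v1/chat/completions"  # OpenAI-compatible endpoint

def build_request(conversation_id, messages, model="grok-4"):
    """Hypothetical helper: assemble headers and payload for a chat request.

    Reusing the same x-grok-conv-id across a conversation lets the API
    route matching requests so shared prefix messages hit the cache.
    """
    headers = {
        "Authorization": f"Bearer {os.environ.get('XAI_API_KEY', '')}",
        "Content-Type": "application/json",
        "x-grok-conv-id": conversation_id,  # same value for the whole conversation
    }
    payload = {"model": model, "messages": messages}
    return headers, payload

# Generate the ID once per conversation, then reuse it on every turn.
conv_id = str(uuid.uuid4())
history = [{"role": "user", "content": "Hello"}]
headers, payload = build_request(conv_id, history)
```

On each follow-up turn, append the new messages to the same `history` list and send it with the same `conv_id`, so the unchanged prefix can be served from cache.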
