Prompt Caching

What Breaks Caching


Any change to earlier messages breaks the cache. Only append new messages at the end.

Keep messages unchanged. To get cache hits in multi-turn conversations, never edit, remove, or reorder earlier messages; only append new ones. For reasoning models, you must also include the reasoning_content from previous responses; omitting it is the most common cause of cache misses.

For reasoning models, you can maintain cache hits by either:

  • Sending back the encrypted reasoning content — Include the reasoning_content from the previous response. See Encrypted Reasoning Content for details.
  • Using stateful responses — Use previous_response_id to automatically continue the conversation. See Chaining the Conversation for details.
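As a sketch of the first option, a multi-turn loop can carry the previous response forward by appending only, never mutating earlier turns. This is a minimal illustration, not SDK code; the `build_next_turn` helper is hypothetical, and the `reasoning_content` field name follows the description above:

```python
def build_next_turn(history, assistant_reply, new_user_message):
    """Return the messages list for the next request: earlier turns
    untouched, with the assistant reply and new user message appended."""
    assistant_msg = {"role": "assistant", "content": assistant_reply["content"]}
    # Reasoning models: echo reasoning_content back verbatim so the
    # server can match the cached prefix.
    if "reasoning_content" in assistant_reply:
        assistant_msg["reasoning_content"] = assistant_reply["reasoning_content"]
    # Never mutate history in place; build a new list with appends only.
    return history + [assistant_msg, {"role": "user", "content": new_user_message}]
```

Because the returned list starts with `history` unchanged, the prompt prefix stays byte-identical across turns.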

Cache hit — appending a new message

The prompt prefix is identical to the previous request, with only a new user message appended:

Bash

# Turn 1: Initial request (establishes the cache)
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.20-beta-latest-non-reasoning",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}
    ]
  }'

# Turn 2: Cache HIT — exact prefix preserved, new message appended
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.20-beta-latest-non-reasoning",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

Cache miss — editing an earlier message

Changing the content of any earlier message breaks the prefix match:

Bash

# Cache MISS — editing the assistant message content
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.20-beta-latest-non-reasoning",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "assistant", "content": "It stores KV pairs."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant response was shortened from the full explanation to "It stores KV pairs."


Cache miss — removing a message

Removing any message from the conversation breaks the prefix:

Bash

# Cache MISS — the assistant message was removed
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.20-beta-latest-non-reasoning",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant message was removed entirely from the conversation history.


Cache miss — reordering messages

Changing the order of messages also breaks the prefix:

Bash

# Cache MISS — user and system messages are swapped
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.20-beta-latest-non-reasoning",
    "messages": [
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The first two messages were swapped; the user message now comes before the system message, so the prefix mismatches from the very first message.
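
To make the prefix rule concrete, here is a small illustrative sketch (not part of the API) that counts how many leading messages two requests share. It shows why appending preserves the cached prefix while editing or reordering destroys it:

```python
def shared_prefix_len(prev_messages, next_messages):
    """Number of leading messages identical between two requests.
    The cache can only reuse work for this shared prefix."""
    n = 0
    for prev, curr in zip(prev_messages, next_messages):
        if prev != curr:
            break
        n += 1
    return n

turn1 = [
    {"role": "system", "content": "You are Grok."},
    {"role": "user", "content": "What is prompt caching?"},
    {"role": "assistant", "content": "It stores KV pairs."},
]

appended = turn1 + [{"role": "user", "content": "Show me a code example."}]
edited = [turn1[0], turn1[1], {"role": "assistant", "content": "Edited."}]
reordered = [turn1[1], turn1[0], turn1[2]]

print(shared_prefix_len(turn1, appended))   # 3: the full prefix is reused
print(shared_prefix_len(turn1, edited))     # 2: the prefix breaks at the edit
print(shared_prefix_len(turn1, reordered))  # 0: the prefix breaks immediately
```

An edit anywhere invalidates everything from that point on, which is why even a one-character change to the system prompt forces a full cache miss.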

