Prompt Caching

What Breaks Caching

Any change to earlier messages breaks the cache. Only append new messages at the end.

Keep messages unchanged. For cache hits in multi-turn conversations, never edit, remove, or reorder earlier messages; only append new ones. For reasoning models, the reasoning_content from previous responses is part of that history, and omitting it is the top cause of cache misses.

For reasoning models, you can maintain cache hits by either:

  • Sending back the encrypted reasoning content — Include the reasoning_content from the previous response. See Encrypted Reasoning Content for details.
  • Using stateful responses — Use previous_response_id to automatically continue the conversation. See Chaining the Conversation for details.
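As a rough illustration of the first option, the sketch below carries an assistant message into the next request's history unchanged. It is written in Python rather than curl for brevity. It assumes the assistant message from the previous response is available as a dict that includes a reasoning_content field; the helper names and the placeholder reasoning value are hypothetical and not part of the xAI API.

```python
import copy


def append_assistant_turn(history, response_message):
    """Return a new history with the assistant message appended unchanged.

    The whole message is copied, so fields such as reasoning_content
    (assumed to be present on reasoning-model responses) are preserved.
    """
    # Deep-copy so later edits to response_message cannot alter the stored prefix.
    return history + [copy.deepcopy(response_message)]


def append_user_turn(history, text):
    """Return a new history with a user message appended at the end."""
    return history + [{"role": "user", "content": text}]


history = [
    {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
    {"role": "user", "content": "What is prompt caching?"},
]

# Hypothetical assistant message taken from the previous response.
# The reasoning_content value is a placeholder, not real API output.
response_message = {
    "role": "assistant",
    "content": "Prompt caching stores KV pairs from unchanged prompt prefixes.",
    "reasoning_content": "<opaque-encrypted-blob>",
}

turn2_messages = append_user_turn(
    append_assistant_turn(history, response_message),
    "Show me a code example.",
)
```

Rebuilding the assistant message from only role and content would silently drop reasoning_content, which is the failure mode described above. Copying the message wholesale avoids that.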

Cache hit — appending a new message

The prompt prefix is identical to the previous request, with only a new user message appended:

# Turn 1: Initial request (establishes the cache)
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}
    ]
  }'

# Turn 2: Cache HIT — exact prefix preserved, new message appended
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

Cache miss — editing an earlier message

Changing the content of any earlier message breaks the prefix match:

# Cache MISS — editing the assistant message content
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "assistant", "content": "It stores KV pairs."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant response was shortened from its original text to "It stores KV pairs."


Cache miss — removing a message

Removing any message from the conversation breaks the prefix:

# Cache MISS — the assistant message was removed
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The assistant message between the two user messages was removed entirely.


Cache miss — reordering messages

Changing the order of messages also breaks the prefix:

# Cache MISS — user and system messages are swapped
curl https://api.x.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "x-grok-conv-id: conv_abc123" \
  -d '{
    "model": "grok-4.3",
    "messages": [
      {"role": "user", "content": "What is prompt caching?"},
      {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
      {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
      {"role": "user", "content": "Show me a code example."}
    ]
  }'

What changed: The first two messages were swapped, so the user message now comes before the system message.
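The three miss cases above can be caught before a request is sent. The following is a minimal Python sketch, assuming messages are held as plain dicts; first_divergence is a hypothetical helper, and comparing dicts locally is only a proxy for the prefix matching the service performs, not a description of it.

```python
from typing import Optional


def first_divergence(previous: list, current: list) -> Optional[int]:
    """Return the index of the first message where `current` stops matching
    `previous`, or None if `previous` is an intact prefix of `current`."""
    for i, old in enumerate(previous):
        if i >= len(current) or current[i] != old:
            return i
    return None


SYSTEM = {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}
USER = {"role": "user", "content": "What is prompt caching?"}
ASSISTANT = {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}
FOLLOW_UP = {"role": "user", "content": "Show me a code example."}

previous = [SYSTEM, USER, ASSISTANT]

appended = previous + [FOLLOW_UP]
edited = [SYSTEM, USER, {"role": "assistant", "content": "It stores KV pairs."}, FOLLOW_UP]
removed = [SYSTEM, USER, FOLLOW_UP]
reordered = [USER, SYSTEM, ASSISTANT, FOLLOW_UP]

print(first_divergence(previous, appended))   # None: prefix intact
print(first_divergence(previous, edited))     # 2: assistant message changed
print(first_divergence(previous, removed))    # 2: assistant message missing
print(first_divergence(previous, reordered))  # 0: order differs from the start
```

A non-None result points at the earliest message that no longer matches the previous request, which is where to look when a conversation unexpectedly stops hitting the cache.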



Last updated: March 16, 2026