Prompt Caching
What Breaks Caching
Any change to earlier messages breaks the cache. Only append new messages at the end.
Keep messages unchanged. For cache hits in multi-turn conversations, never edit, remove, or reorder earlier messages; only append new ones. For reasoning models, the reasoning_content from previous responses must also be carried forward; omitting it is the top cause of cache misses.
For reasoning models, you can maintain cache hits by either:
- Sending back the encrypted reasoning content: include the reasoning_content from the previous response. See Encrypted Reasoning Content for details.
- Using stateful responses: use previous_response_id to automatically continue the conversation. See Chaining the Conversation for details.
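As a rough illustration of the second option, the sketch below only builds a request body that carries previous_response_id; it does not send anything. The helper name build_followup_body, the input field layout, and the placeholder values are assumptions made for this example and are not taken from this page; Chaining the Conversation remains the reference for the actual request shape.

```shell
# Hypothetical helper: print a follow-up request body that points at the
# previous response instead of resending the earlier messages.
# $1 = model name, $2 = id of the previous response, $3 = new user text
# Assumption: the text in $3 contains no characters that need JSON escaping.
build_followup_body() {
  printf '{"model":"%s","previous_response_id":"%s","input":[{"role":"user","content":"%s"}]}\n' \
    "$1" "$2" "$3"
}

# Placeholder values for illustration only.
build_followup_body "example-reasoning-model" "resp_123" "Show me a code example."
```

With this approach the client never rebuilds the earlier turns, so there is no opportunity to edit, drop, or reorder them by accident.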
Cache hit — appending a new message
The prompt prefix is identical to the previous request, with only a new user message appended:
# Turn 1: Initial request (establishes the cache)
curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "x-grok-conv-id: conv_abc123" \
-d '{
"model": "grok-4.20-beta-latest-non-reasoning",
"messages": [
{"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
{"role": "user", "content": "What is prompt caching?"},
{"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}
]
}'
# Turn 2: Cache HIT — exact prefix preserved, new message appended
curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "x-grok-conv-id: conv_abc123" \
-d '{
"model": "grok-4.20-beta-latest-non-reasoning",
"messages": [
{"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
{"role": "user", "content": "What is prompt caching?"},
{"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
{"role": "user", "content": "Show me a code example."}
]
}'
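One way to keep the prefix stable on the client side is to store the history in a structure that only ever grows. The sketch below, in POSIX shell, assumes each message is already a valid JSON object; append_message and build_payload are hypothetical helper names, not part of any xAI tooling.

```shell
# Comma-joined JSON message objects. Only ever extended at the end.
messages=""

# Append one message; earlier messages are never edited, removed, or reordered.
append_message() {
  messages="${messages:+$messages,}$1"
}

# Print a request body for the given model from the stored history.
build_payload() {
  printf '{"model":"%s","messages":[%s]}' "$1" "$messages"
}

append_message '{"role":"system","content":"You are Grok."}'
append_message '{"role":"user","content":"What is prompt caching?"}'
turn1=$(build_payload "grok-4.20-beta-latest-non-reasoning")

append_message '{"role":"assistant","content":"It reuses unchanged prompt prefixes."}'
append_message '{"role":"user","content":"Show me a code example."}'
turn2=$(build_payload "grok-4.20-beta-latest-non-reasoning")
```

Because build_payload always emits the stored messages in insertion order, the body in turn2 begins with every message that was sent in turn1. Either body could then be passed to curl with -d "$turn2".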
Cache miss — editing an earlier message
Changing the content of any earlier message breaks the prefix match:
# Cache MISS — editing the assistant message content
curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "x-grok-conv-id: conv_abc123" \
-d '{
"model": "grok-4.20-beta-latest-non-reasoning",
"messages": [
{"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
{"role": "user", "content": "What is prompt caching?"},
{"role": "assistant", "content": "It stores KV pairs."},
{"role": "user", "content": "Show me a code example."}
]
}'
What changed: The original assistant response was replaced with the shorter "It stores KV pairs.", so the prompt no longer matches the cached prefix from that message onward.
Cache miss — removing a message
Removing any message from the conversation breaks the prefix:
# Cache MISS — the assistant message was removed
curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "x-grok-conv-id: conv_abc123" \
-d '{
"model": "grok-4.20-beta-latest-non-reasoning",
"messages": [
{"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
{"role": "user", "content": "What is prompt caching?"},
{"role": "user", "content": "Show me a code example."}
]
}'
What changed: The assistant message that previously sat between the two user messages was removed entirely.
Cache miss — reordering messages
Changing the order of messages also breaks the prefix:
# Cache MISS — user and system messages are swapped
curl https://api.x.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $XAI_API_KEY" \
-H "x-grok-conv-id: conv_abc123" \
-d '{
"model": "grok-4.20-beta-latest-non-reasoning",
"messages": [
{"role": "user", "content": "What is prompt caching?"},
{"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."},
{"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."},
{"role": "user", "content": "Show me a code example."}
]
}'
What changed: The first two messages were swapped, so the user message now comes before the system message.
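The three cache-miss cases share one property: the earlier messages are no longer an exact prefix of the new request. A client could check for this before sending. The sketch below is only a byte-level approximation under the assumption that messages are serialized identically on every turn; extends_previous is a hypothetical helper, and the shortened message texts are placeholders.

```shell
# Hypothetical helper: succeed (exit status 0) only when the new message list
# starts with the previous one, byte for byte.
# $1 = previous comma-joined messages, $2 = new comma-joined messages
extends_previous() {
  case "$2" in
    "$1" | "$1",*) return 0 ;;
    *) return 1 ;;
  esac
}

sys='{"role":"system","content":"You are Grok."}'
usr='{"role":"user","content":"What is prompt caching?"}'
ast='{"role":"assistant","content":"Prompt caching stores KV pairs."}'
short='{"role":"assistant","content":"It stores KV pairs."}'
new='{"role":"user","content":"Show me a code example."}'
prev="$sys,$usr,$ast"

extends_previous "$prev" "$prev,$new" && echo "append: prefix preserved"
extends_previous "$prev" "$sys,$usr,$short,$new" || echo "edit: prefix broken"
extends_previous "$prev" "$sys,$usr,$new" || echo "remove: prefix broken"
extends_previous "$prev" "$usr,$sys,$ast,$new" || echo "reorder: prefix broken"
```

Quoting "$1" inside the case pattern makes the shell treat the previous messages as literal text, so characters such as [ or * in message content are not interpreted as glob patterns.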