Model Capabilities
Reasoning
The presencePenalty, frequencyPenalty, and stop parameters are not supported by reasoning models. Including any of them in a request results in an error.
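As a minimal sketch, one way to guard against this client-side is to strip the unsupported fields from a request dict before sending it. The helper name and the dict-based request shape below are illustrative, not part of the SDK:

```python
# Hypothetical helper: remove sampling parameters that reasoning models reject.
# Field names follow the camelCase spelling used in this doc; adjust them to
# match the exact names your client library expects.
UNSUPPORTED_FOR_REASONING = {"presencePenalty", "frequencyPenalty", "stop"}

def sanitize_request(params: dict) -> dict:
    """Return a copy of params without fields that reasoning models reject."""
    return {k: v for k, v in params.items() if k not in UNSUPPORTED_FOR_REASONING}

request = {
    "model": "grok-4.20-reasoning",
    "temperature": 0.7,
    "presencePenalty": 0.5,  # would trigger an API error on a reasoning model
    "stop": ["\n\n"],        # also unsupported
}
clean = sanitize_request(request)
print(sorted(clean))  # only the supported keys remain
```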
Key Features
- Think Before Responding: Reasoning models think through problems step-by-step before delivering an answer.
- Math & Quantitative Strength: Excels at numerical challenges, logic puzzles, and complex analytical tasks.
- Reasoning Trace: Usage metrics expose reasoning_tokens. Some models can also return encrypted reasoning via include: ["reasoning.encrypted_content"] (see below).
Encrypted Reasoning Content
The reasoning content is encrypted by us and can be returned if you pass include: ["reasoning.encrypted_content"] to the Responses API. You can send the encrypted content back in later requests to give the model the reasoning context from a previous conversation. See Adding encrypted thinking content for more details on how to use the content.
When using the Vercel AI SDK, encrypted reasoning content is automatically included under the hood as long as store: false is not specified. No additional configuration is needed.
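As a rough sketch of what opting in looks like at the request level, the body below sets the include field described above. Every field other than include is an assumption about the Responses API request shape; consult the API reference for the exact fields:

```python
import json

# Illustrative Responses API request body. Only `include` comes from this
# doc; `model` and `input` are assumed placeholder fields.
payload = {
    "model": "grok-4.20-reasoning",
    "input": "Summarize the previous analysis.",
    # Opt in to receiving the encrypted reasoning content:
    "include": ["reasoning.encrypted_content"],
}

body = json.dumps(payload)
print("reasoning.encrypted_content" in body)  # True
```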
The reasoning parameter
reasoning_effort is not supported by grok-4.20 or grok-4-1-fast. Specifying reasoning_effort on these models will return an error. These models reason automatically without any configuration.
The only model that accepts the reasoning parameter is grok-4.20-multi-agent, where it controls how many agents collaborate on a request — not how hard the model thinks.
Multi-agent model: agent count (not thinking effort)
For grok-4.20-multi-agent, the reasoning.effort parameter does not control how hard the model thinks; instead, it selects between two agent configurations:
| Setting | Agent Count | Best For |
|---|---|---|
"low" or "medium" | 4 agents | Quick research, focused queries |
"high" or "xhigh" | 16 agents | Deep research, complex multi-faceted topics |
More agents means deeper, more thorough research at the cost of higher token usage and latency. For full details and code examples, see the Multi Agent documentation.
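The table above can be expressed as a small lookup, useful for estimating cost before choosing an effort level. The helper name is hypothetical; the effort-to-count mapping comes from the table:

```python
# Illustrative mapping of reasoning.effort values to agent counts for
# grok-4.20-multi-agent, per the table above.
def agent_count(effort: str) -> int:
    if effort in ("low", "medium"):
        return 4
    if effort in ("high", "xhigh"):
        return 16
    raise ValueError(f"unknown reasoning effort: {effort!r}")

print(agent_count("medium"))  # 4
print(agent_count("xhigh"))   # 16
```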
Summary table
| Model | reasoning parameter | Behavior |
|---|---|---|
| grok-4.20-multi-agent | reasoning.effort: "low" / "medium" / "high" / "xhigh" | Controls agent count (4 or 16) |
| grok-4.20, grok-4-1-fast | Not supported | Reasons automatically; returns an error if specified |
Usage Example
Here is a simple example using grok-4.20-reasoning to multiply 101 by 3. No reasoning_effort parameter is needed; the model reasons automatically.
```python
import os

from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client(
    api_key=os.getenv("XAI_API_KEY"),
    timeout=3600,  # Override default timeout with longer timeout for reasoning models
)

chat = client.chat.create(
    model="grok-4.20-reasoning",
    messages=[system("You are a highly intelligent AI assistant.")],
)
chat.append(user("What is 101*3?"))

response = chat.sample()
print("Final Response:")
print(response.content)
print("Number of completion tokens:")
print(response.usage.completion_tokens)
print("Number of reasoning tokens:")
print(response.usage.reasoning_tokens)
```
Sample Output

```
Final Response:
The result of 101 multiplied by 3 is 303.
Number of completion tokens:
14
Number of reasoning tokens:
310
```
Notes on Consumption
When you use a reasoning model, the reasoning tokens are billed as part of your total consumption.
For the multi-agent model, all tokens consumed by both the leader agent and sub-agents are billed. Choosing 16 agents (via "high" / "xhigh") will use significantly more tokens than 4 agents. See the Multi Agent pricing section for details.
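Using the sample run above as a worked example, the output-side consumption is the sum of completion and reasoning tokens. The assumption that both are billed at the same output rate is illustrative; check the pricing page for actual rates:

```python
# Worked example using the token counts from the sample output above.
completion_tokens = 14
reasoning_tokens = 310

# Reasoning tokens count toward total consumption alongside completion tokens.
billed_output_tokens = completion_tokens + reasoning_tokens
print(billed_output_tokens)  # 324
```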
Last updated: April 3, 2026