Model Capabilities

Reasoning

Key Features

  • Think Before Responding: Reasoning models think through problems step-by-step before delivering an answer.
  • Math & Quantitative Strength: Excels at numerical challenges, logic puzzles, and complex analytical tasks.
  • Reasoning Trace: Usage metrics expose reasoning_tokens. Some models can also return encrypted reasoning via include: ["reasoning.encrypted_content"] (see below).

Encrypted Reasoning Content

The reasoning content is encrypted by us and can be returned if you pass include: ["reasoning.encrypted_content"] to the Responses API. You can send the encrypted content back to provide more context to a previous conversation. See Adding encrypted thinking content for more details on how to use the content.

When using the Vercel AI SDK, encrypted reasoning content is automatically included under the hood as long as store: false is not specified. No additional configuration is needed.


The reasoning_effort parameter

grok-4.3 supports the reasoning_effort parameter, which controls how much effort the model spends thinking before responding.

If not specified, reasoning_effort defaults to "low". If set to "none", no reasoning will occur.

presencePenalty, frequencyPenalty, and stop cannot be used with reasoning models. Requests that include them return an error.

Effort levels

SettingDescriptionBest For
"none"Disables reasoning entirely; no thinking tokens are usedSimple use cases that require a near-instant response.
"low" (default)Uses some reasoning tokens, but still fastGeneral agentic use and tool calling.
"medium"More thinking for less-latency sensitive applicationsComplex data analysis and long-context reasoning.
"high"Uses more reasoning tokens for deeper thinkingVery challenging problems, complex math, multi-step logic, competition-level tasks

Setting reasoning effort

The following example sets reasoning_effort to "high" for a challenging math proof. You can substitute "none", "low", or "medium" as needed.

import os

from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client(
    api_key=os.getenv("XAI_API_KEY"),
    timeout=3600,
)

chat = client.chat.create(
    model="grok-4.3",
    reasoning_effort="high",
    messages=[system("You are a highly intelligent AI assistant.")],
)
chat.append(user("Find all prime numbers p such that p^2 + 2 is also prime. Prove your answer."))

response = chat.sample()

print("Final Response:")
print(response.content)

Multi-agent model

For grok-4.20-multi-agent, the reasoning.effort parameter controls how many agents collaborate on a request rather than reasoning depth. See the Multi Agent documentation for details.

Summary table

Modelreasoning parameterBehavior
grok-4.3reasoning.effort: "none" / "low" (default) / "medium" / "high"Controls reasoning depth ("none" disables it)
grok-4.20-multi-agentreasoning.effort: "low" / "medium" / "high" / "xhigh"Controls agent count (4 or 16)

Summarized Reasoning Content

For grok-4.3, we expose summarizations of the model's internal reasoning. Here's an example of how to stream the reasoning summary deltas alongside the final response:

import os

from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client(
    api_key=os.getenv("XAI_API_KEY"),
    timeout=3600, # Override default timeout with longer timeout for reasoning models
)

chat = client.chat.create(
    model="grok-4.3",
    messages=[system("You are a highly intelligent AI assistant.")],
)
chat.append(user("A projectile is launched at 30 m/s at 37° above horizontal from a 45 m cliff. Find its speed on impact. (g=10 m/s²)"))

content_started = False

print("\n\n--------- Reasoning ---------", flush=True)

latest_response = None
for response, chunk in chat.stream():
    if chunk.reasoning_content:
        print(chunk.reasoning_content, end="", flush=True)

Sample Output

Output

--------- Reasoning ---------
The problem is: A projectile is launched at 30 m/s at 37° above horizontal from a 45 m cliff. Find its speed on impact. (g=10 m/s²)
I need to find the speed when the projectile hits the ground. It's launched at 30 m/s at 37° from a 45 m cliff, with g=10 m/s².

Conservation of energy is a good approach. The initial kinetic energy is (1/2)mv² with v=30 m/s, and initial potential energy is mgh with h=45 m, taking ground as zero.

At impact, potential energy is zero, so initial KE + initial PE = final KE.

Thus, (1/2)m(30)² + mg(45) = (1/2)m v_f²

v_f² = 900 + 2*10*45 = 900 + 900 = 1800

v_f = sqrt(1800) = 30√2 m/s ≈ 42.4 m/s

The angle doesn't affect the final speed because the initial kinetic energy and potentialenergy change are the same regardless of direction, as long as the speed and height are the same.

Yes, that makes sense. The final speed is sqrt(v0² + 2gh), independent of the launch angle.

Notes on Consumption

When you use a reasoning model, the reasoning tokens are billed as part of your total consumption.

For the multi-agent model, all tokens consumed by both the leader agent and sub-agents are billed. Choosing 16 agents (via "high" / "xhigh") will use significantly more tokens than 4 agents. See the Multi Agent pricing section for details.


Last updated: May 7, 2026