Guides
Overview
The xAI API supports agentic server-side tool calling, which enables the model to autonomously explore, search, and execute code to solve complex queries. Unlike traditional tool calling, where clients must handle each tool invocation themselves, xAI's agentic API manages the entire reasoning and tool-execution loop on the server side.
xAI Python SDK Users: Version 1.3.1 or later of the xai-sdk package is required to use the agentic tool calling API.
Tools Pricing
Agentic requests are priced based on two components: token usage and tool invocations. Since the agent autonomously decides how many tools to call, costs scale with query complexity.
For more details on Tools pricing, please check out the pricing page.
Agentic Tool Calling
When you provide server-side tools to a request, the xAI server orchestrates an autonomous reasoning loop rather than returning tool calls for you to execute. This creates a seamless experience where the model acts as an intelligent agent that researches, analyzes, and responds automatically.
Behind the scenes, the model follows an iterative reasoning process:
- Analyzes the query and current context to determine what information is needed
- Decides what to do next: Either make a tool call to gather more information or provide a final answer
- If making a tool call: Selects the appropriate tool and parameters based on the reasoning
- Executes the tool in real-time on the server and receives the results
- Processes the tool response and integrates it with previous context and reasoning
- Repeats the loop: Uses the new information to decide whether more research is needed or if a final answer can be provided
- Returns the final response once the agent determines it has sufficient information to answer comprehensively
This autonomous orchestration enables complex multi-step research and analysis to happen automatically: clients see the final result, plus optional real-time progress indicators such as tool call notifications during streaming.
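For intuition, the loop can be sketched in a few lines of Python. This is illustrative pseudocode only, not actual xAI server code; generate and execute_tool are hypothetical stand-ins for the model inference call and the server's tool executor:
Python
# Conceptual sketch of the server-side agentic loop (illustrative only).
def agentic_loop(generate, execute_tool, messages, tools, max_turns):
    for _ in range(max_turns):
        step = generate(messages, tools)      # one assistant turn
        if not step.tool_calls:
            return step.content               # enough information: final answer
        for call in step.tool_calls:          # possibly several calls in parallel
            result = execute_tool(call)       # runs on the xAI server
            messages.append(("tool", call, result))
        # loop continues: the model sees the tool results on the next turn
    return generate(messages, tools).content  # turn budget exhausted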
Core Capabilities
- Web Search: Real-time search across the internet with the ability to both search the web and browse web pages.
- X Search: Semantic and keyword search across X posts, users, and threads.
- Code Execution: The model can write and execute Python code for calculations, data analysis, and complex computations.
- Image/Video Understanding: Optional visual content understanding and analysis for search results encountered (video understanding is only available for X posts).
- Collections Search: The model can search through your uploaded knowledge bases and collections to retrieve relevant information.
- Remote MCP Tools: Connect to external MCP servers to access custom tools.
- Document Search: Upload files and chat with them using intelligent document search. This tool is automatically enabled when you attach files to a chat message.
Quick Start
We strongly recommend using the xAI Python SDK in streaming mode when using agentic tool calling. Doing so grants you the full feature set of the API, including the ability to get real-time observability and immediate feedback during potentially long-running requests.
Here is a quick start example of using the agentic tool calling API.
import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import web_search, x_search, code_execution

client = Client(api_key=os.getenv("XAI_API_KEY"))

chat = client.chat.create(
    model="grok-4-1-fast",  # reasoning model
    # All server-side tools active
    tools=[
        web_search(),
        x_search(),
        code_execution(),
    ],
    include=["verbose_streaming"],
)

# Feel free to change the query here to a question of your liking
chat.append(user("What are the latest updates from xAI?"))

is_thinking = True
for response, chunk in chat.stream():
    # View the server-side tool calls as they are being made in real-time
    for tool_call in chunk.tool_calls:
        print(f"\nCalling tool: {tool_call.function.name} with arguments: {tool_call.function.arguments}")
    if response.usage.reasoning_tokens and is_thinking:
        print(f"\rThinking... ({response.usage.reasoning_tokens} tokens)", end="", flush=True)
    if chunk.content and is_thinking:
        print("\n\nFinal Response:")
        is_thinking = False
    if chunk.content and not is_thinking:
        print(chunk.content, end="", flush=True)

print("\n\nCitations:")
print(response.citations)

print("\n\nUsage:")
print(response.usage)
print(response.server_side_tool_usage)

print("\n\nServer Side Tool Calls:")
print(response.tool_calls)

You will be able to see output like:
Output
Thinking... (270 tokens)
Calling tool: x_user_search with arguments: {"query":"xAI official","count":1}
Thinking... (348 tokens)
Calling tool: x_user_search with arguments: {"query":"xAI","count":5}
Thinking... (410 tokens)
Calling tool: x_keyword_search with arguments: {"query":"from:xai","limit":10,"mode":"Latest"}
Thinking... (667 tokens)
Calling tool: web_search with arguments: {"query":"xAI latest updates site:x.ai","num_results":5}
Thinking... (850 tokens)
Calling tool: browse_page with arguments: {"url": "https://x.ai/news"}
Thinking... (1215 tokens)
Final Response:
### Latest Updates from xAI (as of October 12, 2025)
xAI primarily shares real-time updates via their official X (Twitter) account (@xai), with more formal announcements on their website (x.ai). Below is a summary of the most recent developments...
... full response omitted for brevity
Citations:
[
'https://x.com/i/user/1912644073896206336',
'https://x.com/i/user/1019237602585645057',
'https://x.com/i/status/1975607901571199086',
'https://x.com/i/status/1975608122845896765',
'https://x.com/i/status/1975608070245175592',
'https://x.com/i/user/1603826710016819209',
'https://x.com/i/status/1975608007250829383',
'https://status.x.ai/',
'https://x.com/i/user/150543432',
'https://x.com/i/status/1975608184711880816',
'https://x.com/i/status/1971245659660718431',
'https://x.com/i/status/1975608132530544900',
'https://x.com/i/user/1661523610111193088',
'https://x.com/i/status/1977121515587223679',
'https://x.ai/news/grok-4-fast',
'https://x.com/i/status/1975608017396867282',
'https://x.ai/',
'https://x.com/i/status/1975607953391755740',
'https://x.com/i/user/1875560944044273665',
'https://x.ai/news',
'https://docs.x.ai/docs/release-notes'
]
Usage:
completion_tokens: 1216
prompt_tokens: 29137
total_tokens: 31568
prompt_text_tokens: 29137
reasoning_tokens: 1215
cached_prompt_text_tokens: 22565
server_side_tools_used: SERVER_SIDE_TOOL_X_SEARCH
server_side_tools_used: SERVER_SIDE_TOOL_X_SEARCH
server_side_tools_used: SERVER_SIDE_TOOL_X_SEARCH
server_side_tools_used: SERVER_SIDE_TOOL_WEB_SEARCH
server_side_tools_used: SERVER_SIDE_TOOL_WEB_SEARCH
{'SERVER_SIDE_TOOL_X_SEARCH': 3, 'SERVER_SIDE_TOOL_WEB_SEARCH': 2}
Server Side Tool Calls:
[id: "call_51132959"
function {
name: "x_user_search"
arguments: "{"query":"xAI official","count":1}"
}
, id: "call_00956753"
function {
name: "x_user_search"
arguments: "{"query":"xAI","count":5}"
}
, id: "call_07881908"
function {
name: "x_keyword_search"
arguments: "{"query":"from:xai","limit":10,"mode":"Latest"}"
}
, id: "call_43296276"
function {
name: "web_search"
arguments: "{"query":"xAI latest updates site:x.ai","num_results":5}"
}
, id: "call_70310550"
function {
name: "browse_page"
arguments: "{"url": "https://x.ai/news"}"
}
]

Understanding the Agentic Tool Calling Response
The agentic tool calling API provides rich observability into the autonomous research process. This section takes a deeper look at the code snippet above, covering how to use the API effectively and how to interpret both real-time streaming responses and final results.
Real-time server-side tool calls
When executing agentic requests using streaming, you can observe every tool call decision the model makes in real-time via the tool_calls attribute on the chunk object. This shows the exact parameters the agent chose for each tool invocation, giving you visibility into its search strategy. Occasionally the model may invoke multiple tools in parallel during a single turn; in that case, each entry in tool_calls represents one of those parallel calls. Otherwise, tool_calls contains a single entry.
Note: Only the tool call invocations themselves are shown; server-side tool call outputs are not returned in the API response by default (see Accessing Outputs from Server Side Tool Calls below). The agent uses these outputs internally to formulate its final response.
When using the xAI Python SDK in streaming mode, it will automatically accumulate the tool_calls into the response object for you, letting you access a final list of all the server-side tool calls made during the agentic loop. This is demonstrated in the section below.
Python
for tool_call in chunk.tool_calls:
    print(f"\nCalling tool: {tool_call.function.name} with arguments: {tool_call.function.arguments}")
Output
Calling tool: x_user_search with arguments: {"query":"xAI official","count":1}
Calling tool: x_user_search with arguments: {"query":"xAI","count":5}
Calling tool: x_keyword_search with arguments: {"query":"from:xai","limit":10,"mode":"Latest"}
Calling tool: web_search with arguments: {"query":"xAI latest updates site:x.ai","num_results":5}
Calling tool: browse_page with arguments: {"url": "https://x.ai/news"}
Citations
The agentic tool calling API provides two types of citation information: All Citations (a complete list of all sources encountered) and Inline Citations (markdown-style links embedded directly in the response text, with structured metadata also available on the response object).
All Citations
The citations attribute on the response object provides a comprehensive list of URLs for all sources the agent encountered during its search process. This list is always returned by default; no additional configuration is required.
Citations are automatically collected from successful tool executions and provide full traceability of the agent's information sources. They are returned when the agentic request completes and are not available in real-time during streaming.
Note that not every URL in this list will necessarily be directly referenced in the final answer. The agent may examine a source during its research process and determine it is not sufficiently relevant to the user's query, but the URL will still appear in this list for transparency.
Python
response.citations
Output
[
'https://x.com/i/user/1912644073896206336',
'https://x.com/i/status/1975607901571199086',
'https://x.ai/news',
'https://docs.x.ai/docs/release-notes',
...
]

Inline Citations
xAI Python SDK Users: Version 1.5.0 or later of the xai-sdk package is required to use the inline citations feature.
Inline citations are markdown-style links (e.g., [[1]](https://x.ai/news)) inserted directly into the response text at the points where the model references sources. In addition to these visible links, structured metadata is available on the response object with precise positional information. Unlike the citations list which contains all encountered URLs, inline citations only include sources that the model explicitly cited in its response.
Important: Enabling inline citations does not guarantee that the model will cite sources on every answer. The model decides when and where to include citations based on the context and nature of the query. Some responses may have no inline citations if the model determines citations are not necessary for that particular answer.
Enabling Inline Citations
Inline citations are opt-in and must be explicitly enabled by passing include=["inline_citations"] when creating your chat:
Python
import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast",
    tools=[
        web_search(),
        x_search(),
    ],
    include=["inline_citations"],  # Enable inline citations
)

Markdown Citation Format
When inline citations are enabled, the model will insert markdown-style citation links directly into the response text at the points where sources are referenced:
Output
The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[[1]](https://x.ai/news/)[[2]](https://x.ai/)[[3]](https://x.com/i/status/1991284813727474073)

When rendered as markdown, this displays as clickable links:
The latest announcements from xAI, primarily from their official X account (@xai) and website (x.ai/news), date back to November 19, 2025.[1][2][3]
The format is [[N]](url) where:
- N is the sequential display number for the citation, starting from 1
- url is the source URL
Citation placement: Citations are typically placed at the end of a sentence, with multiple citations for the same statement appearing adjacent to one another as shown in the example above.
Citation numbering: Citation numbers always start from 1 and increment sequentially. If the same source is cited again later in the response, the original citation number will be reused to maintain consistency.
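Because the format is stable, you can also extract these links from the raw text yourself. Here is a minimal sketch using Python's re module; the regular expression is our own, based on the [[N]](url) format described above, and the structured inline_citations data covered next is generally the more robust option:
Python
import re

# Matches inline citation links of the form [[N]](url); the pattern is
# derived from the documented format and may need adjusting for edge cases.
CITATION_RE = re.compile(r"\[\[(\d+)\]\]\((\S+?)\)")

text = "xAI shipped new updates.[[1]](https://x.ai/news/)[[2]](https://x.ai/)"
for number, url in CITATION_RE.findall(text):
    print(f"[{number}] -> {url}")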
Accessing Structured Inline Citation Data
In addition to the markdown links in the response text, structured inline citation data is available via the inline_citations property. Each InlineCitation object contains:
| Field | Type | Description |
|---|---|---|
| id | string | The display number as a string (e.g., "1", "2") |
| start_index | int | Character position where the citation starts in the response text |
| end_index | int | Character position where the citation ends (exclusive, following Python slice convention) |
| web_citation | WebCitation | Present if the citation is from a web source (contains url) |
| x_citation | XCitation | Present if the citation is from an X/Twitter source (contains url) |
Python
# After streaming or sampling completes, access the structured inline citations from the response object:
for citation in response.inline_citations:
    print(f"Citation [{citation.id}]:")
    print(f" Position: {citation.start_index} to {citation.end_index}")
    # Check citation type
    if citation.HasField("web_citation"):
        print(f" Web URL: {citation.web_citation.url}")
    elif citation.HasField("x_citation"):
        print(f" X URL: {citation.x_citation.url}")

Output
Citation [1]:
Position: 37 to 76
Web URL: https://x.ai/news/grok-4-fast
Citation [2]:
Position: 124 to 171
 X URL: https://x.com/xai/status/1234567890

Using Position Indices
The start_index and end_index values follow Python slice convention:
- start_index: The character position of the first opening square bracket [ of the citation (e.g., the position of the first [ in [[1]](url))
- end_index: The character position immediately after the closing parenthesis ) (exclusive, so the citation text is content[start_index:end_index])
This means you can extract the exact citation markdown from the response text using a simple slice:
Python
content = response.content
for citation in response.inline_citations:
    # Extract the markdown link from the response text
    citation_text = content[citation.start_index:citation.end_index]
    print(f"Citation text: {citation_text}")

Output
Citation text: [[1]](https://x.ai/news/grok-4-fast)
Citation text: [[2]](https://x.com/xai/status/1234567890)

Streaming Inline Citations
During streaming, inline citations are accumulated and available on the final response. The markdown links appear in real-time in the chunk.content as the model generates text, while the structured InlineCitation objects are populated as citations are detected:
Python
for response, chunk in chat.stream():
    # Markdown links appear in chunk.content in real-time
    if chunk.content:
        print(chunk.content, end="", flush=True)
    # Inline citations can also be accessed per-chunk during streaming
    for citation in chunk.inline_citations:
        print(f"\nNew citation: [{citation.id}]")

# After streaming, access all accumulated inline citations
print("\n\nAll inline citations:")
for citation in response.inline_citations:
    url = ""
    if citation.HasField("web_citation"):
        url = citation.web_citation.url
    elif citation.HasField("x_citation"):
        url = citation.x_citation.url
    print(f" [{citation.id}] {url}")

Server-side Tool Calls vs Tool Usage
The API provides two related but distinct metrics for server-side tool executions:
tool_calls - All Attempted Calls
Python
response.tool_calls
Returns a list of all attempted tool calls made during the agentic process. Each entry is a ToolCall object containing:
- id: Unique identifier for the tool call
- function.name: The name of the specific server-side tool called
- function.arguments: The parameters passed to the server-side tool
This includes every tool call attempt, even if some fail.
Output
[id: "call_51132959"
function {
name: "x_user_search"
arguments: "{"query":"xAI official","count":1}"
}
, id: "call_07881908"
function {
name: "x_keyword_search"
arguments: "{"query":"from:xai","limit":10,"mode":"Latest"}"
}
, id: "call_43296276"
function {
name: "web_search"
arguments: "{"query":"xAI latest updates site:x.ai","num_results":5}"
}
]

server_side_tool_usage - Successful Calls (Billable)
Python
response.server_side_tool_usage
Returns a map of successfully executed tools and their invocation counts. This represents only the tool calls that returned meaningful responses and is what determines your billing.
Output
{'SERVER_SIDE_TOOL_X_SEARCH': 3, 'SERVER_SIDE_TOOL_WEB_SEARCH': 2}

Tool Call Function Names vs Usage Categories
The function names in tool_calls represent the exact name of the tool invoked by the model, while the entries in server_side_tool_usage provide a higher-level categorization that aligns with the original tool passed in the tools array of the request.
Function Name to Usage Category Mapping:
| Usage Category | Function Name(s) |
|---|---|
| SERVER_SIDE_TOOL_WEB_SEARCH | web_search, web_search_with_snippets, browse_page |
| SERVER_SIDE_TOOL_X_SEARCH | x_user_search, x_keyword_search, x_semantic_search, x_thread_fetch |
| SERVER_SIDE_TOOL_CODE_EXECUTION | code_execution |
| SERVER_SIDE_TOOL_VIEW_X_VIDEO | view_x_video |
| SERVER_SIDE_TOOL_VIEW_IMAGE | view_image |
| SERVER_SIDE_TOOL_COLLECTIONS_SEARCH | collections_search |
| SERVER_SIDE_TOOL_MCP | {server_label}.{tool_name} if server_label provided, otherwise {tool_name} |
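If you want to bucket the entries in tool_calls into these categories yourself, a small lookup table mirroring the mapping above is enough. This is a sketch we wrote from the table, not an SDK helper:
Python
# Maps exact function names (as seen in response.tool_calls) to the
# high-level categories reported in response.server_side_tool_usage.
FUNCTION_TO_CATEGORY = {
    "web_search": "SERVER_SIDE_TOOL_WEB_SEARCH",
    "web_search_with_snippets": "SERVER_SIDE_TOOL_WEB_SEARCH",
    "browse_page": "SERVER_SIDE_TOOL_WEB_SEARCH",
    "x_user_search": "SERVER_SIDE_TOOL_X_SEARCH",
    "x_keyword_search": "SERVER_SIDE_TOOL_X_SEARCH",
    "x_semantic_search": "SERVER_SIDE_TOOL_X_SEARCH",
    "x_thread_fetch": "SERVER_SIDE_TOOL_X_SEARCH",
    "code_execution": "SERVER_SIDE_TOOL_CODE_EXECUTION",
    "view_x_video": "SERVER_SIDE_TOOL_VIEW_X_VIDEO",
    "view_image": "SERVER_SIDE_TOOL_VIEW_IMAGE",
    "collections_search": "SERVER_SIDE_TOOL_COLLECTIONS_SEARCH",
}

def category_for(function_name: str) -> str:
    # MCP tools are named "{server_label}.{tool_name}" (or just "{tool_name}"),
    # so anything not in the table is treated as an MCP call here.
    return FUNCTION_TO_CATEGORY.get(function_name, "SERVER_SIDE_TOOL_MCP")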
When Tool Calls and Usage Differ
In most cases, tool_calls and server_side_tool_usage will show the same tools. However, they can differ when:
- Failed tool executions: The model attempts to browse a non-existent webpage, fetch a deleted X post, or encounters other execution errors
- Invalid parameters: Tool calls with malformed arguments that can't be processed
- Network or service issues: Temporary failures in the tool execution pipeline
The agentic system is robust enough to handle these failures gracefully, updating its trajectory and continuing with alternative approaches when needed.
Billing Note: Only successful tool executions (server_side_tool_usage) are billed. Failed attempts are not charged.
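Since server_side_tool_usage prints as a mapping of category to count (as shown above), a quick way to spot failed attempts is to compare it against tool_calls after a request completes. A minimal sketch, assuming the usage field supports values() like a standard mapping:
Python
# Compare attempted tool calls against successful (billable) executions.
attempted = len(response.tool_calls)
successful = sum(response.server_side_tool_usage.values())
if attempted > successful:
    print(f"{attempted - successful} tool call(s) failed and were not billed")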
Mixing Server-side and Client-side Tool Calls
Agentic tool calling supports mixing server-side tools and client-side tools, enabling use cases where private tools or data are needed during the agentic loop.
To determine whether a returned tool call needs to be executed on the client side, check the tool call's type.
xAI Python SDK users can use the provided get_tool_call_type function to get the type of a tool call.
For a full guide to requests that mix server-side and client-side tools, please check out the advanced usage page.
xAI Python SDK Users: Version 1.4.0 or later of the xai-sdk package is required to use the get_tool_call_type function.
Python
from xai_sdk.tools import get_tool_call_type

# ... set up `chat` with a mix of server-side and client-side tools ...
response = chat.sample()

for tool_call in response.tool_calls:
    print(get_tool_call_type(tool_call))
The available tool call types are listed below:
| Tool call types | Description |
|---|---|
"client_side_tool" | Indicates this tool call is a client-side tool call, and an invocation to this function on the client side is required and the tool output needs to be appended to the chat |
"web_search_tool" | Indicates this tool call is a web-search tool call, which is performed by xAI server, NO action needed from the client side |
"x_search_tool" | Indicates this tool call is an x-search tool call, which is performed by xAI server, NO action needed from the client side |
"code_execution_tool" | Indicates this tool call is a code-execution tool call, which is performed by xAI server, NO action needed from the client side |
"collections_search_tool" | Indicates this tool call is a collections-search tool call, which is performed by xAI server, NO action needed from the client side |
"mcp_tool" | Indicates this tool call is an MCP tool call, which is performed by xAI server, NO action needed from the client side |
Understanding Token Usage
Agentic requests have unique token usage patterns compared to standard chat completions. Here's how each token field in the usage object is calculated:
completion_tokens
Represents only the final text output of the model - the comprehensive answer returned to the user. This is typically much smaller than you might expect for such rich, research-driven responses, as the agent performs all its intermediate reasoning and tool orchestration internally.
prompt_tokens
Represents the cumulative input tokens across all inference requests made during the agentic process. Since agentic workflows involve multiple reasoning steps with tool calls, the model makes several inference requests throughout the research process. Each request includes the full conversation history up to that point, which grows as the agent progresses through its research.
While this can result in higher prompt_tokens counts, agentic requests benefit significantly from prompt caching. The majority of the prompt (the conversation prefix) remains unchanged between inference steps, allowing for efficient caching of the shared context. This means that while the total prompt_tokens may appear high, much of the computation is optimized through intelligent caching of the stable conversation history, leading to better cost efficiency overall.
reasoning_tokens
Represents the tokens used for the model's internal reasoning process during agentic workflows. This includes the computational work the agent performs to plan tool calls, analyze results, and formulate responses, but excludes the final output tokens.
cached_prompt_text_tokens
Indicates how many prompt tokens were served from cache rather than recomputed. This shows the efficiency gains from prompt caching - higher values indicate better cache utilization and lower costs.
prompt_image_tokens
Represents the tokens derived from visual content that the agent processes during the request. These tokens are produced when visual understanding is enabled and the agent views images (e.g., via web browsing) or analyzes video frames on X. They are counted separately from text tokens and reflect the cost of ingesting visual features alongside the textual context. If no images or videos are processed, this value will be zero.
prompt_text_tokens and total_tokens
prompt_text_tokens reflects the actual text tokens in prompts (excluding any special tokens), while total_tokens is the sum of all token types used in the request.
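Putting these fields together, a short sketch that inspects the usage accounting from the earlier example (field names taken from the Output sample above):
Python
# Inspect the token accounting once the agentic request has completed.
usage = response.usage
print(f"Final answer tokens:      {usage.completion_tokens}")
print(f"Cumulative prompt tokens: {usage.prompt_tokens}")
print(f"Reasoning tokens:         {usage.reasoning_tokens}")
print(f"Served from cache:        {usage.cached_prompt_text_tokens}")
# Fraction of prompt text tokens that were cache hits
cache_rate = usage.cached_prompt_text_tokens / max(usage.prompt_text_tokens, 1)
print(f"Cache hit rate:           {cache_rate:.0%}")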
Limiting Tool Call Turns in Agentic Requests
The max_turns parameter allows you to control the maximum number of assistant/tool-call turns the agent can perform during a single agentic request. This provides a powerful mechanism to balance response latency, cost, and research depth.
Understanding Turns vs Tool Calls
Important: The max_turns value does not directly limit the number of individual tool calls. Instead, it limits the number of assistant turns in the agentic loop. During a single turn, the model may decide to invoke multiple tools in parallel. For example, with max_turns=1, the agent could still execute 3 or more tool calls if it chooses to run them in parallel within that single turn.
A "turn" represents one iteration of the agentic reasoning loop:
- The model analyzes the current context
- The model decides to call one or more tools (potentially in parallel)
- Tools execute and return results
- The model processes the results
This completes one turn. If more research is needed and max_turns allows, the loop continues. Below is an example of how to use max_turns in an agentic request.
Python
import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast",
    tools=[
        web_search(),
        x_search(),
    ],
    max_turns=3,  # Limit to 3 assistant/tool-call turns
)

chat.append(user("What is the latest news from xAI?"))
response = chat.sample()
print(response.content)

When to Use max_turns
| Use Case | Recommended max_turns | Tradeoff |
|---|---|---|
| Quick lookups | 1-2 | Fastest response, may miss deeper insights |
| Balanced research | 3-5 | Good balance of speed and thoroughness |
| Deep research | 10+ or unset | Most comprehensive, longer latency and higher cost |
Common scenarios for limiting turns:
- Latency-sensitive applications: When you need faster responses and can accept less exhaustive research
- Cost control: To cap the maximum number of tool invocations and limit per-request spend
- Simple queries: When the question likely only needs one or two searches to answer
- Predictable behavior: When you want more consistent response times across requests
To allow the agent to make as many tool calls as needed, and determine by itself when to stop, you should leave max_turns unset.
Default Behavior
If max_turns is not specified, the server applies a global default cap. The effective limit is always the minimum of your requested max_turns and the server's global cap, which ensures requests cannot exceed system-defined limits.
Note: The max_turns parameter is ignored for non-agentic requests (requests without server-side tools).
Behavior When Limit is Reached
When the agent reaches the max_turns limit, it will stop making additional server-side tool calls and proceed directly to generating a final response based on the information gathered so far. The agent will do its best to provide a helpful answer with the available context, though it may note that additional research could provide more complete information.
Synchronous Agentic Requests (Non-streaming)
Although not typically recommended, for simpler use cases or when you want to wait for the complete agentic workflow to finish before processing the response, you can use synchronous requests:
import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import code_execution, web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast",  # reasoning model
    tools=[
        web_search(),
        x_search(),
        code_execution(),
    ],
)

chat.append(user("What is the latest update from xAI?"))

# Get the final response in one go once it's ready
response = chat.sample()
print("\n\nFinal Response:")
print(response.content)

# Access the citations of the final response
print("\n\nCitations:")
print(response.citations)

# Access the usage details from the entire search process
print("\n\nUsage:")
print(response.usage)
print(response.server_side_tool_usage)

# Access the server side tool calls of the final response
print("\n\nServer Side Tool Calls:")
print(response.tool_calls)

Synchronous requests will wait for the entire agentic process to complete before returning the response. This is simpler for basic use cases but provides less visibility into the intermediate steps compared to streaming.
Using Tools with Responses API
We also support using the Responses API in both streaming and non-streaming modes.
import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-fast",
    store_messages=True,  # Enable Responses API
    tools=[
        web_search(),
        x_search(),
    ],
)

chat.append(user("What is the latest update from xAI?"))
response = chat.sample()
print(response.content)
print(response.citations)

# The response id can be used to continue the conversation
print(response.id)

Identifying the Client-side Tool Call
A critical step in mixing server-side and client-side tools is identifying whether a returned tool call is a client-side tool that must be executed locally.
Similar to the xAI Python SDK, you can identify client-side tool calls by checking the type of the output entries (response.output[].type) in the OpenAI Responses API response.
| Types | Description |
|---|---|
"function_call" | Indicates this tool call is a client-side tool call, and an invocation to this function on the client side is required and the tool output needs to be appended to the chat |
"web_search_call" | Indicates this tool call is a web-search tool call, which is performed by xAI server, NO action needed from the client side |
"x_search_call" | Indicates this tool call is an x-search tool call, which is performed by xAI server, NO action needed from the client side |
"code_interpreter_call" | Indicates this tool call is a code-execution tool call, which is performed by xAI server, NO action needed from the client side |
"file_search_call" | Indicates this tool call is a collections-search tool call, which is performed by xAI server, NO action needed from the client side |
"mcp_call" | Indicates this tool call is an MCP tool call, which is performed by xAI server, NO action needed from the client side |
Accessing Outputs from Server Side Tool Calls
By default, the output of server-side tool calls will not be returned to the caller since the outputs can be large and require extra compute to process on the client side.
However, the output of server-side tool calls can be useful in some cases, for example to provide a richer user experience on the client side. The include parameter allows you to selectively specify which server-side tool outputs should be included in the response.
Specify Tool Outputs to Return
To receive tool outputs in the response, specify which server-side tool outputs you wish to include.
For the xAI-SDK, the mapping below shows the tool name and corresponding values that should be passed to the include array:
| Tool | Value for include field in xAI-SDK | Included Outputs |
|---|---|---|
"web_search" | "web_search_call_output" | Encrypted web search content |
"x_search" | "x_search_call_output" | Encrypted X search content |
"code_execution" | "code_execution_call_output" | Full plaintext execution output |
"collections_search" | "collections_search_call_output" | Plaintext content chunks matching the search query |
"document_search" | "document_search_call_output" | Plaintext extracted document content |
"mcp" | "mcp_call_output" | Full plaintext MCP tool output |
You can also enable tool outputs for server-side tool calls when using the Responses API; the tool output structures are compatible with the equivalent standard server-side tools in the Responses API.
| Tool | Responses API tool name | Value for include field in Responses API |
|---|---|---|
"web_search" | "web_search" | "web_search_call.action.sources" |
"code_execution" | "code_interpreter" | "code_interpreter_call.outputs" |
"collections_search" | "file_search" | "file_search_call.results" |
"mcp" | "mcp" | No need to include since it will be always returned in Responses API |
import os

from xai_sdk import Client
from xai_sdk.chat import user, CompletionOutputChunk
from xai_sdk.proto import chat_pb2
from xai_sdk.tools import code_execution

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast",  # reasoning model
    tools=[
        code_execution(),
    ],
    include=["code_execution_call_output"],
)

chat.append(user("What is the 100th Fibonacci number?"))
# stream or sample the response...

Find the Tool Output from Response
The tool outputs from the specified tools will be included in the server's response. Different APIs and SDKs require different ways to process the tool output.
Using xAI Python SDK
The tool output content is included in output entries whose role is set to tool.
A complete tool call is also attached to each output entry that represents tool output content; you can use it to inspect the corresponding tool call's details.
Below is a sample that prints the tool output during streaming.
Python
# previous steps to set up `chat`...
is_thinking = True
for response, chunk in chat.stream():
    # View the server-side tool calls as they are being made in real-time
    for tool_call in chunk.tool_calls:
        print(f"\nCalling tool: {tool_call.function.name} with arguments: {tool_call.function.arguments}")
    if response.usage.reasoning_tokens and is_thinking:
        print(f"\rThinking... ({response.usage.reasoning_tokens} tokens)", end="", flush=True)
    if chunk.content and is_thinking:
        print("\n\nFinal Response:")
        is_thinking = False
    if chunk.content and not is_thinking:
        print(chunk.content, end="", flush=True)
    # View the tool outputs as they complete in real-time
    for tool_output in chunk.tool_outputs:
        if tool_output.content:
            tool_call = tool_output.tool_calls[0]
            print(f"\nTool '{tool_call.function.name}' completed with output: {tool_output.content}")
Using Responses API
In the Responses API, different server-side tools have dedicated structures for their tool arguments and outputs, e.g. web_search_call, code_interpreter_call, file_search_call, and mcp_call. For more details, check the official Responses API specifications.
Here we use code_interpreter as the example: scan through the response.output array and find the entries whose type is code_interpreter_call.
# previous step to generate the `response` ...
print(next(
    output
    for output in response.output
    if output.type == "code_interpreter_call"
))

Structured Outputs with Tools
You can also combine agentic tool calling with structured outputs to get type-safe responses from tool-augmented queries.
For more details, consult the Structured Outputs with Tools section in the Structured Outputs guide.
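As a brief illustration, the sketch below combines web search with a Pydantic schema via the SDK's parse method. The CompanyUpdate schema is our own example, and the Structured Outputs guide remains the authoritative reference for this flow:
Python
import os

from pydantic import BaseModel
from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import web_search

# Example schema (our own, for illustration)
class CompanyUpdate(BaseModel):
    headline: str
    date: str
    source_url: str

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(model="grok-4-1-fast", tools=[web_search()])
chat.append(user("What is the latest update from xAI?"))

# parse() runs the agentic loop, then returns the response plus the
# final answer parsed into the schema.
response, update = chat.parse(CompanyUpdate)
print(update.headline, update.source_url)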
Agentic Tool Calling Requirements and Limitations
Model Compatibility
- Supported Models: grok-4, grok-4-fast, grok-4-fast-non-reasoning, grok-4-1-fast, grok-4-1-fast-non-reasoning
- Strongly Recommended: grok-4-1-fast, which is specifically trained to excel at agentic tool calling
Request Constraints
- No batch requests: n > 1 is not supported
- Limited sampling params: Only temperature and top_p are respected
FAQ and Troubleshooting
I'm seeing empty or incorrect content when using agentic tool calling with the xAI Python SDK
Please make sure to upgrade to the latest version of the xAI SDK. Agentic tool calling requires version 1.3.1 or above.