Streaming & Synchronous Requests

Agentic requests can be executed in either streaming or synchronous mode. This page covers both approaches and how to use them effectively.


We strongly recommend streaming mode for agentic tool calling. It provides:

  • Real-time observability of tool calls as they happen
  • Immediate feedback during potentially long-running requests
  • Reasoning token counts as the model thinks

Streaming Example

import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import code_execution, web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast-reasoning",
    tools=[
        web_search(),
        x_search(),
        code_execution(),
    ],
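    # Emit verbose streaming events so server-side tool calls surface in chunks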
    include=["verbose_streaming"],
)

chat.append(user("What are the latest updates from xAI?"))

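# Track whether the model is still reasoning so the final-response header prints only once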
is_thinking = True
for response, chunk in chat.stream():
    # View server-side tool calls in real-time
    for tool_call in chunk.tool_calls:
        print(f"\nCalling tool: {tool_call.function.name}")
    if response.usage.reasoning_tokens and is_thinking:
        print(f"\rThinking... ({response.usage.reasoning_tokens} tokens)", end="", flush=True)
    if chunk.content and is_thinking:
        print("\n\nFinal Response:")
        is_thinking = False
    if chunk.content and not is_thinking:
        print(chunk.content, end="", flush=True)

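# After streaming completes, `response` holds the fully accumulated response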
print("\nCitations:", response.citations)

Synchronous Mode

For simpler use cases, or when you want the complete agentic workflow to finish before processing the response, you can use synchronous requests:

import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import code_execution, web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast-reasoning",
    tools=[
        web_search(),
        x_search(),
        code_execution(),
    ],
)

chat.append(user("What is the latest update from xAI?"))

# Get the final response in one go once it's ready
response = chat.sample()

print("Final Response:")
print(response.content)

print("\nCitations:")
print(response.citations)

print("\nUsage:")
print(response.usage)
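# Usage details for server-side tools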
print(response.server_side_tool_usage)

Synchronous requests will wait for the entire agentic process to complete before returning. This is simpler for basic use cases but provides less visibility into intermediate steps.


Using Tools with Responses API

We also support using the Responses API in both streaming and non-streaming modes. The example below uses a synchronous request, but chat.stream() works exactly as shown above:

import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast-reasoning",
    store_messages=True,  # Enable Responses API
    tools=[
        web_search(),
        x_search(),
    ],
)

chat.append(user("What is the latest update from xAI?"))
response = chat.sample()

print(response.content)
print(response.citations)

# The response id can be used to continue the conversation
print(response.id)
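
Continuing the example above, a minimal sketch of such a continuation might look like the following. Note that the previous_response_id parameter name is an assumption based on Responses-style APIs and is not confirmed by this page; check the SDK reference for the exact argument.

# Hypothetical continuation sketch; reuses `client`, `user`, and `response`
# from the example above. `previous_response_id` is an assumed parameter name.
follow_up = client.chat.create(
    model="grok-4-1-fast-reasoning",
    store_messages=True,
    previous_response_id=response.id,
)
follow_up.append(user("Summarize that in one sentence."))
print(follow_up.sample().content)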

Accessing Tool Outputs

By default, server-side tool call outputs are not returned since they can be large. However, you can opt in to receive them:

xAI SDK

Tool                    Value for include field
"web_search"            "web_search_call_output"
"x_search"              "x_search_call_output"
"code_execution"        "code_execution_call_output"
"collections_search"    "collections_search_call_output"
"attachment_search"     "attachment_search_call_output"
"mcp"                   "mcp_call_output"

import os
from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import code_execution

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast-reasoning",
    tools=[
        code_execution(),
    ],
    include=["code_execution_call_output"],
)
chat.append(user("What is the 100th Fibonacci number?"))

# stream or sample the response...
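# e.g. synchronously, as in the earlier examples:
response = chat.sample()
print(response.content)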

Responses API

Tool                    Responses API tool name    Value for include field
"web_search"            "web_search"               "web_search_call.action.sources"
"code_execution"        "code_interpreter"         "code_interpreter_call.outputs"
"collections_search"    "file_search"              "file_search_call.results"
"mcp"                   "mcp"                      Always returned in Responses API