Batch API

The Batch API lets you process large volumes of requests asynchronously with reduced pricing and higher rate limits. For pricing details, see Batch API Pricing.

What is the Batch API?

When you make a standard API call to Grok, you send a request and wait for an immediate response. This approach is perfect for interactive applications like chatbots, real-time assistants, or any use case where users are waiting for a response.

The Batch API takes a different approach. Instead of processing requests immediately, you submit them to a queue where they're processed in the background. You don't get an instant response—instead, you check back later to retrieve your results.

Key differences from real-time API requests:

|  | Real-time API | Batch API |
| --- | --- | --- |
| Response time | Immediate (seconds) | Typically within 24 hours |
| Cost | Standard pricing | Reduced pricing (see details) |
| Rate limits | Per-minute limits apply | Requests don't count towards rate limits |
| Use case | Interactive, real-time | Background processing, bulk jobs |

Processing time: Most batch requests complete within 24 hours, though processing time may vary depending on system load and batch size.

You can also create, monitor, and manage batches through the xAI Console. The Console provides a visual interface for tracking batch progress and viewing results.


When to use the Batch API

The Batch API is ideal when you don't need immediate results and want to reduce your API costs:

  • Running evaluations and benchmarks - Test model performance across thousands of prompts
  • Processing large datasets - Analyze customer feedback, classify support tickets, extract entities
  • Content moderation at scale - Review backlogs of user-generated content
  • Document summarization - Process reports, research papers, or legal documents in bulk
  • Data enrichment pipelines - Add AI-generated insights to database records
  • Scheduled overnight jobs - Generate daily reports or prepare data for dashboards

How it works

The Batch API workflow consists of four main steps:

  1. Create a batch - A batch is a container that groups related requests together
  2. Add requests - Submit your inference requests to the batch queue
  3. Monitor progress - Poll the batch status to track completion
  4. Retrieve results - Fetch responses for all processed requests

Let's walk through each step.


Step 1: Create a batch

A batch acts as a container for your requests. Think of it as a folder that groups related work together—you might create separate batches for different datasets, experiments, or job types.

When you create a batch, you receive a batch_id that you'll use to add requests and retrieve results.

from xai_sdk import Client

client = Client()

# Create a batch with a descriptive name
batch = client.batch.create(batch_name="customer_feedback_analysis")
print(f"Created batch: {batch.batch_id}")

# Store the batch_id for later use
batch_id = batch.batch_id

Step 2: Add requests to the batch

With your batch created, you can now add requests to it. Each request will be processed asynchronously.

With the xAI SDK, adding batch requests is simple: use chat.create() for text, image.prepare() for images, or video.prepare() for videos, then pass them as a list. You can also upload a JSONL file if you prefer.

Important: Assign a unique batch_request_id to each request. This ID lets you match results back to their original requests, which becomes important when you're processing hundreds or thousands of items. If you don't provide an ID, we generate a UUID for you. Using your own IDs is useful for idempotency (ensuring a request is only processed once) and for linking batch requests to records in your own system.
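One common pattern is to derive the ID deterministically from your own record keys, so a retried submission reuses the same ID. A minimal sketch (the `record` shape here is hypothetical, not part of the SDK):

```python
import hashlib

def batch_request_id_for(record: dict) -> str:
    """Derive a stable batch_request_id from a record's primary key.

    The same record always maps to the same ID, so a retry after a failed
    run references the same request rather than creating a new one.
    """
    digest = hashlib.sha256(record["id"].encode()).hexdigest()[:16]
    return f"req_{digest}"
```

Pass the result as `batch_request_id` when building each request, and keep the mapping back to your own records in your database.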

from xai_sdk import Client
from xai_sdk.chat import system, user
from xai_sdk.tools import web_search, x_search

client = Client()

batch_requests = []

# Chat completion with tools
chat = client.chat.create(
    model="grok-4.20-reasoning",
    batch_request_id="chat_001",
    tools=[web_search(), x_search()],
)
chat.append(system("Analyze market sentiment from recent news and posts."))
chat.append(user("What is the current sentiment around TSLA stock?"))
batch_requests.append(chat)

# Image generation
image_req = client.image.prepare(
    prompt="A sleek modern laptop on a minimalist desk",
    model="grok-imagine-image",
    batch_request_id="img_001",
)
batch_requests.append(image_req)

# Image edit
image_edit_req = client.image.prepare(
    prompt="Add a rainbow in the background",
    model="grok-imagine-image",
    image_url="https://picsum.photos/800",
    batch_request_id="img_edit_001",
)
batch_requests.append(image_edit_req)

# Video generation
video_req = client.video.prepare(
    prompt="A product rotating on a turntable with dramatic lighting",
    model="grok-imagine-video",
    batch_request_id="vid_001",
)
batch_requests.append(video_req)

# Video edit
video_edit_req = client.video.prepare(
    prompt="Make it slow motion",
    model="grok-imagine-video",
    video_url="https://lorem.video/cat_360p_3s",
    batch_request_id="vid_edit_001",
)
batch_requests.append(video_edit_req)

# Add all requests to the batch
client.batch.add(batch_id=batch.batch_id, batch_requests=batch_requests)
print(f"Added {len(batch_requests)} requests to batch")

Step 3: Monitor batch progress

After adding requests, they begin processing in the background. Since batch processing is asynchronous, you need to poll the batch status to know when results are ready.

The batch state includes counters for pending, successful, and failed requests. Poll periodically until num_pending reaches zero, which indicates all requests have been processed (either successfully or with errors).

import time
from xai_sdk import Client

client = Client()

# Poll until all requests are processed
print("Waiting for batch to complete...")
while True:
    batch = client.batch.get(batch_id=batch.batch_id)
    
    pending = batch.state.num_pending
    completed = batch.state.num_success + batch.state.num_error
    total = batch.state.num_requests
    
    print(f"Progress: {completed}/{total} complete, {pending} pending")
    
    if pending == 0:
        print("Batch processing complete!")
        break
    
    # Wait before polling again (avoid hammering the API)
    time.sleep(5)
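A fixed five-second sleep is reasonable for small batches; for jobs that run for hours, an exponential backoff schedule polls less aggressively. A sketch of the schedule only (the loop body stays the same `client.batch.get` call shown above):

```python
def backoff_delays(initial: float = 5.0, factor: float = 2.0, max_delay: float = 300.0):
    """Yield sleep intervals that grow by `factor` each poll, capped at max_delay seconds."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, max_delay)
```

In the polling loop above, create `delays = backoff_delays()` before the loop and replace `time.sleep(5)` with `time.sleep(next(delays))`.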

Understanding batch states

The Batch API tracks state at two levels: the batch level and the individual request level.

Batch-level state shows aggregate progress across all requests in a given batch, accessible through the batch.state object returned by the client.batch.get() method:

| Counter | Description |
| --- | --- |
| num_requests | Total number of requests added to the batch |
| num_pending | Requests waiting to be processed |
| num_success | Requests that completed successfully |
| num_error | Requests that failed with an error |
| num_cancelled | Requests that were cancelled |

When num_pending reaches zero, all requests have been processed (either successfully, with errors, or cancelled).

Individual request states describe where each request is in its lifecycle, accessible through the batch_request_metadata object returned by the client.batch.list_batch_requests() method:

| State | Description |
| --- | --- |
| pending | Request is queued and waiting to be processed |
| succeeded | Request completed successfully; result is available |
| failed | Request encountered an error during processing |
| cancelled | Request was cancelled (e.g., when the batch was cancelled before this request was processed) |

Batch lifecycle: A batch can also be cancelled or expire. If you cancel a batch, pending requests won't be processed, but already-completed results remain available. Batches have an expiration time after which results are no longer accessible—check the expires_at field when retrieving batch details.


Step 4: Retrieve results

You can retrieve results at any time, even before the entire batch completes. Results are available as soon as individual requests finish processing, so you can start consuming completed results while other requests are still in progress.

Each result is linked to its original request via the batch_request_id you assigned earlier. For chat completions, use result.response which has the familiar fields: .content, .usage, .finish_reason, and more. For image requests, use result.image_response which provides .url, .base64, .usage, and .model. For video requests, use result.video_response which provides .url, .duration, .usage, and .model. These are the same response types returned by the regular client.image.sample() and client.video.generate() methods.

The SDK provides convenient .succeeded and .failed properties to separate successful responses from errors.

Pagination: Results are returned in pages. Use the limit parameter to control page size and pagination_token to fetch subsequent pages. When pagination_token is None, you've reached the end.

from xai_sdk import Client

client = Client()

# Paginate through all results
all_succeeded = []
all_failed = []
pagination_token = None

while True:
    # Fetch a page of results (limit controls page size)
    page = client.batch.list_batch_results(
        batch_id=batch.batch_id,
        limit=100,
        pagination_token=pagination_token,
    )
    
    # Collect results from this page
    all_succeeded.extend(page.succeeded)
    all_failed.extend(page.failed)
    
    # Check if there are more pages
    if page.pagination_token is None:
        break
    pagination_token = page.pagination_token

# Process results - handle different response types
print(f"Successfully processed: {len(all_succeeded)} requests")
for result in all_succeeded:
    rid = result.batch_request_id
    resp = result.proto.response

    if resp.HasField("completion_response"):
        # Chat completion response
        print(f"[{rid}] {result.response.content}")
        print(f"  Tokens used: {result.response.usage.total_tokens}")
    elif resp.HasField("image_response"):
        # Image generation response
        print(f"[{rid}] Image URL: {result.image_response.url}")
    elif resp.HasField("video_response"):
        # Video generation response
        print(f"[{rid}] Video URL: {result.video_response.url}")

if all_failed:
    print(f"\nFailed: {len(all_failed)} requests")
    for result in all_failed:
        print(f"[{result.batch_request_id}] Error: {result.error_message}")

Additional operations

Beyond the core workflow, the Batch API provides additional operations for managing your batches.

Cancel a batch

You can cancel a batch before all requests complete. Already-processed requests remain available in the results, but pending requests will not be processed. You cannot add more requests to a cancelled batch.

from xai_sdk import Client

client = Client()

# Cancel processing
cancelled_batch = client.batch.cancel(batch_id=batch.batch_id)
print(f"Cancelled batch: {cancelled_batch.batch_id}")
print(f"Completed before cancellation: {cancelled_batch.state.num_success} requests")

List all batches

View all batches belonging to your team. Batches are retained until they expire (check the expires_at field). This endpoint supports the same limit and pagination_token parameters for paginating through large lists.

from xai_sdk import Client

client = Client()

# List recent batches
response = client.batch.list(limit=20)

for batch in response.batches:
    status = "complete" if batch.state.num_pending == 0 else "processing"
    print(f"{batch.name} ({batch.batch_id}): {status}")

Check individual request status

For detailed tracking, you can inspect the metadata for each request in a batch. This shows the status, timing, and other details for individual requests. This endpoint supports the same limit and pagination_token parameters for paginating through large batches.

from xai_sdk import Client

client = Client()

# Get metadata for individual requests
metadata = client.batch.list_batch_requests(batch_id=batch.batch_id)

for request in metadata.batch_request_metadata:
    print(f"Request {request.batch_request_id}: {request.state}")

Track costs

Each batch tracks the total processing cost. Access the cost breakdown after processing to understand your spending. For pricing details, see Batch API Pricing on the Models and Pricing page.

from xai_sdk import Client

client = Client()

# Get batch with cost information
batch = client.batch.get(batch_id=batch.batch_id)

# Cost is returned in ticks (1e-10 USD) for precision
total_cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10
print("Total cost: $%.4f" % total_cost_usd)
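Because cost fields are denominated in ticks, a tiny helper keeps the conversion in one place (the constant follows the 1 tick = 1e-10 USD definition above):

```python
TICKS_PER_USD = 10_000_000_000  # 1 tick = 1e-10 USD, so 1e10 ticks = $1

def ticks_to_usd(ticks: int) -> float:
    """Convert a cost expressed in ticks to US dollars."""
    return ticks / TICKS_PER_USD
```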

Complete example

This end-to-end example demonstrates a realistic batch workflow: analyzing customer feedback at scale. It creates a batch, submits feedback items for sentiment analysis, waits for processing, and outputs the results. For simplicity, this example doesn't paginate results—see Step 4 for pagination when processing larger batches.

import time
from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client()

# Sample dataset: customer feedback to analyze
feedback_data = [
    {"id": "fb_001", "text": "Absolutely love this product! Best purchase ever."},
    {"id": "fb_002", "text": "Delivery was late and the packaging was damaged."},
    {"id": "fb_003", "text": "Works fine, nothing special to report."},
    {"id": "fb_004", "text": "Customer support was incredibly helpful!"},
    {"id": "fb_005", "text": "The app keeps crashing on my phone."},
]

# Step 1: Create a batch
print("Creating batch...")
batch = client.batch.create(batch_name="feedback_sentiment_analysis")
print(f"Batch created: {batch.batch_id}")

# Step 2: Build and add requests
print("\nAdding requests...")
batch_requests = []
for item in feedback_data:
    chat = client.chat.create(
        model="grok-4.20-reasoning",
        batch_request_id=item["id"],
    )
    chat.append(system(
        "Analyze the sentiment of the customer feedback. "
        "Respond with exactly one word: positive, negative, or neutral."
    ))
    chat.append(user(item["text"]))
    batch_requests.append(chat)

client.batch.add(batch_id=batch.batch_id, batch_requests=batch_requests)
print(f"Added {len(batch_requests)} requests")

# Step 3: Wait for completion
print("\nProcessing...")
while True:
    batch = client.batch.get(batch_id=batch.batch_id)
    pending = batch.state.num_pending
    completed = batch.state.num_success + batch.state.num_error
    
    print(f"  {completed}/{batch.state.num_requests} complete")
    
    if pending == 0:
        break
    time.sleep(2)

# Step 4: Retrieve and display results
print("\n--- Results ---")
results = client.batch.list_batch_results(batch_id=batch.batch_id)

# Create a lookup for original feedback text
feedback_lookup = {item["id"]: item["text"] for item in feedback_data}

for result in results.succeeded:
    original_text = feedback_lookup.get(result.batch_request_id, "")
    sentiment = result.response.content.strip().lower()
    print(f"[{sentiment.upper()}] {original_text[:50]}...")

# Report any failures
if results.failed:
    print("\n--- Errors ---")
    for result in results.failed:
        print(f"[{result.batch_request_id}] {result.error_message}")

# Display cost
cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10
print("\nTotal cost: $%.4f" % cost_usd)

JSONL File Upload

As an alternative to adding requests via the SDK, you can create batches by uploading a JSONL file. This is useful when generating requests from scripts, pipelines, or external tools.

Each line in the file is a JSON object with four fields: custom_id (unique identifier, maps to batch_request_id), method (always "POST"), url (API endpoint path), and body (the JSON request payload matching the REST API reference for that endpoint).


{"custom_id": "chat-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "grok-4-1-fast-reasoning", "messages": [{"role": "user", "content": "Classify this as positive, negative, or neutral: The product exceeded my expectations!"}]}}
{"custom_id": "search-1", "method": "POST", "url": "/v1/responses", "body": {"model": "grok-4-1-fast-reasoning", "tools": [{"type": "web_search"}, {"type": "x_search"}], "input": [{"role": "user", "content": "What are the latest SpaceX launches?"}]}}
{"custom_id": "mcp-1", "method": "POST", "url": "/v1/responses", "body": {"model": "grok-4-1-fast-reasoning", "tools": [{"type": "mcp", "server_label": "deepwiki", "server_url": "https://mcp.deepwiki.com/mcp"}], "input": [{"role": "user", "content": "What does the xai-sdk-python repo do?"}]}}
{"custom_id": "img-1", "method": "POST", "url": "/v1/images/generations", "body": {"model": "grok-imagine-image", "prompt": "A futuristic city skyline at sunset"}}
{"custom_id": "img-edit-1", "method": "POST", "url": "/v1/images/edits", "body": {"model": "grok-imagine-image", "prompt": "Add a rainbow", "image": {"url": "https://picsum.photos/800"}}}
{"custom_id": "vid-1", "method": "POST", "url": "/v1/videos/generations", "body": {"model": "grok-imagine-video", "prompt": "A rocket launching from Mars", "duration": 8}}
{"custom_id": "vid-edit-1", "method": "POST", "url": "/v1/videos/edits", "body": {"model": "grok-imagine-video", "prompt": "Make it slow motion", "video": {"url": "https://lorem.video/cat_360p_3s"}}}

You can mix different endpoints in the same file. Each request is routed independently.

Supported url values:

| URL | Description |
| --- | --- |
| /v1/chat/completions | Chat completions |
| /v1/responses | Model responses |
| /v1/images/generations | Image generation |
| /v1/images/edits | Image editing |
| /v1/videos/generations or /v1/videos | Video generation |
| /v1/videos/edits | Video editing |

Upload the file via the Files API, then create a batch referencing it:

from xai_sdk import Client

client = Client()

# Upload the JSONL file
file = client.files.upload(
    file=open("batch_requests.jsonl", "rb"),
)

# Create a batch with the file ID
batch = client.batch.create(
    batch_name="sentiment_analysis",
    input_file_id=file.id,
)
print(f"Created batch: {batch.batch_id}")

The file is processed asynchronously in the background. If any line is invalid, the batch is cancelled with an error message. Monitor progress and retrieve results the same way as inline batches.

File-based batches are sealed after creation — you cannot add more requests via AddBatchRequests. Maximum file size is 200 MB with up to 50,000 requests. Each custom_id must be unique within the file.


Limitations

Batches

  • A team can have an unlimited number of batches.
  • Maximum batch creation rate: 2 batch creations per second per team.

Batch Requests

  • A batch can contain an unlimited number of requests in theory, but extremely large batches (>1,000,000 requests) may be throttled for processing stability.
  • Each individual request that can be added to a batch has a maximum payload size of 25 MB.
  • A team can send up to 1000 add-batch-requests API calls every 30 seconds (this is a rolling limit shared across all batches in the team).
  • Image and video results contain signed URLs that expire after 1 hour. Download the media promptly after retrieving results.

Tool Use

Both server-side tools and client-side function tools are supported in batch requests.

  • Server-side tools (web search, code execution, MCP, etc.) work the same as in the real-time API — they are executed during processing and the final response is returned.
  • Client-side function tools are supported: the model returns tool_calls in the response for you to handle offline. Multi-turn tool calling requires submitting a new batch request with the tool result messages included in the conversation.

