Advanced API Usage

Batch API

The Batch API lets you process large volumes of requests asynchronously at half the cost of real-time API calls.

What is the Batch API?

When you make a standard API call to Grok, you send a request and wait for an immediate response. This approach is perfect for interactive applications like chatbots, real-time assistants, or any use case where users are waiting for a response.

The Batch API takes a different approach. Instead of processing requests immediately, you submit them to a queue where they're processed in the background. You don't get an instant response—instead, you check back later to retrieve your results.
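
To make the contrast concrete, here is a minimal sketch of the two call patterns. The synchronous chat.sample() call follows the standard chat workflow; the batch calls are covered step by step below.

from xai_sdk import Client
from xai_sdk.chat import user

client = Client()

# Real-time: send a request and block until the response arrives
chat = client.chat.create(model="grok-4-1-fast-reasoning")
chat.append(user("Classify the sentiment: 'Great service!'"))
response = chat.sample()  # returns within seconds
print(response.content)

# Batch: enqueue the same kind of Chat object and collect the result later
batch = client.batch.create(batch_name="sentiment_demo")
batch_chat = client.chat.create(model="grok-4-1-fast-reasoning")
batch_chat.append(user("Classify the sentiment: 'Great service!'"))
client.batch.add(batch_id=batch.batch_id, batch_requests=[batch_chat])
# ...poll and retrieve results later (Steps 3 and 4 below)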

Key differences from real-time API requests:

                 Real-time API              Batch API
Response time    Immediate (seconds)        Typically within 24 hours
Cost             Standard pricing           50% off standard pricing
Rate limits      Per-minute limits apply    Requests don't count towards rate limits
Use case         Interactive, real-time     Background processing, bulk jobs

Processing time: Most batch requests complete within 24 hours, though processing time may vary depending on system load and batch size.

You can also create, monitor, and manage batches through the xAI Console. The Console provides a visual interface for tracking batch progress and viewing results.


When to use the Batch API

The Batch API is ideal when you don't need immediate results and want to cut your API costs in half:

  • Running evaluations and benchmarks - Test model performance across thousands of prompts
  • Processing large datasets - Analyze customer feedback, classify support tickets, extract entities
  • Content moderation at scale - Review backlogs of user-generated content
  • Document summarization - Process reports, research papers, or legal documents in bulk
  • Data enrichment pipelines - Add AI-generated insights to database records
  • Scheduled overnight jobs - Generate daily reports or prepare data for dashboards

How it works

The Batch API workflow consists of four main steps:

  1. Create a batch - A batch is a container that groups related requests together
  2. Add requests - Submit your inference requests to the batch queue
  3. Monitor progress - Poll the batch status to track completion
  4. Retrieve results - Fetch responses for all processed requests

Let's walk through each step.


Step 1: Create a batch

A batch acts as a container for your requests. Think of it as a folder that groups related work together—you might create separate batches for different datasets, experiments, or job types.

When you create a batch, you receive a batch_id that you'll use to add requests and retrieve results.

from xai_sdk import Client

client = Client()

# Create a batch with a descriptive name
batch = client.batch.create(batch_name="customer_feedback_analysis")
print(f"Created batch: {batch.batch_id}")

# Store the batch_id for later use
batch_id = batch.batch_id

Step 2: Add requests to the batch

With your batch created, you can now add requests to it. Each request is a standard chat completion that will be processed asynchronously.

With the xAI SDK, adding batch requests is simple: create Chat objects the same way you would for regular chat completions, then pass them as a list. You don't need to construct JSONL files or deal with complex request formats. Just use the familiar chat.create() and chat.append() pattern you already know.

Important: Assign a unique batch_request_id to each request. This ID lets you match results back to their original requests, which becomes important when you're processing hundreds or thousands of items. If you don't provide an ID, we generate a UUID for you. Using your own IDs is useful for idempotency (ensuring a request is only processed once) and for linking batch requests to records in your own system.

from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client()

# Sample data to process
feedback_items = [
    {"id": "feedback_001", "text": "The product exceeded my expectations!"},
    {"id": "feedback_002", "text": "Shipping took way too long."},
    {"id": "feedback_003", "text": "It works as described, nothing special."},
]

# Build batch requests using familiar Chat objects
batch_requests = []
for item in feedback_items:
    # Create a Chat exactly like you would for a regular request
    chat = client.chat.create(
        model="grok-4-1-fast-reasoning",
        batch_request_id=item["id"],  # Add an ID to track this request
    )
    # Append messages the same way as always
    chat.append(system("Classify the sentiment as positive, negative, or neutral."))
    chat.append(user(item["text"]))
    
    batch_requests.append(chat)

# Pass the list of Chat objects to the batch
client.batch.add(batch_id=batch_id, batch_requests=batch_requests)  # batch_id from Step 1
print(f"Added {len(batch_requests)} requests to batch")

Step 3: Monitor batch progress

After adding requests, they begin processing in the background. Since batch processing is asynchronous, you need to poll the batch status to know when results are ready.

The batch state includes counters for pending, successful, and failed requests. Poll periodically until num_pending reaches zero, which indicates all requests have been processed (either successfully or with errors).

import time
from xai_sdk import Client

client = Client()

# Poll until all requests are processed
print("Waiting for batch to complete...")
while True:
    batch = client.batch.get(batch_id=batch_id)  # batch_id from Step 1
    
    pending = batch.state.num_pending
    completed = batch.state.num_success + batch.state.num_error
    total = batch.state.num_requests
    
    print(f"Progress: {completed}/{total} complete, {pending} pending")
    
    if pending == 0:
        print("Batch processing complete!")
        break
    
    # Wait before polling again (avoid hammering the API)
    time.sleep(5)
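
For long-running batches, a fixed 5-second interval generates many idle calls. A minimal variation that polls with exponential backoff (the 60-second cap is an arbitrary choice):

import time
from xai_sdk import Client

client = Client()

delay = 2
while True:
    batch = client.batch.get(batch_id=batch_id)  # batch_id from Step 1
    if batch.state.num_pending == 0:
        print("Batch processing complete!")
        break
    time.sleep(delay)
    delay = min(delay * 2, 60)  # double the wait each round, capped at 60s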

Understanding batch states

The Batch API tracks state at two levels: the batch level and the individual request level.

Batch-level state shows aggregate progress across all requests in a given batch, accessible through the batch.state object returned by the client.batch.get() method:

Counter          Description
num_requests     Total number of requests added to the batch
num_pending      Requests waiting to be processed
num_success      Requests that completed successfully
num_error        Requests that failed with an error
num_cancelled    Requests that were cancelled

When num_pending reaches zero, all requests have been processed (either successfully, with errors, or cancelled).
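
These counters are all you need to compute progress yourself; a small convenience sketch (not an SDK method):

def summarize_progress(state) -> str:
    """Build a one-line summary from the batch-level counters."""
    done = state.num_success + state.num_error + state.num_cancelled
    return f"{done}/{state.num_requests} done, {state.num_pending} pending"

print(summarize_progress(batch.state))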

Individual request states describe where each request is in its lifecycle, accessible through the batch_request_metadata object returned by the client.batch.list_batch_requests() method:

State        Description
pending      Request is queued and waiting to be processed
succeeded    Request completed successfully; the result is available
failed       Request encountered an error during processing
cancelled    Request was cancelled (e.g., when the batch was cancelled before this request was processed)

Batch lifecycle: A batch can also be cancelled or expire. If you cancel a batch, pending requests won't be processed, but already-completed results remain available. Batches have an expiration time after which results are no longer accessible—check the expires_at field when retrieving batch details.
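
If you schedule retrieval separately from submission, it can be worth checking expiry up front. A hypothetical sketch, assuming expires_at is exposed as a Unix timestamp (adjust if your SDK version returns a different type):

import time

batch = client.batch.get(batch_id=batch_id)
# Assumption: expires_at is a Unix timestamp; check your SDK version
seconds_left = batch.expires_at - time.time()
if seconds_left < 24 * 3600:
    print("Warning: results expire in under 24 hours - retrieve them soon")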


Step 4: Retrieve results

You can retrieve results at any time, even before the entire batch completes. Results are available as soon as individual requests finish processing, so you can start consuming completed results while other requests are still in progress.

Each result is linked to its original request via the batch_request_id you assigned earlier. The result.response object is the same SDK Response you'd get from a regular chat completion, with all the familiar fields: .content, .usage, .finish_reason, and more.

The SDK provides convenient .succeeded and .failed properties to separate successful responses from errors.

Pagination: Results are returned in pages. Use the limit parameter to control page size and pagination_token to fetch subsequent pages. When pagination_token is None, you've reached the end.

from xai_sdk import Client

client = Client()

# Paginate through all results
all_succeeded = []
all_failed = []
pagination_token = None

while True:
    # Fetch a page of results (limit controls page size)
    page = client.batch.list_batch_results(
        batch_id=batch_id,  # batch_id from Step 1
        limit=100,
        pagination_token=pagination_token,
    )
    
    # Collect results from this page
    all_succeeded.extend(page.succeeded)
    all_failed.extend(page.failed)
    
    # Check if there are more pages
    if page.pagination_token is None:
        break
    pagination_token = page.pagination_token

# Process all results
print(f"Successfully processed: {len(all_succeeded)} requests")
for result in all_succeeded:
    # Access the full Response object
    print(f"[{result.batch_request_id}] {result.response.content}")
    print(f"  Tokens used: {result.response.usage.total_tokens}")

if all_failed:
    print(f"\nFailed: {len(all_failed)} requests")
    for result in all_failed:
        print(f"[{result.batch_request_id}] Error: {result.error_message}")
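
Failed requests are not retried automatically. Because batch_request_id links each failure back to your source data, one recovery pattern is to resubmit the failures as a fresh batch. A sketch, assuming your original inputs are still available keyed by ID (inputs_by_id is hypothetical):

from xai_sdk.chat import system, user

retry_batch = client.batch.create(batch_name="customer_feedback_analysis_retry")

retry_requests = []
for result in all_failed:
    chat = client.chat.create(
        model="grok-4-1-fast-reasoning",
        batch_request_id=result.batch_request_id,  # reuse the original ID
    )
    chat.append(system("Classify the sentiment as positive, negative, or neutral."))
    # inputs_by_id: hypothetical mapping of batch_request_id -> original text
    chat.append(user(inputs_by_id[result.batch_request_id]))
    retry_requests.append(chat)

if retry_requests:
    client.batch.add(batch_id=retry_batch.batch_id, batch_requests=retry_requests)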

Additional operations

Beyond the core workflow, the Batch API provides additional operations for managing your batches.

Cancel a batch

You can cancel a batch before all requests complete. Already-processed requests remain available in the results, but pending requests will not be processed. You cannot add more requests to a cancelled batch.

from xai_sdk import Client

client = Client()

# Cancel processing
cancelled_batch = client.batch.cancel(batch_id=batch_id)  # batch_id from Step 1
print(f"Cancelled batch: {cancelled_batch.batch_id}")
print(f"Completed before cancellation: {cancelled_batch.state.num_success} requests")

List all batches

View all batches belonging to your team. Batches are retained until they expire (check the expires_at field). This endpoint supports the same limit and pagination_token parameters for paginating through large lists.

from xai_sdk import Client

client = Client()

# List recent batches
response = client.batch.list(limit=20)

for batch in response.batches:
    status = "complete" if batch.state.num_pending == 0 else "processing"
    print(f"{batch.name} ({batch.batch_id}): {status}")
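
To walk your full batch history rather than a single page, you can paginate here as well; a sketch assuming list() returns the same pagination_token field as list_batch_results():

all_batches = []
pagination_token = None
while True:
    page = client.batch.list(limit=20, pagination_token=pagination_token)
    all_batches.extend(page.batches)
    if page.pagination_token is None:  # assumed to mirror list_batch_results
        break
    pagination_token = page.pagination_token

print(f"Total batches: {len(all_batches)}")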

Check individual request status

For detailed tracking, you can inspect the metadata for each request in a batch. This shows status, timing, and other details for individual requests, and supports the same limit and pagination_token parameters for paginating through large batches.

from xai_sdk import Client

client = Client()

# Get metadata for individual requests
metadata = client.batch.list_batch_requests(batch_id=batch_id)  # batch_id from Step 1

for request in metadata.batch_request_metadata:
    print(f"Request {request.batch_request_id}: {request.state}")
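
A quick way to roll these per-request states up into counts, using only the metadata fetched above:

from collections import Counter

# Tally requests by state (exact repr depends on the SDK's state type)
state_counts = Counter(request.state for request in metadata.batch_request_metadata)
print(state_counts)  # e.g. Counter({'succeeded': 98, 'failed': 2})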

Track costs

Each batch tracks the total processing cost. Access the cost breakdown after processing to understand your spending. Batch requests are billed at 50% of standard API pricing, so you'll see significant savings compared to real-time requests.

from xai_sdk import Client

client = Client()

# Get batch with cost information
batch = client.batch.get(batch_id=batch_id)  # batch_id from Step 1

# Cost is returned in ticks (1e-10 USD) for precision
total_cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10
print(f"Total cost: ${total_cost_usd:.4f}")
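
If you aggregate spend across many batches, sum the raw ticks and convert once at the end; integer ticks avoid floating-point rounding. A sketch (batch_ids is a hypothetical list of your stored batch IDs):

from decimal import Decimal

# batch_ids: hypothetical list of batch IDs you've stored
total_ticks = sum(
    client.batch.get(batch_id=bid).cost_breakdown.total_cost_usd_ticks
    for bid in batch_ids
)
print(f"Total spend: ${Decimal(total_ticks) / Decimal(10**10):.4f}")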

Complete example

This end-to-end example demonstrates a realistic batch workflow: analyzing customer feedback at scale. It creates a batch, submits feedback items for sentiment analysis, waits for processing, and outputs the results. For simplicity, this example doesn't paginate results—see Step 4 for pagination when processing larger batches.

import time
from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client()

# Sample dataset: customer feedback to analyze
feedback_data = [
    {"id": "fb_001", "text": "Absolutely love this product! Best purchase ever."},
    {"id": "fb_002", "text": "Delivery was late and the packaging was damaged."},
    {"id": "fb_003", "text": "Works fine, nothing special to report."},
    {"id": "fb_004", "text": "Customer support was incredibly helpful!"},
    {"id": "fb_005", "text": "The app keeps crashing on my phone."},
]

# Step 1: Create a batch
print("Creating batch...")
batch = client.batch.create(batch_name="feedback_sentiment_analysis")
print(f"Batch created: {batch.batch_id}")

# Step 2: Build and add requests
print("\nAdding requests...")
batch_requests = []
for item in feedback_data:
    chat = client.chat.create(
        model="grok-4-1-fast-reasoning",
        batch_request_id=item["id"],
    )
    chat.append(system(
        "Analyze the sentiment of the customer feedback. "
        "Respond with exactly one word: positive, negative, or neutral."
    ))
    chat.append(user(item["text"]))
    batch_requests.append(chat)

client.batch.add(batch_id=batch.batch_id, batch_requests=batch_requests)
print(f"Added {len(batch_requests)} requests")

# Step 3: Wait for completion
print("\nProcessing...")
while True:
    batch = client.batch.get(batch_id=batch.batch_id)
    pending = batch.state.num_pending
    completed = batch.state.num_success + batch.state.num_error
    
    print(f"  {completed}/{batch.state.num_requests} complete")
    
    if pending == 0:
        break
    time.sleep(2)

# Step 4: Retrieve and display results
print("\n--- Results ---")
results = client.batch.list_batch_results(batch_id=batch.batch_id)

# Create a lookup for original feedback text
feedback_lookup = {item["id"]: item["text"] for item in feedback_data}

for result in results.succeeded:
    original_text = feedback_lookup.get(result.batch_request_id, "")
    sentiment = result.response.content.strip().lower()
    print(f"[{sentiment.upper()}] {original_text[:50]}...")

# Report any failures
if results.failed:
    print("\n--- Errors ---")
    for result in results.failed:
        print(f"[{result.batch_request_id}] {result.error_message}")

# Display cost
cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10
print(f"\nTotal cost: ${cost_usd:.4f}")

Limitations

Batches

  • A team can have an unlimited number of batches.
  • Maximum batch creation rate: 1 batch creation per second per team.

Batch Requests

  • A batch can contain an unlimited number of requests in theory, but extremely large batches (over 1,000,000 requests) may be throttled for processing stability.
  • Each individual request added to a batch has a maximum payload size of 25 MB.
  • A team can send up to 100 add-batch-requests API calls every 30 seconds (a rolling limit shared across all batches in the team); see the throttling sketch after this list.
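
To stay under the add rate limit when submitting very large batches, you can add requests in chunks and throttle between calls; a minimal sketch (chunk size and sleep interval are illustrative choices):

import time

CHUNK_SIZE = 1000  # requests per add call; illustrative

# batch and batch_requests as built in Steps 1-2
for start in range(0, len(batch_requests), CHUNK_SIZE):
    chunk = batch_requests[start:start + CHUNK_SIZE]
    client.batch.add(batch_id=batch.batch_id, batch_requests=chunk)
    time.sleep(0.5)  # keeps well under 100 add calls per rolling 30 seconds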

Unsupported Features

  • Agentic requests using server-side tools (such as web search, code execution, or MCP tools) are not supported in batch requests.
  • Client-side tools (function calling) are not supported in batch requests.