Batch API

The Batch API lets you process large volumes of requests asynchronously with reduced pricing and higher rate limits. For pricing details, see Batch API Pricing.

What is the Batch API?

When you make a standard API call to Grok, you send a request and wait for an immediate response. This approach is perfect for interactive applications like chatbots, real-time assistants, or any use case where users are waiting for a response.

The Batch API takes a different approach. Instead of processing requests immediately, you submit them to a queue where they're processed in the background. You don't get an instant response—instead, you check back later to retrieve your results.

Key differences from real-time API requests:

|  | Real-time API | Batch API |
| --- | --- | --- |
| Response time | Immediate (seconds) | Typically within 24 hours |
| Cost | Standard pricing | Reduced pricing (see details) |
| Rate limits | Per-minute limits apply | Requests don't count towards rate limits |
| Use case | Interactive, real-time | Background processing, bulk jobs |

Processing time: Most batch requests complete within 24 hours, though processing time may vary depending on system load and batch size.

You can also create, monitor, and manage batches through the xAI Console. The Console provides a visual interface for tracking batch progress and viewing results.


When to use the Batch API

The Batch API is ideal when you don't need immediate results and want to reduce your API costs:

  • Running evaluations and benchmarks - Test model performance across thousands of prompts
  • Processing large datasets - Analyze customer feedback, classify support tickets, extract entities
  • Content moderation at scale - Review backlogs of user-generated content
  • Document summarization - Process reports, research papers, or legal documents in bulk
  • Data enrichment pipelines - Add AI-generated insights to database records
  • Scheduled overnight jobs - Generate daily reports or prepare data for dashboards

How it works

The Batch API workflow consists of four main steps:

  1. Create a batch - A batch is a container that groups related requests together
  2. Add requests - Submit your inference requests to the batch queue
  3. Monitor progress - Poll the batch status to track completion
  4. Retrieve results - Fetch responses for all processed requests

Let's walk through each step.


Step 1: Create a batch

A batch acts as a container for your requests. Think of it as a folder that groups related work together—you might create separate batches for different datasets, experiments, or job types.

When you create a batch, you receive a batch_id that you'll use to add requests and retrieve results.

from xai_sdk import Client

client = Client()

# Create a batch with a descriptive name
batch = client.batch.create(batch_name="customer_feedback_analysis")
print(f"Created batch: {batch.batch_id}")

# Store the batch_id for later use
batch_id = batch.batch_id

Step 2: Add requests to the batch

With your batch created, you can now add requests to it. Each request will be processed asynchronously.

With the xAI SDK, adding batch requests is simple: use chat.create() for text, image.prepare() for images, or video.prepare() for videos, then pass them as a list. You can also upload a JSONL file if you prefer.

Important: Assign a unique batch_request_id to each request. This ID lets you match results back to their original requests, which becomes important when you're processing hundreds or thousands of items. If you don't provide an ID, we generate a UUID for you. Using your own IDs is useful for idempotency (ensuring a request is only processed once) and for linking batch requests to records in your own system.
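One common pattern is to derive the ID deterministically from your own record keys, so a retried submission reuses the same ID. A minimal sketch (the `record` shape here is hypothetical, not part of the SDK):

```python
import hashlib

def batch_request_id_for(record: dict) -> str:
    """Derive a stable batch_request_id from a record's primary key.

    The same record always maps to the same ID, so a retry after a failed
    run references the same request rather than creating a new one.
    """
    digest = hashlib.sha256(record["id"].encode()).hexdigest()[:16]
    return f"req_{digest}"
```

Pass the result as `batch_request_id` when building each request, and keep the mapping back to your own records in your database.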

from xai_sdk import Client
from xai_sdk.chat import system, user
from xai_sdk.tools import web_search, x_search

client = Client()

batch_requests = []

# Chat completion with tools
chat = client.chat.create(
    model="grok-4.20-reasoning",
    batch_request_id="chat_001",
    tools=[web_search(), x_search()],
)
chat.append(system("Analyze market sentiment from recent news and posts."))
chat.append(user("What is the current sentiment around TSLA stock?"))
batch_requests.append(chat)

# Image generation
image_req = client.image.prepare(
    prompt="A sleek modern laptop on a minimalist desk",
    model="grok-imagine-image",
    batch_request_id="img_001",
)
batch_requests.append(image_req)

# Image edit
image_edit_req = client.image.prepare(
    prompt="Add a rainbow in the background",
    model="grok-imagine-image",
    image_url="https://picsum.photos/800",
    batch_request_id="img_edit_001",
)
batch_requests.append(image_edit_req)

# Video generation
video_req = client.video.prepare(
    prompt="A product rotating on a turntable with dramatic lighting",
    model="grok-imagine-video",
    batch_request_id="vid_001",
)
batch_requests.append(video_req)

# Video edit
video_edit_req = client.video.prepare(
    prompt="Make it slow motion",
    model="grok-imagine-video",
    video_url="https://lorem.video/cat_360p_3s",
    batch_request_id="vid_edit_001",
)
batch_requests.append(video_edit_req)

# Add all requests to the batch
client.batch.add(batch_id=batch.batch_id, batch_requests=batch_requests)
print(f"Added {len(batch_requests)} requests to batch")

Step 3: Monitor batch progress

After adding requests, they begin processing in the background. Since batch processing is asynchronous, you need to poll the batch status to know when results are ready.

The batch state includes counters for pending, successful, and failed requests. Poll periodically until num_pending reaches zero, which indicates all requests have been processed (either successfully or with errors).

import time
from xai_sdk import Client

client = Client()

# Poll until all requests are processed
print("Waiting for batch to complete...")
while True:
    batch = client.batch.get(batch_id=batch.batch_id)
    
    pending = batch.state.num_pending
    completed = batch.state.num_success + batch.state.num_error
    total = batch.state.num_requests
    
    print(f"Progress: {completed}/{total} complete, {pending} pending")
    
    if pending == 0:
        print("Batch processing complete!")
        break
    
    # Wait before polling again (avoid hammering the API)
    time.sleep(5)
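A fixed five-second sleep is reasonable for small batches; for jobs that run for hours, an exponential backoff schedule polls less aggressively. A sketch of the schedule only (the loop body stays the same `client.batch.get` call shown above):

```python
def backoff_delays(initial: float = 5.0, factor: float = 2.0, max_delay: float = 300.0):
    """Yield sleep intervals that grow by `factor` each poll, capped at max_delay seconds."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, max_delay)
```

In the polling loop above, create `delays = backoff_delays()` before the loop and replace `time.sleep(5)` with `time.sleep(next(delays))`.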

Understanding batch states

The Batch API tracks state at two levels: the batch level and the individual request level.

Batch-level state shows aggregate progress across all requests in a given batch, accessible through the batch.state object returned by the client.batch.get() method:

| Counter | Description |
| --- | --- |
| num_requests | Total number of requests added to the batch |
| num_pending | Requests waiting to be processed |
| num_success | Requests that completed successfully |
| num_error | Requests that failed with an error |
| num_cancelled | Requests that were cancelled |

When num_pending reaches zero, all requests have been processed (either successfully, with errors, or cancelled).

Individual request states describe where each request is in its lifecycle, accessible through the batch_request_metadata object returned by the client.batch.list_batch_requests() method:

| State | Description |
| --- | --- |
| pending | Request is queued and waiting to be processed |
| succeeded | Request completed successfully; result is available |
| failed | Request encountered an error during processing |
| cancelled | Request was cancelled (e.g., when the batch was cancelled before this request was processed) |

Batch lifecycle: A batch can also be cancelled or expire. If you cancel a batch, pending requests won't be processed, but already-completed results remain available. Batches have an expiration time after which results are no longer accessible—check the expires_at field when retrieving batch details.


Step 4: Retrieve results

You can retrieve results at any time, even before the entire batch completes. Results are available as soon as individual requests finish processing, so you can start consuming completed results while other requests are still in progress.

Each result is linked to its original request via the batch_request_id you assigned earlier. For chat completions, use result.response which has the familiar fields: .content, .usage, .finish_reason, and more. For image requests, use result.image_response which provides .url, .base64, .usage, and .model. For video requests, use result.video_response which provides .url, .duration, .usage, and .model. These are the same response types returned by the regular client.image.sample() and client.video.generate() methods.

The SDK provides convenient .succeeded and .failed properties to separate successful responses from errors.

Pagination: Results are returned in pages. Use the limit parameter to control page size and pagination_token to fetch subsequent pages. When pagination_token is None, you've reached the end.

from xai_sdk import Client

client = Client()

# Paginate through all results
all_succeeded = []
all_failed = []
pagination_token = None

while True:
    # Fetch a page of results (limit controls page size)
    page = client.batch.list_batch_results(
        batch_id=batch.batch_id,
        limit=100,
        pagination_token=pagination_token,
    )
    
    # Collect results from this page
    all_succeeded.extend(page.succeeded)
    all_failed.extend(page.failed)
    
    # Check if there are more pages
    if page.pagination_token is None:
        break
    pagination_token = page.pagination_token

# Process results - handle different response types
print(f"Successfully processed: {len(all_succeeded)} requests")
for result in all_succeeded:
    rid = result.batch_request_id
    resp = result.proto.response

    if resp.HasField("completion_response"):
        # Chat completion response
        print(f"[{rid}] {result.response.content}")
        print(f"  Tokens used: {result.response.usage.total_tokens}")
    elif resp.HasField("image_response"):
        # Image generation response
        print(f"[{rid}] Image URL: {result.image_response.url}")
    elif resp.HasField("video_response"):
        # Video generation response
        print(f"[{rid}] Video URL: {result.video_response.url}")

if all_failed:
    print(f"\nFailed: {len(all_failed)} requests")
    for result in all_failed:
        print(f"[{result.batch_request_id}] Error: {result.error_message}")

Additional operations

Beyond the core workflow, the Batch API provides additional operations for managing your batches.

Cancel a batch

You can cancel a batch before all requests complete. Already-processed requests remain available in the results, but pending requests will not be processed. You cannot add more requests to a cancelled batch.

from xai_sdk import Client

client = Client()

# Cancel processing
cancelled_batch = client.batch.cancel(batch_id=batch.batch_id)
print(f"Cancelled batch: {cancelled_batch.batch_id}")
print(f"Completed before cancellation: {cancelled_batch.state.num_success} requests")

List all batches

View all batches belonging to your team. Batches are retained until they expire (check the expires_at field). This endpoint supports the same limit and pagination_token parameters for paginating through large lists.

from xai_sdk import Client

client = Client()

# List recent batches
response = client.batch.list(limit=20)

for batch in response.batches:
    status = "complete" if batch.state.num_pending == 0 else "processing"
    print(f"{batch.name} ({batch.batch_id}): {status}")

Check individual request status

For detailed tracking, you can inspect the metadata for each request in a batch. This shows the status, timing, and other details for individual requests. This endpoint supports the same limit and pagination_token parameters for paginating through large batches.

from xai_sdk import Client

client = Client()

# Get metadata for individual requests
metadata = client.batch.list_batch_requests(batch_id=batch.batch_id)

for request in metadata.batch_request_metadata:
    print(f"Request {request.batch_request_id}: {request.state}")

Track costs

Each batch tracks the total processing cost. Access the cost breakdown after processing to understand your spending. For pricing details, see Batch API Pricing on the Models and Pricing page.

from xai_sdk import Client

client = Client()

# Get batch with cost information
batch = client.batch.get(batch_id=batch.batch_id)

# Cost is returned in ticks (1e-10 USD) for precision
total_cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10
print("Total cost: $%.4f" % total_cost_usd)
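Because cost fields are denominated in ticks, a tiny helper keeps the conversion in one place (the constant follows the 1 tick = 1e-10 USD definition above):

```python
TICKS_PER_USD = 10_000_000_000  # 1 tick = 1e-10 USD, so 1e10 ticks = $1

def ticks_to_usd(ticks: int) -> float:
    """Convert a cost expressed in ticks to US dollars."""
    return ticks / TICKS_PER_USD
```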

Complete example

This end-to-end example demonstrates a realistic batch workflow: analyzing customer feedback at scale. It creates a batch, submits feedback items for sentiment analysis, waits for processing, and outputs the results. For simplicity, this example doesn't paginate results—see Step 4 for pagination when processing larger batches.

import time
from xai_sdk import Client
from xai_sdk.chat import system, user

client = Client()

# Sample dataset: customer feedback to analyze
feedback_data = [
    {"id": "fb_001", "text": "Absolutely love this product! Best purchase ever."},
    {"id": "fb_002", "text": "Delivery was late and the packaging was damaged."},
    {"id": "fb_003", "text": "Works fine, nothing special to report."},
    {"id": "fb_004", "text": "Customer support was incredibly helpful!"},
    {"id": "fb_005", "text": "The app keeps crashing on my phone."},
]

# Step 1: Create a batch
print("Creating batch...")
batch = client.batch.create(batch_name="feedback_sentiment_analysis")
print(f"Batch created: {batch.batch_id}")

# Step 2: Build and add requests
print("\nAdding requests...")
batch_requests = []
for item in feedback_data:
    chat = client.chat.create(
        model="grok-4.20-reasoning",
        batch_request_id=item["id"],
    )
    chat.append(system(
        "Analyze the sentiment of the customer feedback. "
        "Respond with exactly one word: positive, negative, or neutral."
    ))
    chat.append(user(item["text"]))
    batch_requests.append(chat)

client.batch.add(batch_id=batch.batch_id, batch_requests=batch_requests)
print(f"Added {len(batch_requests)} requests")

# Step 3: Wait for completion
print("\nProcessing...")
while True:
    batch = client.batch.get(batch_id=batch.batch_id)
    pending = batch.state.num_pending
    completed = batch.state.num_success + batch.state.num_error
    
    print(f"  {completed}/{batch.state.num_requests} complete")
    
    if pending == 0:
        break
    time.sleep(2)

# Step 4: Retrieve and display results
print("\n--- Results ---")
results = client.batch.list_batch_results(batch_id=batch.batch_id)

# Create a lookup for original feedback text
feedback_lookup = {item["id"]: item["text"] for item in feedback_data}

for result in results.succeeded:
    original_text = feedback_lookup.get(result.batch_request_id, "")
    sentiment = result.response.content.strip().lower()
    print(f"[{sentiment.upper()}] {original_text[:50]}...")

# Report any failures
if results.failed:
    print("\n--- Errors ---")
    for result in results.failed:
        print(f"[{result.batch_request_id}] {result.error_message}")

# Display cost
cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10
print("\nTotal cost: $%.4f" % cost_usd)

JSONL File Upload

As an alternative to adding requests via the SDK, you can create batches by uploading a JSONL file. This is useful when generating requests from scripts, pipelines, or external tools.

Each line in the file is a JSON object with four fields: custom_id (unique identifier, maps to batch_request_id), method (always "POST"), url (API endpoint path), and body (the JSON request payload matching the REST API reference for that endpoint).


{"custom_id": "chat-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "grok-4-1-fast-reasoning", "messages": [{"role": "user", "content": "Classify this as positive, negative, or neutral: The product exceeded my expectations!"}]}}
{"custom_id": "search-1", "method": "POST", "url": "/v1/responses", "body": {"model": "grok-4-1-fast-reasoning", "tools": [{"type": "web_search"}, {"type": "x_search"}], "input": [{"role": "user", "content": "What are the latest SpaceX launches?"}]}}
{"custom_id": "mcp-1", "method": "POST", "url": "/v1/responses", "body": {"model": "grok-4-1-fast-reasoning", "tools": [{"type": "mcp", "server_label": "deepwiki", "server_url": "https://mcp.deepwiki.com/mcp"}], "input": [{"role": "user", "content": "What does the xai-sdk-python repo do?"}]}}
{"custom_id": "img-1", "method": "POST", "url": "/v1/images/generations", "body": {"model": "grok-imagine-image", "prompt": "A futuristic city skyline at sunset"}}
{"custom_id": "img-edit-1", "method": "POST", "url": "/v1/images/edits", "body": {"model": "grok-imagine-image", "prompt": "Add a rainbow", "image": {"url": "https://picsum.photos/800"}}}
{"custom_id": "vid-1", "method": "POST", "url": "/v1/videos/generations", "body": {"model": "grok-imagine-video", "prompt": "A rocket launching from Mars", "duration": 8}}
{"custom_id": "vid-edit-1", "method": "POST", "url": "/v1/videos/edits", "body": {"model": "grok-imagine-video", "prompt": "Make it slow motion", "video": {"url": "https://lorem.video/cat_360p_3s"}}}

You can mix different endpoints in the same file. Each request is routed independently.

Supported url values:

| URL | Description |
| --- | --- |
| /v1/chat/completions | Chat completions |
| /v1/responses | Model responses |
| /v1/images/generations | Image generation |
| /v1/images/edits | Image editing |
| /v1/videos/generations or /v1/videos | Video generation |
| /v1/videos/edits | Video editing |

Upload the file via the Files API, then create a batch referencing it:

from xai_sdk import Client

client = Client()

# Upload the JSONL file
file = client.files.upload(
    file=open("batch_requests.jsonl", "rb"),
)

# Create a batch with the file ID
batch = client.batch.create(
    batch_name="sentiment_analysis",
    input_file_id=file.id,
)
print(f"Created batch: {batch.batch_id}")

The file is processed asynchronously in the background. If any line is invalid, the batch is cancelled with an error message. Monitor progress and retrieve results the same way as inline batches.

File-based batches are sealed after creation — you cannot add more requests via AddBatchRequests. Maximum file size is 200 MB with up to 50,000 requests. Each custom_id must be unique within the file.


Limitations

Batches

  • A team can have an unlimited number of batches.
  • Maximum batch creation rate: 2 batch creations per second per team.

Batch Requests

  • A batch can contain an unlimited number of requests in theory, but extremely large batches (>1,000,000 requests) may be throttled for processing stability.
  • Each individual request that can be added to a batch has a maximum payload size of 25 MB.
  • A team can send up to 1000 add-batch-requests API calls every 30 seconds (this is a rolling limit shared across all batches in the team).
  • Image and video results contain signed URLs that expire after 1 hour. Download the media promptly after retrieving results.

Tool Use

Both server-side tools and client-side function tools are supported in batch requests.

  • Server-side tools (web search, code execution, MCP, etc.) work the same as in the real-time API — they are executed during processing and the final response is returned.
  • Client-side function tools are supported: the model returns tool_calls in the response for you to handle offline. Multi-turn tool calling requires submitting a new batch request with the tool result messages included in the conversation.

