===/overview===

# Welcome to xAI Documentation

## Get started

## Quick start with the API

===/console/billing===

#### Key Information

# Manage Billing

**Ensure you are in the desired team before changing billing information. Changes made to a team will affect all users in that team.**

There are two billing options:

* **Prepaid credits:** Pre-purchase credits for your team. API consumption will be deducted from this credit balance.
* **Monthly invoiced billing:** Receive an invoice for your API consumption at the end of the month. If you don't have sufficient prepaid credits, your default payment method will be charged.

**Monthly invoiced billing is disabled by default.** To request this, contact sales@x.ai, or use the contact link on the [Billing](https://console.x.ai/team/default/billing) page:

## Prepaid credits

This is the most common way to use the API, and it allows you to control spending by purchasing credits in advance. Your usage can then be monitored on the [Usage explorer](https://console.x.ai/team/default/usage) page.

Purchase credits via [Billing -> API spend management](https://console.x.ai/team/default/billing). From here you can also view your credit balance and use a promo code if you have one.

Note: When you make the purchase via bank transfer instead of credit card, the payment will take 2-3 business days to process. You will be granted credits after the process has completed.

Currently you can only purchase prepaid credits via Guest Checkout due to regulatory requirements.

### Auto top-up

Auto top-ups automatically purchase more API credits when your balance drops below a set threshold. We recommend enabling this to avoid service interruptions. This can be disabled at any time.

You can configure:

* The **credit balance** your team needs to drop to in order to trigger a top-up.
* The **top-up amount** of credits that will be purchased (minimum $25).
* The **maximum total value** of top-ups that are allowed per **month**.
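The three settings above interact in a simple way: a top-up fires only when the balance drops to the trigger level and the monthly cap has room left. The sketch below is illustrative only — the real decision is made server-side by the Console, and the type and function names here are our own, not part of any SDK:

```python
from dataclasses import dataclass


@dataclass
class AutoTopUpConfig:
    # Hypothetical fields mirroring the three Console settings above
    trigger_balance: float  # top up when credits drop to this level
    top_up_amount: float    # credits purchased per top-up (minimum $25)
    monthly_max: float      # maximum total value of top-ups per month


def next_top_up(balance: float, topped_up_this_month: float, cfg: AutoTopUpConfig) -> float:
    """Return the top-up amount that would be purchased, or 0.0 if none fires."""
    if balance > cfg.trigger_balance:
        return 0.0  # balance still above the trigger threshold
    if topped_up_this_month + cfg.top_up_amount > cfg.monthly_max:
        return 0.0  # the monthly cap would be exceeded, so no purchase
    return cfg.top_up_amount


cfg = AutoTopUpConfig(trigger_balance=10.0, top_up_amount=25.0, monthly_max=100.0)
print(next_top_up(5.0, 50.0, cfg))  # fires: balance below threshold, cap not hit
print(next_top_up(5.0, 90.0, cfg))  # blocked: one more top-up would exceed the cap
```

This also shows why the total-monthly-limit and per-top-up values need to be sized together: a cap that is not a multiple of the top-up amount leaves headroom that can never be used.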
There is a limit of 5 top-ups per 24 hours to avoid unexpectedly large spend. Please ensure the amount per top-up and total top-up values are sufficient for your monthly usage.

Warnings are shown on the API spend management card when you're close to a spending limit:

* When you've used **80% of the total monthly limit** that you set.
* When you only have **1 of the 5 top-ups per 24 hours** left.

## Monthly invoiced billing and invoiced billing limit

Enterprise customers might find it beneficial to enroll in monthly invoiced billing to avoid disruption to their services.

When you have set a **$0 invoiced billing limit** (the default), xAI will only use your available prepaid credits. **Your API requests will be automatically rejected once your prepaid credits are depleted.**

If you want to use monthly billing, you can **increase your invoiced billing limit** on the [Billing -> API Credits](https://console.x.ai/team/default/billing) page. xAI will attempt to use your prepaid credits first, and the remaining amount will be charged to your default payment method at the end of the month. This ensures you won't experience interruption while consuming the API.

Once your monthly invoiced billing amount has reached the invoiced billing limit, you won't be able to get responses until you have raised the invoiced billing limit.

## Saving payment method

When you make a purchase, we automatically keep your payment method on file to make your next purchase easier. You can also manually add a payment method on the xAI Console via [Billing -> Billing details -> Add Payment Information](https://console.x.ai/team/default/billing).

Currently we don't allow users to remove the last payment method on file. This may change in the future.

## Invoices

You can view your invoices for prepaid credits and monthly invoices on [Billing -> Invoices](https://console.x.ai/team/default/billing/invoices).

## Billing address and tax information

Enter your billing information carefully, as it will appear on your invoices.
We are not able to regenerate invoices at the moment. Your billing address and tax information will be displayed on the invoice.

On [Billing -> Payment](https://console.x.ai/team/default/billing), you can also add or change your billing address. When you do so, you can optionally add your organization's tax information.

===/console/collections===

#### Guides

# Using Collections in Console

This guide walks you through managing collections using the [xAI Console](https://console.x.ai) interface.

## Creating a new collection

Navigate to the **Collections** tab in the [xAI Console](https://console.x.ai). Make sure you are in the correct team. Click on "Create new collection" to create a new `collection`.

You can choose whether to generate embeddings on document upload. We recommend leaving the generate-embeddings setting on.

## Viewing and editing collection configuration

You can view and edit a collection's configuration by clicking on "Edit Collection". This opens a modal where you can view the configuration and make changes.

## Adding a document to the collection

Once you have created the new `collection`, you can click on it in the collections table to view the `documents` included in the `collection`. Click on "Upload document" to upload a new `document`.

Once the upload has completed, each document is given a File ID. You can view the File ID, Collection ID, and hash of the `document` by clicking on the `document` in the documents table.

## Deleting documents and collections

You can delete `documents` and `collections` by clicking on the more button on the right side of the collections or documents table.

===/console/faq/accounts===

#### FAQ

# Accounts

## How do I create an account for the API?

You can create an account at https://accounts.x.ai or https://console.x.ai. To link your X account automatically to your xAI account, choose to sign up with your X account.
You can create multiple accounts with different sign-in methods using the same email. When you sign up with a new sign-in method using the same email, we will prompt you to either create a new account or link to the existing account. We will not be able to merge the content, subscriptions, etc. of different accounts.

## How do I update my xAI account email?

Visit [xAI Accounts](https://accounts.x.ai). On the Account page, you can update your email.

## How do I add other sign-in methods?

Once you have signed up for an account, you can add additional sign-in methods by going to [xAI Accounts](https://accounts.x.ai).

## I've forgotten my Multi-Factor Authentication (MFA) method, can you remove it?

You can generate your recovery codes on the [xAI Accounts](https://accounts.x.ai) Security page. Due to security considerations, we can't remove or reset your MFA method unless you have recovery codes. Please reach out to support@x.ai if you would like to delete the account instead.

## If I already have an account for Grok, can I use the same account for API access?

Yes, the account is shared between Grok and the xAI API. You can manage the sign-in details at https://accounts.x.ai.

However, billing is separate for Grok and the xAI API. You can manage your billing for the xAI API on the [xAI Console](https://console.x.ai). To manage billing for Grok, visit https://grok.com -> Settings -> Billing, or manage it directly with Apple/Google if you made the purchase via the Apple App Store or Google Play.

## How do I manage my account?

You can visit [xAI Accounts](https://accounts.x.ai) to manage your account.

Please note the xAI account is different from the X account, and xAI cannot assist you with X account issues. Please contact X via the [X Help Center](https://help.x.com/) or Premium Support if you encounter any issues with your X account.

## I received an email of someone logging into my xAI account

xAI will send an email to you when someone logs into your xAI account.
The login location is an approximation based on your IP address, which depends on your network setup and ISP and might not reflect exactly where the login happened.

If you think the login was not you, please [reset your password](https://accounts.x.ai/request-reset-password) and [clear your login sessions](https://accounts.x.ai/sessions). We also recommend that all users [add a multi-factor authentication method](https://accounts.x.ai/security).

## How do I delete my xAI account?

We are sorry to see you go! You can visit [xAI Accounts](https://accounts.x.ai/account) to delete your account.

You can cancel the deletion within 30 days by logging in again to any xAI website and following the prompt to confirm restoring the account.

For privacy requests, please go to: https://privacy.x.ai.

===/console/faq/billing===

#### FAQ

# Billing

## I'm having payment issues with an Indian payment card

Unfortunately we cannot process Indian payment cards for our API service. We are working toward supporting them, but you might want to consider using a third-party API in the meantime.

As payments for the Grok website and apps are handled differently, those are not affected.

## When will I be charged?

* Prepaid Credits: If you choose to use prepaid credits, you'll be charged when you buy them. These credits will be assigned to the team you select during purchase.
* Monthly Invoiced Billing: If you set your [invoiced spending limit](/console/billing#monthly-invoiced-billing-and-invoiced-billing-limit) above $0, any usage beyond your prepaid credits will be charged at the end of the month.
* API Usage: When you make API requests, the cost is calculated immediately. The amount is either deducted from your available prepaid credits or added to your monthly invoice if credits are exhausted.
If you change your [invoiced spending limit](/console/billing#monthly-invoiced-billing-and-invoiced-billing-limit) to be greater than $0, you will be charged at the end of the month for any extra consumption after your prepaid credits on the team have run out. Your API consumption will be calculated when you make the requests, and the corresponding amount will be deducted from your remaining credits or added to your monthly invoice.

Check out [Billing](/console/billing) for more information.

## Can you retroactively generate an invoice with new billing information?

We are unable to retroactively generate an invoice. Please ensure your billing information is correct on the [xAI Console](https://console.x.ai) under Billing -> Payment.

## Can prepaid API credits be refunded?

Unfortunately, we are not able to offer refunds on any prepaid credit purchase except in regions where required by law. For details, please visit https://x.ai/legal/terms-of-service-enterprise.

### My prompt token consumption from the API is different from the token count I get from the xAI Console Tokenizer or tokenize text endpoint

The inference endpoints add pre-defined tokens to help us process the request. These tokens are therefore added to the total prompt token consumption. For more information, see [Estimating consumption with tokenizer on xAI Console or through API](/developers/rate-limits#estimating-consumption-with-tokenizer-on-xai-console-or-through-api).

===/console/faq/security===

#### FAQ

# Security

## Does xAI train on customers' API requests?

xAI never trains on your API inputs or outputs without your explicit permission. API requests and responses are temporarily stored on our servers for 30 days in case they need to be audited for potential abuse or misuse. This data is automatically deleted after 30 days.

## Is the xAI API HIPAA compliant?
To inquire about a Business Associate Agreement (BAA), please complete our [BAA Questionnaire](https://forms.gle/YAEdX3XUp6MvdEXW9). A member of our team will review your responses and reach out with next steps.

## Is xAI GDPR and SOC 2 compliant?

We are SOC 2 Type 2 compliant. Customers with a signed NDA can refer to our [Trust Center](https://trust.x.ai/) for up-to-date information on our certifications and data governance.

## Do you have Audit Logs?

Team admins are able to view an audit log of user interactions. This lists all of the user interactions with our API server. You can view it at [xAI Console -> Audit Log](https://console.x.ai/team/default/audit).

The admin can also search by Event ID, Description, or User to filter the results shown. For example, this filters by description matching `ListApiKeys`:

You can also view the audit log across a range of dates with the time filter:

## How can I securely manage my API keys?

Treat your xAI API keys as sensitive information, like passwords or credit card details. Do not share keys between teammates to avoid unauthorized access. Store keys securely using environment variables or secret management tools. Avoid committing keys to public repositories or source code. Rotate keys regularly for added security.

If you suspect a compromise, log into the xAI Console first. Ensure you are viewing the correct team, as API keys are tied to specific teams. Navigate to the "API Keys" section via the sidebar. In the API Keys table, click the vertical ellipsis (three dots) next to the key. Select "Disable key" to deactivate it temporarily or "Delete key" to remove it permanently. Then, click the "Create API Key" button to generate a new one and update your applications.

xAI partners with GitHub's Secret Scanning program to detect leaked keys. If a leak is found, we disable the key and notify you via email. Monitor your account for unusual activity to stay protected.
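As a concrete example of the environment-variable approach above, here is a minimal sketch. The helper name `load_xai_api_key` is ours, not part of any SDK; the point is simply to fail fast when the key is missing instead of hardcoding it in source:

```python
import os


def load_xai_api_key() -> str:
    """Read the API key from the environment instead of hardcoding it in source."""
    key = os.environ.get("XAI_API_KEY")
    if not key:
        raise RuntimeError(
            "XAI_API_KEY is not set. Export it in your shell or inject it via "
            "your secret manager; never commit it to source control."
        )
    return key


# Pass the key to whichever client you use, e.g.:
# client = Client(api_key=load_xai_api_key())
```

Because the key lives only in the environment, rotating it (or disabling a compromised key in the Console) requires no code change — just update the variable or secret-manager entry.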
===/console/usage===

#### Key Information

# Usage Explorer

As a team admin, you might sometimes want to monitor API consumption, either to track spending or to detect anomalies. The xAI Console provides an easy-to-use [Usage Explorer](https://console.x.ai/team/default/usage) for team admins to track API usage across API keys, models, etc.

## Basic usage

The [Usage Explorer](https://console.x.ai/team/default/usage) page provides intuitive dropdown menus for you to customize how you want to view consumption. For example, you can view your daily credit consumption with `Granularity: Daily`:

By default, usage is calculated by cost in USD. You can select Dimension -> Tokens or Dimension -> Billing items to change the dimension to token count or billing item count.

You can also see the usage with grouping. This way, you can easily compare consumption across groups. In this case, we are trying to compare consumption across test and production API keys, so we select `Group by: API Key`:

## Filters

The basic usage should suffice if you are only viewing general information. However, you can also use filters to conditionally display information. The filters dropdown gives you the option to filter by a particular API key, a model, a request IP, a cluster, or the token type.

===/developers/advanced-api-usage/async===

#### Advanced API Usage

# Asynchronous Requests

When working with the xAI API, you may need to process hundreds or even thousands of requests. Sending these requests sequentially can be extremely time-consuming. To improve efficiency, you can use `AsyncClient` from `xai_sdk` or `AsyncOpenAI` from `openai`, either of which allows you to send multiple requests concurrently.

The example below is a Python script demonstrating how to use `AsyncClient` to batch and process requests asynchronously, significantly reducing the overall execution time.

You can also use our Batch API to queue the requests and fetch them later.
Please visit [Batch API](/developers/advanced-api-usage/batch-api) for more information.

## Rate Limits

Adjust the `max_concurrent` param to control the maximum number of parallel requests. You are unable to concurrently run your requests beyond the rate limits shown in the API console.

```pythonXAI
import asyncio
import os

from xai_sdk import AsyncClient
from xai_sdk.chat import Response, user


async def main():
    client = AsyncClient(
        api_key=os.getenv("XAI_API_KEY"),
        timeout=3600,  # Override default timeout with longer timeout for reasoning models
    )

    model = "grok-4.20-reasoning"

    requests = [
        "Tell me a joke",
        "Write a funny haiku",
        "Generate a funny X post",
        "Say something unhinged",
    ]

    # Define a semaphore to limit concurrent requests (e.g., max 2 concurrent requests at a time)
    max_in_flight_requests = 2
    semaphore = asyncio.Semaphore(max_in_flight_requests)

    async def process_request(request) -> Response:
        async with semaphore:
            print(f"Processing request: {request}")
            chat = client.chat.create(model=model, max_tokens=100)
            chat.append(user(request))
            return await chat.sample()

    tasks = []
    for request in requests:
        tasks.append(process_request(request))

    responses = await asyncio.gather(*tasks)

    for i, response in enumerate(responses):
        print(f"Total tokens used for response {i}: {response.usage.total_tokens}")


if __name__ == "__main__":
    asyncio.run(main())
```

```pythonOpenAISDK
import asyncio
import os
from asyncio import Semaphore

import httpx
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1",
    timeout=httpx.Timeout(3600.0),  # Override default timeout with longer timeout for reasoning models
)


async def send_request(sem: Semaphore, request: str) -> dict:
    """Send a single request to xAI with semaphore control."""
    # The 'async with sem' ensures only a limited number of requests run at once
    async with sem:
        return await client.chat.completions.create(
            model="grok-4.20-reasoning",
            messages=[{"role": "user", "content": request}],
        )


async def process_requests(requests: list[str], max_concurrent: int = 2) -> list[dict]:
    """Process multiple requests with controlled concurrency."""
    # Create a semaphore that limits how many requests can run at the same time
    # Think of it like having only 2 "passes" to make requests simultaneously
    sem = Semaphore(max_concurrent)

    # Create a list of tasks (requests) that will run using the semaphore
    tasks = [send_request(sem, request) for request in requests]

    # asyncio.gather runs all tasks in parallel but respects the semaphore limit
    # It waits for all tasks to complete and returns their results
    return await asyncio.gather(*tasks)


async def main() -> None:
    """Main function to handle requests and display responses."""
    requests = [
        "Tell me a joke",
        "Write a funny haiku",
        "Generate a funny X post",
        "Say something unhinged",
    ]

    # This starts processing all requests asynchronously, but only 2 at a time
    # Instead of waiting for each request to finish before starting the next,
    # we can have 2 requests running at once, making it faster overall
    responses = await process_requests(requests)

    # Print each response in order
    for i, response in enumerate(responses):
        print(f"# Response {i}:")
        print(response.choices[0].message.content)


if __name__ == "__main__":
    asyncio.run(main())
```

===/developers/advanced-api-usage/batch-api===

#### Advanced API Usage

# Batch API

The Batch API lets you process large volumes of requests asynchronously with reduced pricing and higher rate limits. For pricing details, see [Batch API Pricing](/developers/models#batch-api-pricing).

## What is the Batch API?

When you make a standard API call to Grok, you send a request and wait for an immediate response. This approach is perfect for interactive applications like chatbots, real-time assistants, or any use case where users are waiting for a response.

The Batch API takes a different approach.
Instead of processing requests immediately, you submit them to a queue where they're processed in the background. You don't get an instant response—instead, you check back later to retrieve your results.

**Key differences from real-time API requests:**

| | Real-time API | Batch API |
|---|---|---|
| **Response time** | Immediate (seconds) | Typically within 24 hours |
| **Cost** | Standard pricing | Reduced pricing ([see details](/developers/models#batch-api-pricing)) |
| **Rate limits** | Per-minute limits apply | Requests don't count towards rate limits |
| **Use case** | Interactive, real-time | Background processing, bulk jobs |

**Processing time:** Most batch requests complete within **24 hours**, though processing time may vary depending on system load and batch size.

You can also create, monitor, and manage batches through the [xAI Console](https://console.x.ai/team/default/batches). The Console provides a visual interface for tracking batch progress and viewing results.

## When to use the Batch API

The Batch API is ideal when you don't need immediate results and want to **reduce your API costs**:

* **Running evaluations and benchmarks** - Test model performance across thousands of prompts
* **Processing large datasets** - Analyze customer feedback, classify support tickets, extract entities
* **Content moderation at scale** - Review backlogs of user-generated content
* **Document summarization** - Process reports, research papers, or legal documents in bulk
* **Data enrichment pipelines** - Add AI-generated insights to database records
* **Scheduled overnight jobs** - Generate daily reports or prepare data for dashboards

## How it works

The Batch API workflow consists of four main steps:

1. **Create a batch** - A batch is a container that groups related requests together
2. **Add requests** - Submit your inference requests to the batch queue
3. **Monitor progress** - Poll the batch status to track completion
4. **Retrieve results** - Fetch responses for all processed requests

Let's walk through each step.

## Step 1: Create a batch

A batch acts as a container for your requests. Think of it as a folder that groups related work together—you might create separate batches for different datasets, experiments, or job types. When you create a batch, you receive a `batch_id` that you'll use to add requests and retrieve results.

```bash
curl -X POST https://api.x.ai/v1/batches \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -d '{
    "name": "customer_feedback_analysis"
  }'
```

```pythonXAI
from xai_sdk import Client

client = Client()

# Create a batch with a descriptive name
batch = client.batch.create(batch_name="customer_feedback_analysis")
print(f"Created batch: {batch.batch_id}")

# Store the batch_id for later use
batch_id = batch.batch_id
```

```javascriptWithoutSDK
// Create a batch with a descriptive name
const response = await fetch("https://api.x.ai/v1/batches", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.XAI_API_KEY}`,
  },
  body: JSON.stringify({ name: "customer_feedback_analysis" }),
});

const batch = await response.json();
console.log(`Created batch: ${batch.batch_id}`);

// Store the batch_id for later use
const batchId = batch.batch_id;
```

## Step 2: Add requests to the batch

With your batch created, you can now add requests to it. Each request will be processed asynchronously.

**With the xAI SDK, adding batch requests is simple:** use `chat.create()` for text, `image.prepare()` for images, or `video.prepare()` for videos, then pass them as a list. You can also upload a [JSONL file](#jsonl-file-upload) if you prefer.

**Important:** Assign a unique `batch_request_id` to each request. This ID lets you match results back to their original requests, which becomes important when you're processing hundreds or thousands of items.
If you don't provide an ID, we generate a UUID for you. Using your own IDs is useful for idempotency (ensuring a request is only processed once) and for linking batch requests to records in your own system.

```pythonXAI
from xai_sdk import Client
from xai_sdk.chat import system, user
from xai_sdk.tools import web_search, x_search, mcp

client = Client()

batch_requests = []

# Chat completion with tools
chat = client.chat.create(
    model="grok-4.20-reasoning",
    batch_request_id="chat_001",
    tools=[web_search(), x_search()],
)
chat.append(system("Analyze market sentiment from recent news and posts."))
chat.append(user("What is the current sentiment around TSLA stock?"))
batch_requests.append(chat)

# Image generation
image_req = client.image.prepare(
    prompt="A sleek modern laptop on a minimalist desk",
    model="grok-imagine-image",
    batch_request_id="img_001",
)
batch_requests.append(image_req)

# Image edit
image_edit_req = client.image.prepare(
    prompt="Add a rainbow in the background",
    model="grok-imagine-image",
    image_url="https://picsum.photos/800",
    batch_request_id="img_edit_001",
)
batch_requests.append(image_edit_req)

# Video generation
video_req = client.video.prepare(
    prompt="A product rotating on a turntable with dramatic lighting",
    model="grok-imagine-video",
    batch_request_id="vid_001",
)
batch_requests.append(video_req)

# Video edit
video_edit_req = client.video.prepare(
    prompt="Make it slow motion",
    model="grok-imagine-video",
    video_url="https://lorem.video/cat_360p_3s",
    batch_request_id="vid_edit_001",
)
batch_requests.append(video_edit_req)

# Remote MCP
mcp_chat = client.chat.create(
    model="grok-4.20-reasoning",
    batch_request_id="mcp_001",
    tools=[mcp(server_url="https://mcp.deepwiki.com/mcp")],
)
mcp_chat.append(user("What does the xai-sdk-python repo do?"))
batch_requests.append(mcp_chat)

# Add all requests to the batch
client.batch.add(batch_id=batch.batch_id, batch_requests=batch_requests)
print(f"Added {len(batch_requests)} requests to batch")
```

```bash
curl -X POST https://api.x.ai/v1/batches/{batch_id}/requests \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -d '{
    "batch_requests": [
      {
        "batch_request_id": "feedback_001",
        "batch_request": {
          "responses": {
            "input": [
              {"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."},
              {"role": "user", "content": "The product exceeded my expectations!"}
            ],
            "model": "grok-4.20-reasoning"
          }
        }
      },
      {
        "batch_request_id": "feedback_002",
        "batch_request": {
          "responses": {
            "input": [
              {"role": "system", "content": "Classify the sentiment as positive, negative, or neutral."},
              {"role": "user", "content": "Shipping took way too long."}
            ],
            "model": "grok-4.20-reasoning"
          }
        }
      }
    ]
  }'
```

```javascriptWithoutSDK
const batchRequests = [];

// Chat completion with tools (uses "responses" endpoint for server-side tool support)
batchRequests.push({
  batch_request_id: "chat_001",
  batch_request: {
    responses: {
      model: "grok-4.20-reasoning",
      tools: [{ type: "web_search" }, { type: "x_search" }],
      input: [
        { role: "system", content: "Analyze market sentiment from recent news and posts." },
        { role: "user", content: "What is the current sentiment around TSLA stock?" },
      ],
    },
  },
});

// Image generation
batchRequests.push({
  batch_request_id: "img_001",
  batch_request: {
    image_generation: {
      prompt: "A sleek modern laptop on a minimalist desk",
      model: "grok-imagine-image",
    },
  },
});

// Image edit
batchRequests.push({
  batch_request_id: "img_edit_001",
  batch_request: {
    image_edit: {
      prompt: "Add a rainbow in the background",
      model: "grok-imagine-image",
      image: { url: "https://picsum.photos/800", type: "image_url" },
    },
  },
});

// Video generation
batchRequests.push({
  batch_request_id: "vid_001",
  batch_request: {
    video_generation: {
      prompt: "A product rotating on a turntable with dramatic lighting",
      model: "grok-imagine-video",
    },
  },
});

// Video edit
batchRequests.push({
  batch_request_id: "vid_edit_001",
  batch_request: {
    video_generation: {
      prompt: "Make it slow motion",
      model: "grok-imagine-video",
      video: { url: "https://lorem.video/cat_360p_3s" },
    },
  },
});

// Remote MCP
batchRequests.push({
  batch_request_id: "mcp_001",
  batch_request: {
    responses: {
      model: "grok-4.20-reasoning",
      tools: [{ type: "mcp", server_label: "deepwiki", server_url: "https://mcp.deepwiki.com/mcp" }],
      input: [{ role: "user", content: "What does the xai-sdk-python repo do?" }],
    },
  },
});

// Add all requests to the batch
const response = await fetch(`https://api.x.ai/v1/batches/${batchId}/requests`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.XAI_API_KEY}`,
  },
  body: JSON.stringify({ batch_requests: batchRequests }),
});

if (!response.ok) throw new Error(`Failed to add requests: ${await response.text()}`);
console.log(`Added ${batchRequests.length} requests to batch`);
```

## Step 3: Monitor batch progress

After adding requests, they begin processing in the background. Since batch processing is asynchronous, you need to poll the batch status to know when results are ready. The batch state includes counters for pending, successful, and failed requests.
Poll periodically until `num_pending` reaches zero, which indicates all requests have been processed (either successfully or with errors).

```bash
# Check batch status
curl https://api.x.ai/v1/batches/{batch_id} \
  -H "Authorization: Bearer $XAI_API_KEY"

# Response includes state with request counts:
# {
#   "state": {
#     "num_requests": 100,
#     "num_pending": 25,
#     "num_success": 70,
#     "num_error": 5
#   }
# }
```

```pythonXAI
import time

from xai_sdk import Client

client = Client()

# Poll until all requests are processed
print("Waiting for batch to complete...")
while True:
    batch = client.batch.get(batch_id=batch.batch_id)

    pending = batch.state.num_pending
    completed = batch.state.num_success + batch.state.num_error
    total = batch.state.num_requests
    print(f"Progress: {completed}/{total} complete, {pending} pending")

    if pending == 0:
        print("Batch processing complete!")
        break

    # Wait before polling again (avoid hammering the API)
    time.sleep(5)
```

```javascriptWithoutSDK
// Poll until all requests are processed
console.log("Waiting for batch to complete...");
const interval = setInterval(async () => {
  const response = await fetch(`https://api.x.ai/v1/batches/${batchId}`, {
    headers: { Authorization: `Bearer ${process.env.XAI_API_KEY}` },
  });
  const batch = await response.json();

  const { num_pending, num_success, num_error, num_requests } = batch.state;
  const completed = num_success + num_error;
  console.log(`Progress: ${completed}/${num_requests} complete, ${num_pending} pending`);

  if (num_requests > 0 && num_pending === 0) {
    clearInterval(interval);
    console.log("Batch processing complete!");
  }
  // Wait before polling again (avoid hammering the API)
}, 5000);
```

### Understanding batch states

The Batch API tracks state at two levels: the **batch level** and the **individual request level**.
**Batch-level state** shows aggregate progress across all requests in a given batch, accessible through the `batch.state` object returned by the `client.batch.get()` method:

| Counter | Description |
|---|---|
| `num_requests` | Total number of requests added to the batch |
| `num_pending` | Requests waiting to be processed |
| `num_success` | Requests that completed successfully |
| `num_error` | Requests that failed with an error |
| `num_cancelled` | Requests that were cancelled |

When `num_pending` reaches zero, all requests have been processed (either successfully, with errors, or cancelled).

**Individual request states** describe where each request is in its lifecycle, accessible through the `batch_request_metadata` object returned by the `client.batch.list_batch_requests()` [method](#check-individual-request-status):

| State | Description |
|---|---|
| `pending` | Request is queued and waiting to be processed |
| `succeeded` | Request completed successfully, result is available |
| `failed` | Request encountered an error during processing |
| `cancelled` | Request was cancelled (e.g., when the batch was cancelled before this request was processed) |

**Batch lifecycle:** A batch can also be cancelled or expire. [If you cancel a batch](#cancel-a-batch), pending requests won't be processed, but already-completed results remain available. Batches have an expiration time after which results are no longer accessible—check the `expires_at` field when retrieving batch details.

## Step 4: Retrieve results

You can retrieve results at any time, even before the entire batch completes. Results are available as soon as individual requests finish processing, so you can start consuming completed results while other requests are still in progress. Each result is linked to its original request via the `batch_request_id` you assigned earlier.

For chat completions, use `result.response`, which has the familiar fields: `.content`, `.usage`, `.finish_reason`, and more.
For image requests, use `result.image_response` which provides `.url`, `.base64`, `.usage`, and `.model`. For video requests, use `result.video_response` which provides `.url`, `.duration`, `.usage`, and `.model`. These are the same response types returned by the regular `client.image.sample()` and `client.video.generate()` methods. The SDK provides convenient `.succeeded` and `.failed` properties to separate successful responses from errors. **Pagination:** Results are returned in pages. Use the `limit` parameter to control page size and `pagination_token` to fetch subsequent pages. When `pagination_token` is `None`, you've reached the end. ```pythonXAI from xai_sdk import Client client = Client() # Paginate through all results all_succeeded = [] all_failed = [] pagination_token = None while True: # Fetch a page of results (limit controls page size) page = client.batch.list_batch_results( batch_id=batch.batch_id, limit=100, pagination_token=pagination_token, ) # Collect results from this page all_succeeded.extend(page.succeeded) all_failed.extend(page.failed) # Check if there are more pages if page.pagination_token is None: break pagination_token = page.pagination_token # Process results - handle different response types print(f"Successfully processed: {len(all_succeeded)} requests") for result in all_succeeded: rid = result.batch_request_id resp = result.proto.response if resp.HasField("completion_response"): # Chat completion response print(f"[{rid}] {result.response.content}") print(f" Tokens used: {result.response.usage.total_tokens}") elif resp.HasField("image_response"): # Image generation response print(f"[{rid}] Image URL: {result.image_response.url}") elif resp.HasField("video_response"): # Video generation response print(f"[{rid}] Video URL: {result.video_response.url}") if all_failed: print(f"\\nFailed: {len(all_failed)} requests") for result in all_failed: print(f"[{result.batch_request_id}] Error: {result.error_message}") ``` ```bash # Fetch first 
page curl "https://api.x.ai/v1/batches/{batch_id}/results?page_size=100" \\ -H "Authorization: Bearer $XAI_API_KEY" # Use pagination_token from response to fetch next page curl "https://api.x.ai/v1/batches/{batch_id}/results?page_size=100&pagination_token={token}" \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ```javascriptWithoutSDK // Paginate through all results const allSucceeded = []; const allFailed = []; let paginationToken = undefined; while (true) { // Fetch a page of results (limit controls page size) const url = new URL(\`https://api.x.ai/v1/batches/\${batchId}/results\`); url.searchParams.set("page_size", "100"); if (paginationToken) url.searchParams.set("pagination_token", paginationToken); const res = await fetch(url, { headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, }); const page = await res.json(); // Collect results from this page for (const result of page.results) { const response = result.batch_result?.response; if (response?.chat_get_completion || response?.image_generation || response?.video_generation) { allSucceeded.push(result); } else { allFailed.push(result); } } // Check if there are more pages if (!page.pagination_token) break; paginationToken = page.pagination_token; } // Process all results console.log(\`Successfully processed: \${allSucceeded.length} requests\`); for (const result of allSucceeded) { const response = result.batch_result.response; const content = response.chat_get_completion?.choices[0].message.content ?? response.image_generation?.data[0].url ?? 
response.video_generation?.video.url; const tokens = response.chat_get_completion?.usage?.total_tokens; // Access the full response object console.log(\`[\${result.batch_request_id}] \${content}\`); if (tokens != null) console.log(\` Tokens used: \${tokens}\`); } if (allFailed.length > 0) { console.log(\`\\nFailed: \${allFailed.length} requests\`); for (const result of allFailed) { console.log(\`[\${result.batch_request_id}] Error: \${result.error_message}\`); } } ``` ## Additional operations Beyond the core workflow, the Batch API provides additional operations for managing your batches. ### Cancel a batch You can cancel a batch before all requests complete. Already-processed requests remain available in the results, but pending requests will not be processed. You cannot add more requests to a cancelled batch. ```bash curl -X POST https://api.x.ai/v1/batches/{batch_id}:cancel \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ```pythonXAI from xai_sdk import Client client = Client() # Cancel processing cancelled_batch = client.batch.cancel(batch_id=batch.batch_id) print(f"Cancelled batch: {cancelled_batch.batch_id}") print(f"Completed before cancellation: {cancelled_batch.state.num_success} requests") ``` ```javascriptWithoutSDK // Cancel processing const response = await fetch( \`https://api.x.ai/v1/batches/\${batchId}:cancel\`, { method: "POST", headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` } } ); const cancelledBatch = await response.json(); console.log(\`Cancelled batch: \${cancelledBatch.batch_id}\`); console.log(\`Completed before cancellation: \${cancelledBatch.state.num_success} requests\`); ``` ### List all batches View all batches belonging to your team. Batches are retained until they expire (check the `expires_at` field). This endpoint supports the same `limit` and `pagination_token` parameters for paginating through large lists. 
```bash curl "https://api.x.ai/v1/batches?page_size=20" \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ```pythonXAI from xai_sdk import Client client = Client() # List recent batches response = client.batch.list(limit=20) for batch in response.batches: status = "complete" if batch.state.num_pending == 0 else "processing" print(f"{batch.name} ({batch.batch_id}): {status}") ``` ```javascriptWithoutSDK // List recent batches const response = await fetch( "https://api.x.ai/v1/batches?page_size=20", { headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` } } ); const data = await response.json(); for (const batch of data.batches) { const status = batch.state.num_pending === 0 ? "complete" : "processing"; console.log(\`\${batch.name} (\${batch.batch_id}): \${status}\`); } ``` ### Check individual request status For detailed tracking, you can inspect the metadata for each request in a batch. This shows the status, timing, and other details for individual requests. This endpoint supports the same `limit` and `pagination_token` parameters for paginating through large batches. ```bash curl "https://api.x.ai/v1/batches/{batch_id}/requests?page_size=50" \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ```pythonXAI from xai_sdk import Client client = Client() # Get metadata for individual requests metadata = client.batch.list_batch_requests(batch_id=batch.batch_id) for request in metadata.batch_request_metadata: print(f"Request {request.batch_request_id}: {request.state}") ``` ```javascriptWithoutSDK // Get metadata for individual requests const response = await fetch( \`https://api.x.ai/v1/batches/\${batchId}/requests?page_size=50\`, { headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` } } ); const data = await response.json(); for (const req of data.batch_request_metadata) { console.log(\`Request \${req.batch_request_id}: \${req.state}\`); } ``` ### Track costs Each batch tracks the total processing cost. 
Access the cost breakdown after processing to understand your spending. For pricing details, see [Batch API Pricing on the Models and Pricing page](/developers/models#batch-api-pricing). ```bash # Get batch with cost information curl -s "https://api.x.ai/v1/batches/{batch_id}/results?page_size=100" \\ -H "Authorization: Bearer $XAI_API_KEY" # Cost per result can be found on response.results[].batch_result.response.chat_get_completion.usage.cost_in_usd_ticks # Cost is returned in ticks (1e-10 USD) for precision ``` ```pythonXAI from xai_sdk import Client client = Client() # Get batch with cost information batch = client.batch.get(batch_id=batch.batch_id) # Cost is returned in ticks (1e-10 USD) for precision total_cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10 print("Total cost: $%.4f" % total_cost_usd) ``` ```javascriptWithoutSDK // Get batch with cost information const response = await fetch( \`https://api.x.ai/v1/batches/\${batchId}/results?page_size=100\`, { headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` } } ); const data = await response.json(); // Cost is returned in ticks (1e-10 USD) for precision let totalTicks = 0; for (const r of data.results) { totalTicks += r.batch_result?.response?.chat_get_completion?.usage?.cost_in_usd_ticks ?? 0; } console.log(\`Total cost: $\${(totalTicks / 1e10).toFixed(4)}\`); ``` ## Complete example This end-to-end example demonstrates a realistic batch workflow: analyzing customer feedback at scale. It creates a batch, submits feedback items for sentiment analysis, waits for processing, and outputs the results. For simplicity, this example doesn't paginate results—see [Step 4](#step-4-retrieve-results) for pagination when processing larger batches. ```pythonXAI import time from xai_sdk import Client from xai_sdk.chat import system, user client = Client() # Sample dataset: customer feedback to analyze feedback_data = [ {"id": "fb_001", "text": "Absolutely love this product! 
Best purchase ever."}, {"id": "fb_002", "text": "Delivery was late and the packaging was damaged."}, {"id": "fb_003", "text": "Works fine, nothing special to report."}, {"id": "fb_004", "text": "Customer support was incredibly helpful!"}, {"id": "fb_005", "text": "The app keeps crashing on my phone."}, ] # Step 1: Create a batch print("Creating batch...") batch = client.batch.create(batch_name="feedback_sentiment_analysis") print(f"Batch created: {batch.batch_id}") # Step 2: Build and add requests print("\\nAdding requests...") batch_requests = [] for item in feedback_data: chat = client.chat.create( model="grok-4.20-reasoning", batch_request_id=item["id"], ) chat.append(system( "Analyze the sentiment of the customer feedback. " "Respond with exactly one word: positive, negative, or neutral." )) chat.append(user(item["text"])) batch_requests.append(chat) client.batch.add(batch_id=batch.batch_id, batch_requests=batch_requests) print(f"Added {len(batch_requests)} requests") # Step 3: Wait for completion print("\\nProcessing...") while True: batch = client.batch.get(batch_id=batch.batch_id) pending = batch.state.num_pending completed = batch.state.num_success + batch.state.num_error print(f" {completed}/{batch.state.num_requests} complete") if pending == 0: break time.sleep(2) # Step 4: Retrieve and display results print("\\n--- Results ---") results = client.batch.list_batch_results(batch_id=batch.batch_id) # Create a lookup for original feedback text feedback_lookup = {item["id"]: item["text"] for item in feedback_data} for result in results.succeeded: original_text = feedback_lookup.get(result.batch_request_id, "") sentiment = result.response.content.strip().lower() print(f"[{sentiment.upper()}] {original_text[:50]}...") # Report any failures if results.failed: print("\\n--- Errors ---") for result in results.failed: print(f"[{result.batch_request_id}] {result.error_message}") # Display cost cost_usd = batch.cost_breakdown.total_cost_usd_ticks / 1e10 
print("\\nTotal cost: $%.4f" % cost_usd) ``` ```javascriptWithoutSDK const BASE_URL = "https://api.x.ai/v1"; const headers = { "Content-Type": "application/json", Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }; // Sample dataset: customer feedback to analyze const feedbackData = [ { id: "fb_001", text: "Absolutely love this product! Best purchase ever." }, { id: "fb_002", text: "Delivery was late and the packaging was damaged." }, { id: "fb_003", text: "Works fine, nothing special to report." }, { id: "fb_004", text: "Customer support was incredibly helpful!" }, { id: "fb_005", text: "The app keeps crashing on my phone." }, ]; // Step 1: Create a batch console.log("Creating batch..."); const batchRes = await fetch(\`\${BASE_URL}/batches\`, { method: "POST", headers, body: JSON.stringify({ name: "feedback_sentiment_analysis" }), }); const batch = await batchRes.json(); const batchId = batch.batch_id; console.log(\`Batch created: \${batchId}\`); // Step 2: Build and add requests console.log("\\nAdding requests..."); const response = await fetch(\`\${BASE_URL}/batches/\${batchId}/requests\`, { method: "POST", headers, body: JSON.stringify({ batch_requests: feedbackData.map((item) => ({ batch_request_id: item.id, batch_request: { chat_get_completion: { model: "grok-4.20-reasoning", messages: [ { role: "system", content: "Analyze the sentiment of the customer feedback. 
Respond with exactly one word: positive, negative, or neutral.", }, { role: "user", content: item.text }, ], }, }, })), }), }); if (!response.ok) throw new Error(\`Failed to add requests: \${await response.text()}\`); console.log(\`Added \${feedbackData.length} requests\`); // Step 3: Wait for completion console.log("\\nProcessing..."); const interval = setInterval(async () => { const statusRes = await fetch(\`\${BASE_URL}/batches/\${batchId}\`, { headers }); const status = await statusRes.json(); const { num_pending, num_success, num_error, num_requests } = status.state; console.log(\` \${num_success + num_error}/\${num_requests} complete\`); if (num_requests > 0 && num_pending === 0) { clearInterval(interval); // Step 4: Retrieve and display results console.log("\\n--- Results ---"); const resultsRes = await fetch(\`\${BASE_URL}/batches/\${batchId}/results?page_size=100\`, { headers }); const { results } = await resultsRes.json(); // Create a lookup for original feedback text const feedbackLookup = Object.fromEntries(feedbackData.map((item) => [item.id, item.text])); const succeeded = results.filter((r) => r.batch_result?.response?.chat_get_completion); const failed = results.filter((r) => !r.batch_result?.response?.chat_get_completion); for (const result of succeeded) { const originalText = feedbackLookup[result.batch_request_id] ?? ""; const sentiment = result.batch_result.response.chat_get_completion.choices[0].message.content.trim().toLowerCase(); console.log(\`[\${sentiment.toUpperCase()}] \${originalText.slice(0, 50)}...\`); } // Report any failures if (failed.length > 0) { console.log("\\n--- Errors ---"); for (const result of failed) { console.log(\`[\${result.batch_request_id}] \${result.error_message}\`); } } // Display cost let totalTicks = 0; for (const r of results) { totalTicks += r.batch_result?.response?.chat_get_completion?.usage?.cost_in_usd_ticks ?? 
0; } console.log(\`\\nTotal cost: $\${(totalTicks / 1e10).toFixed(4)}\`); } }, 2000); ``` ## JSONL File Upload As an alternative to adding requests via the SDK, you can create batches by uploading a JSONL file. This is useful when generating requests from scripts, pipelines, or external tools. Each line in the file is a JSON object with four fields: `custom_id` (unique identifier, maps to `batch_request_id`), `method` (always `"POST"`), `url` (API endpoint path), and `body` (the JSON request payload matching the [REST API reference](/developers/rest-api-reference) for that endpoint). ```json {"custom_id": "chat-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "grok-4-1-fast-reasoning", "messages": [{"role": "user", "content": "Classify this as positive, negative, or neutral: The product exceeded my expectations!"}]}} {"custom_id": "search-1", "method": "POST", "url": "/v1/responses", "body": {"model": "grok-4-1-fast-reasoning", "tools": [{"type": "web_search"}, {"type": "x_search"}], "input": [{"role": "user", "content": "What are the latest SpaceX launches?"}]}} {"custom_id": "mcp-1", "method": "POST", "url": "/v1/responses", "body": {"model": "grok-4-1-fast-reasoning", "tools": [{"type": "mcp", "server_label": "deepwiki", "server_url": "https://mcp.deepwiki.com/mcp"}], "input": [{"role": "user", "content": "What does the xai-sdk-python repo do?"}]}} {"custom_id": "img-1", "method": "POST", "url": "/v1/images/generations", "body": {"model": "grok-imagine-image", "prompt": "A futuristic city skyline at sunset"}} {"custom_id": "img-edit-1", "method": "POST", "url": "/v1/images/edits", "body": {"model": "grok-imagine-image", "prompt": "Add a rainbow", "image": {"url": "https://picsum.photos/800"}}} {"custom_id": "vid-1", "method": "POST", "url": "/v1/videos/generations", "body": {"model": "grok-imagine-video", "prompt": "A rocket launching from Mars", "duration": 8}} {"custom_id": "vid-edit-1", "method": "POST", "url": "/v1/videos/edits", 
"body": {"model": "grok-imagine-video", "prompt": "Make it slow motion", "video": {"url": "https://lorem.video/cat_360p_3s"}}} ``` You can mix different endpoints in the same file. Each request is routed independently. Supported `url` values: | URL | Description | |---|---| | `/v1/chat/completions` | [Chat completions](/developers/model-capabilities/text/generate-text) | | `/v1/responses` | [Model responses](/developers/model-capabilities/text/generate-text) | | `/v1/images/generations` | [Image generation](/developers/model-capabilities/images/generation) | | `/v1/images/edits` | [Image editing](/developers/model-capabilities/images/generation) | | `/v1/videos/generations` or `/v1/videos` | [Video generation](/developers/model-capabilities/video/generation) | | `/v1/videos/edits` | [Video editing](/developers/model-capabilities/video/generation) | Upload the file via the [Files API](/developers/files), then create a batch referencing it: ```pythonXAI from xai_sdk import Client client = Client() # Upload the JSONL file file = client.files.upload( file=open("batch_requests.jsonl", "rb"), ) # Create a batch with the file ID batch = client.batch.create( batch_name="sentiment_analysis", input_file_id=file.id, ) print(f"Created batch: {batch.batch_id}") ``` ```bash # Upload the JSONL file curl -X POST https://api.x.ai/v1/files \\ -H "Authorization: Bearer $XAI_API_KEY" \\ -F file="@batch_requests.jsonl" # Create a batch with the file ID curl -X POST https://api.x.ai/v1/batches \\ -H "Content-Type: application/json" \\ -H "Authorization: Bearer $XAI_API_KEY" \\ -d '{ "name": "sentiment_analysis", "input_file_id": "file-abc123" }' ``` ```javascriptWithoutSDK import fs from "fs"; // Upload the JSONL file const jsonlContent = fs.readFileSync("batch_requests.jsonl", "utf8"); const formData = new FormData(); formData.append("file", new Blob([jsonlContent], { type: "application/jsonl" }), "batch_requests.jsonl"); const uploadRes = await fetch("https://api.x.ai/v1/files", { 
method: "POST", headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, body: formData, }); const file = await uploadRes.json(); // Create a batch with the file ID const batchRes = await fetch("https://api.x.ai/v1/batches", { method: "POST", headers: { "Content-Type": "application/json", Authorization: \`Bearer \${process.env.XAI_API_KEY}\`, }, body: JSON.stringify({ name: "sentiment_analysis", input_file_id: file.id }), }); const batch = await batchRes.json(); console.log(\`Created batch: \${batch.batch_id}\`); ``` The file is processed asynchronously in the background. If any line is invalid, the batch is cancelled with an error message. Monitor progress and retrieve results the same way as inline batches. File-based batches are sealed after creation — you cannot add more requests via `AddBatchRequests`. Maximum file size is **200 MB** with up to **50,000** requests. Each `custom_id` must be unique within the file. ## Limitations **Batches** * A team can have an **unlimited** number of batches. * Maximum batch creation rate: **2** batch creations per second per team. **Batch Requests** * A batch can contain an **unlimited** number of requests in theory, but extremely large batches (>1,000,000 requests) may be throttled for processing stability. * Each individual request that can be added to a batch has a maximum payload size of **25MB**. * A team can send up to **1000** add-batch-requests API calls every **30 seconds** (this is a rolling limit shared across all batches in the team). * Image and video results contain signed URLs that expire after **1 hour**. Download the media promptly after retrieving results. ## Tool Use Both [server-side tools](/developers/tools/overview) and client-side function tools are supported in batch requests. * **Server-side tools** (web search, code execution, MCP, etc.) work the same as in the real-time API — they are executed during processing and the final response is returned. 
* **Client-side function tools** are supported: the model returns `tool_calls` in the response for you to handle offline. Multi-turn tool calling requires submitting a new batch request with the tool result messages included in the conversation. ## Related * [API Reference: Batch endpoints](/developers/rest-api-reference/inference/batches#create-a-new-batch) * [gRPC Reference: Batch management](/developers/grpc-api-reference#batch-management) * [Models and pricing — Batch API Pricing](/developers/models#batch-api-pricing) * [xAI Python SDK](https://github.com/xai-org/xai-sdk-python) ===/developers/advanced-api-usage/deferred-chat-completions=== #### Advanced API Usage # Deferred Chat Completions Deferred Chat Completions are currently available only via REST requests or the xAI SDK. Deferred Chat Completions allow you to create a chat completion, receive a `request_id`, and retrieve the response at a later time. The result can be retrieved exactly once within 24 hours, after which it is discarded. Your deferred completion rate limit is the same as your chat completions rate limit. To view your rate limit, please visit the [xAI Console](https://console.x.ai). When you send a deferred request to the xAI API, the response body contains a request ID, e.g. `{'request_id': 'f15c114e-f47d-40ca-8d5c-8c23d656eeb6'}`. Insert this `request_id` value into the `deferred-completion` endpoint path and send a GET request to `https://api.x.ai/v1/chat/deferred-completion/{request_id}` to retrieve the result. While the completion result is not yet ready, this request returns `202 Accepted` with an empty response body. You can access the model's raw thinking trace via the `message.reasoning_content` field of the chat completion response.
## Example A code example is provided below, where we retry retrieving the result until it has been processed: ```pythonXAI import os from datetime import timedelta from xai_sdk import Client from xai_sdk.chat import user, system client = Client(api_key=os.getenv('XAI_API_KEY')) chat = client.chat.create( model="grok-4.20-reasoning", messages=[system("You are Zaphod Beeblebrox.")] ) chat.append(user("126/3=?")) # Poll the result every 10 seconds for a maximum of 10 minutes response = chat.defer( timeout=timedelta(minutes=10), interval=timedelta(seconds=10) ) # Print the result when it is ready print(response.content) ``` ```pythonRequests import json import os import requests from tenacity import retry, wait_exponential headers = { "Content-Type": "application/json", "Authorization": f"Bearer {os.getenv('XAI_API_KEY')}" } payload = { "messages": [ {"role": "system", "content": "You are Zaphod Beeblebrox."}, {"role": "user", "content": "126/3=?"} ], "model": "grok-4.20-reasoning", "deferred": True } response = requests.post( "https://api.x.ai/v1/chat/completions", headers=headers, json=payload ) request_id = response.json()["request_id"] print(f"Request ID: {request_id}") @retry(wait=wait_exponential(multiplier=1, min=1, max=60),) def get_deferred_completion(): response = requests.get(f"https://api.x.ai/v1/chat/deferred-completion/{request_id}", headers=headers) if response.status_code == 200: return response.json() elif response.status_code == 202: raise Exception("Response not ready yet") else: raise Exception(f"{response.status_code} Error: {response.text}") completion_data = get_deferred_completion() print(json.dumps(completion_data, indent=4)) ``` ```javascriptWithoutSDK const axios = require('axios'); const retry = require('retry'); const headers = { 'Content-Type': 'application/json', 'Authorization': \`Bearer \${process.env.XAI_API_KEY}\` }; const payload = { messages: [ { role: 'system', content: 'You are Zaphod Beeblebrox.' 
}, { role: 'user', content: '126/3=?' } ], model: 'grok-4.20-reasoning', deferred: true }; async function main() { const requestId = (await axios.post('https://api.x.ai/v1/chat/completions', payload, { headers })).data.request_id; console.log(\`Request ID: \${requestId}\`); const operation = retry.operation({ minTimeout: 1000, maxTimeout: 60000, factor: 2 }); const completion = await new Promise((resolve, reject) => { operation.attempt(async () => { const res = await axios.get(\`https://api.x.ai/v1/chat/deferred-completion/\${requestId}\`, { headers }); if (res.status === 200) resolve(res.data); else if (res.status === 202) operation.retry(new Error('Not ready')); else reject(new Error(\`\${res.status}: \${res.statusText}\`)); }); }); console.log(JSON.stringify(completion, null, 4)); } main().catch(console.error); ``` ```bash RESPONSE=$(curl -s https://api.x.ai/v1/chat/completions \\ -H "Content-Type: application/json" \\ -H "Authorization: Bearer $XAI_API_KEY" \\ -d '{ "messages": [ {"role": "system", "content": "You are Zaphod Beeblebrox."}, {"role": "user", "content": "126/3=?"} ], "model": "grok-4.20-reasoning", "deferred": true }') REQUEST_ID=$(echo "$RESPONSE" | jq -r '.request_id') echo "Request ID: $REQUEST_ID" sleep 10 curl -s https://api.x.ai/v1/chat/deferred-completion/$REQUEST_ID \\ -H "Authorization: Bearer $XAI_API_KEY" ``` The response body will be the same as what you would expect with non-deferred chat completions: ```json { "id": "3f4ddfca-b997-3bd4-80d4-8112278a1508", "object": "chat.completion", "created": 1752077400, "model": "grok-4.20-reasoning", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Whoa, hold onto your improbability drives, kid! This is Zaphod Beeblebrox here, the two-headed, three-armed ex-President of the Galaxy, and you're asking me about 126 divided by 3? Pfft, that's kid stuff for a guy who's stolen starships and outwitted the universe itself.\n\nBut get this\u2014126 slashed by 3 equals... **42**! 
Yeah, that's right, the Ultimate Answer to Life, the Universe, and Everything! Deep Thought didn't compute that for seven and a half million years just for fun, you know. My left head's grinning like a Vogon poet on happy pills, and my right one's already planning a party. If you need more cosmic math or a lift on the Heart of Gold, just holler. Zaphod out! \ud83d\ude80", "refusal": null }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 26, "completion_tokens": 168, "total_tokens": 498, "prompt_tokens_details": { "text_tokens": 26, "audio_tokens": 0, "image_tokens": 0, "cached_tokens": 4 }, "completion_tokens_details": { "reasoning_tokens": 304, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0 }, "num_sources_used": 0 }, "system_fingerprint": "fp_44e53da025" } ``` For more details, refer to [Chat completions](/developers/rest-api-reference/inference/chat#chat-completions) and [Get deferred chat completions](/developers/rest-api-reference/inference/chat#get-deferred-chat-completions) in our REST API Reference. ===/developers/advanced-api-usage/fingerprint=== #### Advanced API Usage # Fingerprint For each request to the xAI API, the response body will include a unique `system_fingerprint` value. This fingerprint serves as an identifier for the current state of the backend system's configuration. Example: ```bash curl https://api.x.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -d '{ "messages": [ { "role": "system", "content": "You are Grok, a chatbot inspired by the Hitchhiker'\''s Guide to the Galaxy." }, { "role": "user", "content": "What is the meaning of life, the universe, and everything?" } ], "model": "grok-4.20-reasoning", "stream": false, "temperature": 0 }' ``` Response: ```json {..., "system_fingerprint":"fp_6ca29cf396"} ``` You can automate your system to keep track of the `system_fingerprint` along with token consumption and other metrics.
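The tracking described above can be sketched as a small helper that records the fingerprint and token usage from each response and flags backend configuration changes. This is a minimal sketch: the response fields (`system_fingerprint`, `usage.total_tokens`) come from the response format shown above, while the `FingerprintLog` helper itself is hypothetical.

```python
# Minimal sketch: track system_fingerprint and token usage across responses.
# The response fields used here match the chat completions response above;
# the FingerprintLog helper itself is hypothetical.

class FingerprintLog:
    def __init__(self):
        self.entries = []  # one (fingerprint, total_tokens) pair per response

    def record(self, response_json):
        """Record one chat completions response; return True if the
        backend fingerprint changed since the previous response."""
        fingerprint = response_json["system_fingerprint"]
        tokens = response_json.get("usage", {}).get("total_tokens", 0)
        changed = bool(self.entries) and self.entries[-1][0] != fingerprint
        self.entries.append((fingerprint, tokens))
        return changed

log = FingerprintLog()
log.record({"system_fingerprint": "fp_6ca29cf396", "usage": {"total_tokens": 498}})
changed = log.record({"system_fingerprint": "fp_44e53da025", "usage": {"total_tokens": 120}})
print(changed)  # True: the fingerprint differs from the previous response
```

A log like this gives you the audit trail mentioned above without any extra API calls, since the fingerprint arrives with every response you already receive.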
## Usage of fingerprint * **Monitoring System Changes:** The system fingerprint acts as version control for the backend configuration. If any part of the backend system—such as model parameters, server settings, or even the underlying infrastructure—changes, the fingerprint will also change. This allows developers to track when and how the system has evolved over time, which is crucial for debugging, performance optimization, and ensuring consistency in API responses. * **Security and Integrity:** The fingerprint can be used to check the integrity of the response. If a response's fingerprint matches the expected one based on a recent system configuration, it helps verify that the data hasn't been tampered with during transmission or that the service hasn't been compromised. **The fingerprint will change over time; this is expected.** * **Compliance and Auditing:** For regulated environments, this fingerprint can serve as part of an audit trail, showing when specific configurations were in use for compliance purposes. ===/developers/advanced-api-usage/mtls=== #### Advanced API Usage # mTLS Authentication Mutual TLS (mTLS) lets you lock down API access so that only machines presenting a valid client certificate can make requests on behalf of your team. This is ideal for enterprise environments where API traffic flows through your own gateways and you need cryptographic proof that each request originates from an authorized system. mTLS is an enterprise feature. Contact [support@x.ai](mailto:support@x.ai?subject=mTLS%20Integration%20Request) to enable it for your team. ## Why Use mTLS? * **Zero-trust security** — Every request must prove its identity with a certificate, not just an API key * **Gateway-friendly** — Works naturally when your traffic routes through corporate API gateways, proxies, or service meshes * **No code changes** — Once enabled, you only need to attach your client certificate to requests.
All existing API features (models, tools, streaming) work identically ## Quick Start ### 1. Get set up Contact [support@x.ai](mailto:support@x.ai) with: * Your team ID (found in the [xAI Console](https://console.x.ai)) * Your CA certificate in PEM format * The Common Name (CN) from the client certificates your systems will use We'll configure your team and confirm when mTLS is active. ### 2. Point to the mTLS endpoint Use `https://mtls.api.x.ai` instead of `https://api.x.ai`. This is the only change required. All API paths (`/v1/chat/completions`, `/v1/responses`, `/v1/embeddings`, etc.) work the same way. ### 3. Attach your client certificate Include your client certificate and private key with every request. Here are examples: ```bash curl https://mtls.api.x.ai/v1/chat/completions \\ --cert /path/to/client-cert.pem \\ --key /path/to/client-key.pem \\ -H "Content-Type: application/json" \\ -H "Authorization: Bearer $XAI_API_KEY" \\ -d '{ "messages": [ { "role": "user", "content": "Hello, world!" 
} ], "model": "grok-4.20-reasoning", "stream": false }' ``` ```pythonOpenAISDK import os import httpx from openai import OpenAI # Attach your client certificate to the HTTP transport http_client = httpx.Client( cert=("/path/to/client-cert.pem", "/path/to/client-key.pem") ) client = OpenAI( api_key=os.getenv("XAI_API_KEY"), base_url="https://mtls.api.x.ai/v1", http_client=http_client, ) completion = client.chat.completions.create( model="grok-4.20-reasoning", messages=[ {"role": "user", "content": "Hello, world!"} ] ) print(completion.choices[0].message.content) ``` ```javascriptOpenAISDK import OpenAI from 'openai'; import https from 'https'; import fs from 'fs'; const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: 'https://mtls.api.x.ai/v1', httpAgent: new https.Agent({ cert: fs.readFileSync('/path/to/client-cert.pem'), key: fs.readFileSync('/path/to/client-key.pem'), }), }); const completion = await client.chat.completions.create({ model: 'grok-4.20-reasoning', messages: [ { role: 'user', content: 'Hello, world!' } ], }); console.log(completion.choices[0].message.content); ``` You still need a valid API key on every request. mTLS is an **additional** layer of security, not a replacement for API key authentication. ## How Authentication Works When mTLS is enabled for your team, every request goes through two checks: 1. **Certificate verification** — Your client certificate is validated against the CA certificate you provided during setup. Requests without a valid certificate are rejected with `403 Forbidden`. 2. **API key verification** — Your API key is checked as usual. Invalid or missing keys are rejected with `401 Unauthorized`. Both checks must pass for the request to proceed. All other behavior (rate limits, billing, model access) is identical to the standard endpoint. 
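Because the two checks fail with distinct status codes, client code can turn a rejected request into an actionable diagnostic. A minimal sketch of that mapping, assuming you interpret the status codes yourself; the `diagnose_mtls_response` helper is hypothetical, while the 403/401 semantics are as described above.

```python
# Minimal sketch: map mTLS endpoint status codes to likely causes.
# The 403/401 semantics follow the two-check flow described above;
# the helper itself is hypothetical.

def diagnose_mtls_response(status_code: int) -> str:
    if status_code == 403:
        # Certificate verification failed: cert missing, expired,
        # or not signed by the CA registered with xAI.
        return "client certificate rejected; check cert, key, and CA chain"
    if status_code == 401:
        # The certificate passed, but the API key was missing or invalid.
        return "API key missing or invalid"
    if 200 <= status_code < 300:
        return "ok: certificate and API key both accepted"
    return f"unexpected status {status_code}"

print(diagnose_mtls_response(403))
```

Logging this distinction is useful during certificate rollouts: a spike in 403s points at the TLS layer, while 401s point at key management.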
## Rotating Certificates mTLS is designed so you can rotate certificates without downtime: | Scenario | What to Do | |----------|------------| | **Renewing a client certificate** (same CA, same CN) | Nothing. Just start using the new certificate. | | **Updating your CA** (e.g., new intermediate) | Contact [support@x.ai](mailto:support@x.ai) to upload the updated CA bundle. | | **Switching to a different CA entirely** | Contact [support@x.ai](mailto:support@x.ai) to register the new CA certificate. | ## FAQ ### Do I have to use the mTLS endpoint? If mTLS is enabled as **required** for your team, yes. Requests to `api.x.ai` will be rejected because no client certificate is presented. If you need some API keys to work without mTLS, contact support to discuss your configuration. ### Can I use regional endpoints with mTLS? mTLS is currently available on the global `mtls.api.x.ai` endpoint. If you need mTLS with [regional endpoints](/developers/regions), contact [support@x.ai](mailto:support@x.ai). ### What certificate format do I need? X.509 certificates in PEM format. Both the CA certificate (provided during setup) and client certificates must be PEM-encoded. ### Is mTLS configured per API key or per team? mTLS is configured at the **team level**. All API keys in your team share the same mTLS configuration. ### How do I test my setup? After setup, make a simple request with your certificate: ```bash curl -v https://mtls.api.x.ai/v1/api-key \ --cert /path/to/client-cert.pem \ --key /path/to/client-key.pem \ -H "Authorization: Bearer $XAI_API_KEY" ``` A successful response confirms both your certificate and API key are working. If you see `403 Forbidden`, check that your certificate is signed by the CA you provided to xAI. ===/developers/advanced-api-usage=== #### Advanced API Usage # Advanced API Usage Advanced guides for scaling, optimizing, and integrating xAI APIs.
## In this section * [Batch API](/developers/advanced-api-usage/batch-api) * [Prompt Caching](/developers/advanced-api-usage/prompt-caching) * [Deferred Completions](/developers/advanced-api-usage/deferred-chat-completions) * [Fingerprint](/developers/advanced-api-usage/fingerprint) * [Async Requests](/developers/advanced-api-usage/async) * [Use with Code Editors](/developers/advanced-api-usage/use-with-code-editors) ===/developers/advanced-api-usage/prompt-caching/best-practices=== #### Prompt Caching # Best Practices & FAQ ## Best practices 1. **Always set `x-grok-conv-id`** (or `prompt_cache_key` for Responses API) — Routes requests to the same server, maximizing cache hits. 2. **Use a stable conversation ID** — A UUID or your application's session ID works well. 3. **Never modify earlier messages** — Only append new ones. Any edit, removal, or reorder breaks the cache. 4. **Front-load static content** — Place system prompts, few-shot examples, and reference documents at the beginning where they form a stable prefix. 5. **Monitor `cached_tokens`** — If consistently 0, verify your conversation ID and message ordering. 6. **Handle cache misses gracefully** — Eviction and routing mean cache hits aren't guaranteed. Your application should work without caching. ## Supported models Prompt caching is available on all `grok` language models. Check the [Models and Pricing](/developers/models) page for details on which models support caching and their specific cached token pricing. ## FAQ ### Does caching affect output quality? No. Caching only accelerates the prompt processing phase. The model's output is identical whether the prompt is served from cache or computed from scratch. ### How long do cache entries persist? Cache entries can be evicted at any time due to server load or restarts. Use `x-grok-conv-id` to maximize retention by routing to the same server. ### Can I force a cache miss? Yes — use a different `x-grok-conv-id` or omit the header entirely. 
This will route your request to a potentially different server where no cache exists for your prompt. ### Does caching work with streaming? Yes. Prompt caching works with both streaming and non-streaming requests. The first empty token in a stream corresponds to the cache lookup and prefill phase. ### Does caching work with tool calls and function calling? Yes. The cacheable prefix includes all messages up to and including tool call results. As long as the prefix remains unchanged, subsequent requests will benefit from caching. ===/developers/advanced-api-usage/prompt-caching/how-it-works=== #### Prompt Caching # How It Works The cache works from the **start of your messages array**. When a request arrives, the system checks how many messages at the beginning match a previous request exactly — that matching portion is the "prefix" and gets served from cache: 1. **First request** — The full prompt is processed and cached 2. **Subsequent requests** — If the prompt prefix matches, the cached portion is reused (a cache *hit*) 3. **Billing** — Cached tokens are billed at a reduced rate Prompt caching is not 100% guaranteed. Cache entries can be evicted due to memory pressure, and requests may be routed to different servers. Use `x-grok-conv-id` to maximize cache hit rates. ## Example **Request 1:** ```text [system] "You are a helpful assistant." [user] "What is the capital of France?" [assistant] "The capital of France is Paris." ``` **Request 2:** ```text [system] "You are a helpful assistant." ← cached [user] "What is the capital of France?" ← cached [assistant] "The capital of France is Paris." ← cached [user] "What about Germany?" ← new ``` The first 3 messages match Request 1 exactly, so they're served from cache. Only the new message is computed. 
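The prefix match in the example above can be sketched as a message-by-message comparison. This is a conceptual illustration only, not xAI's actual implementation:

```python
# Conceptual sketch of the prefix match -- not xAI's actual implementation.
def matching_prefix_len(cached: list[dict], incoming: list[dict]) -> int:
    """Count leading messages that match a previously cached request exactly."""
    n = 0
    for old, new in zip(cached, incoming):
        if old != new:  # any difference in role or content ends the prefix
            break
        n += 1
    return n

request_1 = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
# Request 2 appends one message; the first three are served from cache.
request_2 = request_1 + [{"role": "user", "content": "What about Germany?"}]
print(matching_prefix_len(request_1, request_2))  # 3
```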
## Next * [Maximizing Cache Hits](/developers/advanced-api-usage/prompt-caching/maximizing-cache-hits) * [What Breaks Caching](/developers/advanced-api-usage/prompt-caching/multi-turn) * [Usage & Pricing](/developers/advanced-api-usage/prompt-caching/usage-and-pricing) * [Best Practices & FAQ](/developers/advanced-api-usage/prompt-caching/best-practices) ===/developers/advanced-api-usage/prompt-caching/maximizing-cache-hits=== #### Prompt Caching # Maximizing Cache Hits ## Set `x-grok-conv-id` (Chat Completions API) The `x-grok-conv-id` HTTP header routes requests with the same conversation ID to the same server. Since cache entries are stored per-server, this maximizes your cache hit rate. ```bash customLanguage="bash" curl https://api.x.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "x-grok-conv-id: conv_abc123" \ -d '{ "model": "grok-4.20-reasoning", "messages": [ {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "user", "content": "What is prompt caching?"} ] }' ``` ```python customLanguage="pythonOpenAISDK" from openai import OpenAI client = OpenAI( api_key="YOUR_XAI_API_KEY", base_url="https://api.x.ai/v1", ) response = client.chat.completions.create( model="grok-4.20-reasoning", messages=[ {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "user", "content": "What is prompt caching?"}, ], extra_headers={ "x-grok-conv-id": "conv_abc123", }, ) print(response.choices[0].message.content) print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}") ``` ```javascript customLanguage="javascriptOpenAISDK" import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_XAI_API_KEY', baseURL: 'https://api.x.ai/v1', }); const response = await client.chat.completions.create( { model: 'grok-4.20-reasoning', messages: [ { role: 'system', content: 'You are Grok, a helpful and 
truthful AI assistant built by xAI.', }, { role: 'user', content: 'What is prompt caching?' }, ], }, { headers: { 'x-grok-conv-id': 'conv_abc123', }, }, ); console.log(response.choices[0].message.content); console.log( `Cached tokens: ${response.usage.prompt_tokens_details.cached_tokens}`, ); ``` ## Set `prompt_cache_key` (Responses API) For the Responses API, use the `prompt_cache_key` field directly in the request body. It functions identically to setting `x-grok-conv-id` — it routes requests to the same server for cache reuse. ```bash customLanguage="bash" curl https://api.x.ai/v1/responses \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -d '{ "model": "grok-4.20-reasoning", "input": "What is prompt caching?", "prompt_cache_key": "b79ad29b-b3f9-463c-bca6-041d5058d366" }' ``` ```python customLanguage="pythonOpenAISDK" from openai import OpenAI client = OpenAI( api_key="YOUR_XAI_API_KEY", base_url="https://api.x.ai/v1", ) response = client.responses.create( model="grok-4.20-reasoning", input="What is prompt caching?", extra_body={ "prompt_cache_key": "b79ad29b-b3f9-463c-bca6-041d5058d366", }, ) print(response.output_text) print(f"Cached tokens: {response.usage.input_tokens_details.cached_tokens}") ``` ```javascript customLanguage="javascriptOpenAISDK" import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_XAI_API_KEY', baseURL: 'https://api.x.ai/v1', }); const response = await client.responses.create({ model: 'grok-4.20-reasoning', input: 'What is prompt caching?', // @ts-expect-error -- xAI-specific field prompt_cache_key: 'b79ad29b-b3f9-463c-bca6-041d5058d366', }); console.log(response.output_text); console.log( `Cached tokens: ${response.usage.input_tokens_details.cached_tokens}`, ); ``` ```javascript customLanguage="javascriptAISDK" import { xai } from '@ai-sdk/xai'; import { generateText } from 'ai'; const { text, usage } = await generateText({ model: xai.responses('grok-4.20-reasoning'), prompt: 'What is 
prompt caching?', providerOptions: { xai: { promptCacheKey: 'b79ad29b-b3f9-463c-bca6-041d5058d366', }, }, }); console.log(text); console.log(`Total tokens: ${usage.totalTokens}`); ``` ## Set `x-grok-conv-id` metadata (gRPC API) For the gRPC API using the xAI SDK, pass `x-grok-conv-id` as gRPC metadata to enable sticky routing for cache reuse. ```python customLanguage="pythonXAI" from xai_sdk import Client from xai_sdk.chat import system, user client = Client( api_key="YOUR_API_KEY", metadata=(("x-grok-conv-id", "conv_abc123"),), ) chat = client.chat.create(model="grok-4.20-reasoning") chat.append(system("You are Grok, a helpful and truthful AI assistant built by xAI.")) chat.append(user("What is prompt caching?")) response = chat.sample() print(f"Response: {response.content}") print(f"Cached tokens: {response.usage.cached_prompt_text_tokens}") ``` ## Next * [What Breaks Caching](/developers/advanced-api-usage/prompt-caching/multi-turn) ===/developers/advanced-api-usage/prompt-caching/multi-turn=== #### Prompt Caching # What Breaks Caching Any change to earlier messages breaks the cache. Only append new messages at the end. **Keep messages unchanged.** For cache hits in multi-turn conversations, never edit, remove, or reorder earlier messages — only append new ones. For reasoning models, you **must** include `reasoning_content` from previous responses; omitting it is the top cause of cache misses. You can satisfy this in one of two ways: * **Sending back the encrypted reasoning content** — Include the `reasoning_content` from the previous response. See [Encrypted Reasoning Content](/developers/model-capabilities/text/reasoning#encrypted-reasoning-content) for details. * **Using stateful responses** — Use `previous_response_id` to automatically continue the conversation. See [Chaining the Conversation](/developers/model-capabilities/text/generate-text#chaining-the-conversation) for details.
## Cache hit — appending a new message The prompt prefix is identical to the previous request, with only a new user message appended: ```bash customLanguage="bash" addedLines="26" # Turn 1: Initial request (establishes the cache) curl https://api.x.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "x-grok-conv-id: conv_abc123" \ -d '{ "model": "grok-4.20-reasoning", "messages": [ {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "user", "content": "What is prompt caching?"}, {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."} ] }' # Turn 2: Cache HIT — exact prefix preserved, new message appended curl https://api.x.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "x-grok-conv-id: conv_abc123" \ -d '{ "model": "grok-4.20-reasoning", "messages": [ {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "user", "content": "What is prompt caching?"}, {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. 
This makes responses faster and cheaper."}, {"role": "user", "content": "Show me a code example."} ] }' ``` ```python customLanguage="pythonOpenAISDK" from openai import OpenAI client = OpenAI( api_key="YOUR_XAI_API_KEY", base_url="https://api.x.ai/v1", ) conversation_id = "conv_abc123" messages = [ {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "user", "content": "What is prompt caching?"}, ] # Turn 1: Initial request (establishes the cache) response = client.chat.completions.create( model="grok-4.20-reasoning", messages=messages, extra_headers={"x-grok-conv-id": conversation_id}, ) print(f"Turn 1 — Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}") # Append the assistant's reply and the next user message messages.append({"role": "assistant", "content": response.choices[0].message.content}) messages.append({"role": "user", "content": "Show me a code example."}) # Turn 2: Cache HIT — prefix is unchanged, only new messages appended response = client.chat.completions.create( model="grok-4.20-reasoning", messages=messages, extra_headers={"x-grok-conv-id": conversation_id}, ) print(f"Turn 2 — Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}") ``` ```javascript customLanguage="javascriptOpenAISDK" import OpenAI from 'openai'; const client = new OpenAI({ apiKey: 'YOUR_XAI_API_KEY', baseURL: 'https://api.x.ai/v1', }); const conversationId = 'conv_abc123'; const messages = [ { role: 'system', content: 'You are Grok, a helpful and truthful AI assistant built by xAI.', }, { role: 'user', content: 'What is prompt caching?' 
}, ]; // Turn 1: Initial request (establishes the cache) const turn1 = await client.chat.completions.create( { model: 'grok-4.20-reasoning', messages }, { headers: { 'x-grok-conv-id': conversationId } }, ); console.log( `Turn 1 — Cached tokens: ${turn1.usage.prompt_tokens_details.cached_tokens}`, ); // Append the assistant reply and next user message messages.push({ role: 'assistant', content: turn1.choices[0].message.content }); messages.push({ role: 'user', content: 'Show me a code example.' }); // Turn 2: Cache HIT — prefix unchanged, new message appended const turn2 = await client.chat.completions.create( { model: 'grok-4.20-reasoning', messages }, { headers: { 'x-grok-conv-id': conversationId } }, ); console.log( `Turn 2 — Cached tokens: ${turn2.usage.prompt_tokens_details.cached_tokens}`, ); ``` ## Cache miss — editing an earlier message Changing the content of any earlier message breaks the prefix match: ```bash customLanguage="bash" deletedLines="11" addedLines="12" # Cache MISS — editing the assistant message content curl https://api.x.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "x-grok-conv-id: conv_abc123" \ -d '{ "model": "grok-4.20-reasoning", "messages": [ {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "user", "content": "What is prompt caching?"}, {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}, {"role": "assistant", "content": "It stores KV pairs."}, {"role": "user", "content": "Show me a code example."} ] }' ``` **What changed:** The assistant response on line 11 was shortened to `"It stores KV pairs."` (line 12). 
## Cache miss — removing a message Removing any message from the conversation breaks the prefix: ```bash customLanguage="bash" deletedLines="11" # Cache MISS — the assistant message was removed curl https://api.x.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "x-grok-conv-id: conv_abc123" \ -d '{ "model": "grok-4.20-reasoning", "messages": [ {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "user", "content": "What is prompt caching?"}, {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}, {"role": "user", "content": "Show me a code example."} ] }' ``` **What changed:** The assistant message on line 11 was removed entirely. ## Cache miss — reordering messages Changing the order of messages also breaks the prefix: ```bash customLanguage="bash" deletedLines="9,10" addedLines="9,10" # Cache MISS — user and system messages are swapped curl https://api.x.ai/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "x-grok-conv-id: conv_abc123" \ -d '{ "model": "grok-4.20-reasoning", "messages": [ {"role": "user", "content": "What is prompt caching?"}, {"role": "system", "content": "You are Grok, a helpful and truthful AI assistant built by xAI."}, {"role": "assistant", "content": "Prompt caching stores KV pairs from unchanged prompt prefixes so they can be reused on subsequent requests. This makes responses faster and cheaper."}, {"role": "user", "content": "Show me a code example."} ] }' ``` **What changed:** Lines 9 and 10 were swapped — the user message now comes before the system message. 
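All three failure modes reduce to one rule: the prefix sent on each turn must be identical to the previous turn. During development, a client-side fingerprint can catch accidental edits before a request goes out. This is an illustrative utility, not an xAI feature:

```python
# Illustrative client-side guard (not an xAI feature): fingerprint the prefix
# so accidental edits, removals, or reorders are caught before sending.
import hashlib
import json

def prefix_fingerprint(messages: list[dict]) -> str:
    """Stable hash of a message list; any change to it changes the hash."""
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

turn_1 = [
    {"role": "system", "content": "You are Grok."},
    {"role": "user", "content": "What is prompt caching?"},
]
before = prefix_fingerprint(turn_1)

# Appending keeps the old prefix intact (cache hit)...
turn_2 = turn_1 + [{"role": "user", "content": "Show me a code example."}]
assert prefix_fingerprint(turn_2[: len(turn_1)]) == before

# ...but editing, removing, or reordering earlier messages changes it (cache miss).
reordered = [turn_1[1], turn_1[0]]
assert prefix_fingerprint(reordered) != before
```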
## Next * [Usage & Pricing](/developers/advanced-api-usage/prompt-caching/usage-and-pricing) ===/developers/advanced-api-usage/prompt-caching=== #### Advanced API Usage # Prompt Caching When consecutive requests share the same starting messages, the xAI API automatically caches them. On the next request, messages at the beginning that match exactly are served from cache: * **Faster time-to-first-token** — the model skips re-computing cached messages * **Lower cost** — cached tokens are billed at a reduced rate The xAI API performs prompt caching **automatically**. However, we recommend setting the `x-grok-conv-id` HTTP header to maximize your cache hit rate. ## In this section * [How It Works](/developers/advanced-api-usage/prompt-caching/how-it-works) — Understand how caching works from the start of your messages array * [Maximizing Cache Hits](/developers/advanced-api-usage/prompt-caching/maximizing-cache-hits) — Set up `x-grok-conv-id` and `prompt_cache_key` for optimal caching * [What Breaks Caching](/developers/advanced-api-usage/prompt-caching/multi-turn) — Common mistakes that cause cache misses * [Usage & Pricing](/developers/advanced-api-usage/prompt-caching/usage-and-pricing) — Read cached token counts and understand billing * [Best Practices & FAQ](/developers/advanced-api-usage/prompt-caching/best-practices) — Tips, supported models, and common questions ===/developers/advanced-api-usage/prompt-caching/usage-and-pricing=== #### Prompt Caching # Usage & Pricing ## Chat Completions API Cached tokens appear in `usage.prompt_tokens_details.cached_tokens`: ```json customLanguage="json" { "usage": { "prompt_tokens": 125, "completion_tokens": 48, "total_tokens": 173, "prompt_tokens_details": { "text_tokens": 125, "audio_tokens": 0, "image_tokens": 0, "cached_tokens": 98 }, "completion_tokens_details": { "reasoning_tokens": 0, "audio_tokens": 0, "accepted_prediction_tokens": 0, "rejected_prediction_tokens": 0 } } } ``` ## Responses API Cached tokens appear in 
`usage.input_tokens_details.cached_tokens`: ```json customLanguage="json" { "usage": { "input_tokens": 125, "output_tokens": 48, "total_tokens": 173, "input_tokens_details": { "cached_tokens": 98 }, "output_tokens_details": { "reasoning_tokens": 0 } } } ``` ## Verifying cache hits To determine whether your request benefitted from prompt caching, check the `cached_tokens` value in the response: | `cached_tokens` value | What it means | |---|---| | `0` | **Cache miss** — the entire prompt was computed from scratch. This is expected on the first request or after cache eviction. | | `> 0` | **Cache hit** — some or all of your prompt prefix was served from cache. The number indicates how many tokens were reused. | | Equal to `prompt_tokens` | **Full cache hit** — your entire prompt was served from cache (rare, typically happens when resending the exact same request). | A typical multi-turn conversation shows increasing `cached_tokens` over time: ```text Turn 1: prompt_tokens=50, cached_tokens=0 # First request, cache established Turn 2: prompt_tokens=120, cached_tokens=50 # Previous 50 tokens cached Turn 3: prompt_tokens=200, cached_tokens=120 # Previous 120 tokens cached ``` If `cached_tokens` is consistently 0 across multiple requests in the same conversation, verify that you're setting `x-grok-conv-id` (or `prompt_cache_key`) and that you're not modifying earlier messages between requests. ## Pricing Cached tokens are billed at the **cached prompt token price**, which is substantially lower than the regular prompt token price. The exact rates vary by model — check the [Models and Pricing](/developers/models) page for current prices. 
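As a worked illustration of the billing split — the per-token rates below are hypothetical, and real per-model prices are on the Models and Pricing page:

```python
# Worked example of the billing split. The rates here are hypothetical;
# real per-model prices are on the Models and Pricing page.
def prompt_cost(prompt_tokens: int, cached_tokens: int,
                full_rate: float, cached_rate: float) -> float:
    """Dollar cost: cached tokens at the reduced rate, the remainder at full rate."""
    uncached = prompt_tokens - cached_tokens
    return uncached * full_rate + cached_tokens * cached_rate

# Hypothetical rates: $2.00 per 1M prompt tokens, $0.50 per 1M cached tokens.
FULL = 2.00 / 1_000_000
CACHED = 0.50 / 1_000_000

# The usage block above: 125 prompt tokens, 98 of them served from cache.
print(f"${prompt_cost(125, 98, FULL, CACHED):.8f}")
```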
| Token type | Billing rate | |---|---| | Prompt tokens (non-cached) | Full prompt token price | | Cached prompt tokens | Reduced cached prompt token price | | Completion tokens | Full completion token price | | Reasoning tokens | Full completion token price | Long context pricing applies when total prompt tokens (including cached tokens) exceed the model's long context threshold. Both cached and non-cached tokens use their respective long-context rates in this case. ## Next * [Best Practices & FAQ](/developers/advanced-api-usage/prompt-caching/best-practices) ===/developers/advanced-api-usage/provisioned-throughput=== #### Advanced API Usage # Provisioned Throughput Provisioned Throughput allows enterprise customers to purchase dedicated input and output token capacity for specific models. Buy units with a minimum 30-day commitment for predictable, high-performance API access. Contact [support@x.ai](mailto:support@x.ai?subject=Provisioned%20Throughput%20Request) to get started with Provisioned Throughput. ## Key Benefits * **Predictable latency** — Faster, more consistent response times compared to pay-as-you-go, even during peak usage * **Uncapped scale** — Your purchased capacity adds directly to your rate limits; overages use standard pay-as-you-go rates * **High reliability** — 99.9% uptime SLA with enterprise-grade availability guarantees ## Pricing Each unit costs **$10.00 per day** and provides a fixed amount of tokens per minute (TPM): XAIPTUTABLEPLACEHOLDER ## How to Calculate Units ```text Input Units = Required Input TPM ÷ TPM per Unit (Input) Output Units = Required Output TPM ÷ TPM per Unit (Output) Daily Cost = (Input Units + Output Units) × $10 ``` **Example**: You need 100,000 input TPM and 50,000 output TPM with **grok-4-1-fast-reasoning**: * Input units: 100,000 ÷ 31,500 ≈ 3.17, rounded up to **4 units** * Output units: 50,000 ÷ 12,500 = **4 units** * Daily cost: 8 × $10 = **$80/day** ($2,400 for 30 days) ## Getting Started 1.
Contact [support@x.ai](mailto:support@x.ai) with your expected TPM and preferred models 2. Receive a custom quote based on your requirements 3. Sign the order form and your capacity will be activated ## How It Works Once activated, your provisioned capacity is automatically applied to all API requests from your team. ### Optional Headers You can control provisioned throughput behavior with these headers: | Header | Description | |--------|-------------| | `x-pt-disable: true` | Skip provisioned capacity and use pay-as-you-go for this request | | `x-pt-id: ` | Route the request to a specific capacity pool (if you have multiple allocations) | ## FAQ ### What happens if I exceed my provisioned capacity? Requests exceeding your allocation fall back to standard rate limits at pay-as-you-go pricing. ### Can I adjust my allocation? Yes. You can add units at any time; contact support to modify your allocation. ### What's the minimum commitment? 30 days per unit. ===/developers/advanced-api-usage/use-with-code-editors=== # Use with Code Editors You can use Grok with coding assistant plugins to help you code. Our Code models are specifically optimized for this task, which provides a smoother experience. For pricing and limits of Code models, check out [Models and Pricing](/developers/models). ## Using Grok Code models with Cline To use Grok with Cline, first install Cline from the VS Code marketplace. Once installed, open Cline and click "Use your own API key" to save your xAI API key. After saving your API key, go to Cline settings -> API Configuration and choose a coding model such as `grok-4.20-reasoning`. ## Using Grok Code models with Cursor You can also use Grok with Cursor to help you code. After installing Cursor, head to Cursor Settings -> Models.
Open the API Keys settings, enter your xAI API key, and set Override OpenAI Base URL to `https://api.x.ai/v1`. In the "Add or search model" input box, enter a coding model such as `grok-4.20-reasoning`, then click "Add Custom Model". ## Other code assistants supporting Grok Code models Besides Cline and Cursor, you can also use our Code models with [GitHub Copilot](https://github.com/features/copilot), [opencode](https://opencode.ai/), [Kilo Code](https://kilocode.ai/), [Roo Code](https://roocode.com/) and [Windsurf](https://windsurf.com/). ===/developers/community=== #### Resources # Community Integrations Grok is also accessible via your favorite community integrations, enabling you to connect Grok to other parts of your system easily. ## Third-party SDK/frameworks ### LiteLLM LiteLLM provides a simple SDK or proxy server for calling different LLM providers. If you're using LiteLLM, integrating xAI as your provider is straightforward: swap the model name to an xAI Grok model and supply your xAI API key in your configuration. For the latest information and more examples, visit the [LiteLLM xAI Provider Documentation](https://docs.litellm.ai/docs/providers/xai). As a quick start, you can use LiteLLM in the following fashion: ```pythonWithoutSDK from litellm import completion import os os.environ['XAI_API_KEY'] = "" response = completion( model="xai/grok-4.20-reasoning", messages=[ { "role": "user", "content": "What's the weather like in Boston today in Fahrenheit?", } ], max_tokens=10, response_format={ "type": "json_object" }, seed=123, temperature=0.2, top_p=0.9, user="user", ) print(response) ``` ### Vercel AI SDK [Vercel's AI SDK](https://sdk.vercel.ai/) supports an [xAI Grok Provider](https://sdk.vercel.ai/providers/ai-sdk-providers/xai) for integrating with the xAI API. By default it uses the xAI API key from the `XAI_API_KEY` environment variable.
To generate text, use the `generateText` function: ```javascriptAISDK import { xai } from '@ai-sdk/xai'; import { generateText } from 'ai'; const { text } = await generateText({ model: xai.responses('grok-4.20-reasoning'), prompt: 'Write a vegetarian lasagna recipe for 4 people.', }); ``` You can also customize the setup like the following: ```javascriptAISDK import { createXai } from '@ai-sdk/xai'; const xai = createXai({ apiKey: 'your-api-key', }); ``` You can also generate images with the `generateImage` function: ```javascriptAISDK import { xai } from '@ai-sdk/xai'; import { experimental_generateImage as generateImage } from 'ai'; const { image } = await generateImage({ model: xai.image('grok-imagine-image'), prompt: 'A cat in a tree', }); ``` ## Coding assistants ### Continue - VSCode or JetBrains You can use the Continue extension in VS Code or JetBrains with xAI's models. To start using xAI models with Continue, add the following to Continue's config file: `~/.continue/config.json` (macOS and Linux) or `%USERPROFILE%\.continue\config.json` (Windows). ```json "models": [ { "title": "grok-4.20-reasoning", "provider": "xAI", "model": "grok-4.20-reasoning", "apiKey": "[XAI_API_KEY]" } ] ``` Visit [Continue's Documentation](https://docs.continue.dev/chat/model-setup#grok-2-from-xai) for more details. ### Xcode To use the xAI API with Xcode, add the xAI API endpoint as a new Model Provider in Xcode coding intelligence. Go to Xcode -> Settings -> Intelligence and click the "Add a Provider..." button. In the pop-up menu, enter the following: * URL: `https://api.x.ai` * API Key: Your xAI API key starting with `xai-...`, without the `Bearer ` prefix * API Key Header: `Authorization` For more information, you can review Apple's guide on setting up [a new Model Provider in Xcode coding intelligence](https://developer.apple.com/documentation/xcode/setting-up-coding-intelligence).
===/developers/debugging=== #### Getting Started # Debugging Errors When you send a request, you will normally receive a `200 OK` response from the server with the expected response body. If there is an error with your request, or an error with our service, the API endpoint will return an error status code with an error message. If there is an ongoing service disruption, you can visit [https://status.x.ai](https://status.x.ai) for the latest updates. The status is also available via RSS at [https://status.x.ai/feed.xml](https://status.x.ai/feed.xml), and is indicated in the navigation bar of this site. Most errors are accompanied by a self-explanatory error message. For typical status codes of each endpoint, visit the [API Reference](/developers/rest-api-reference). ## Status Codes Here is a list of potential errors and statuses, arranged by status code. ### 4XX Status Codes | Status Code | Endpoints | Cause | Solution | | --- | --- | --- | --- | | 400 Bad Request | All Endpoints | - A `POST` method request body specified an invalid argument, or a `GET` method with a dynamic route has an invalid param in the URL. - An incorrect API key is supplied. | - Please check your request body or request URL. | | 401 Unauthorized | All Endpoints | - No authorization header or an invalid authorization token is provided. | - Supply an `Authorization: Bearer Token ` in the request header. You can get a new API key on the [xAI Console](https://console.x.ai). | | 403 Forbidden | All Endpoints | - Your API key/team doesn't have permission to perform the action. - Your API key/team is blocked.
| - Ask your team admin for permission. | | 404 Not Found | All Endpoints | - A model specified in a `POST` method request body is not found. - Trying to reach an invalid endpoint URL (misspelled URL). | - Check your request body and endpoint URL against our [API Reference](/developers/rest-api-reference). | | 405 Method Not Allowed | All Endpoints | - The request method is not allowed. For example, sending a `POST` request to an endpoint supporting only `GET`. | - Check your request method against our [API Reference](/developers/rest-api-reference). | | 415 Unsupported Media Type | All Endpoints Supporting `POST` Method | - An empty request body in `POST` requests. - Not specifying the `Content-Type: application/json` header. | - Add a valid request body. - Ensure the `Content-Type: application/json` header is present. | | 422 Unprocessable Entity | All Endpoints Supporting `POST` Method | - An invalid format for a field in the `POST` request body. | - Check that your request body is valid. You can find more information in the [API Reference](/developers/rest-api-reference). | | 429 Too Many Requests | All Inference Endpoints | - You are sending requests too frequently and reaching the rate limit. | - Reduce your request rate or increase your rate limit. You can find your current rate limit on the [xAI Console](https://console.x.ai). | ### 2XX Error Codes | Status Code | Endpoints | Cause | Solution | | --- | --- | --- | --- | | 202 Accepted | `/v1/chat/deferred-completion/{request_id}` | - Your deferred chat completion request is queued for processing, but the response is not available yet. | - Wait for request processing.
## Bug Report

If you believe you have encountered a bug and would like to contribute to our development process, email [support@x.ai](mailto:support@x.ai?subject=API%20Bug%20Report) with your API request, response, and relevant logs. You can also chat in the `#help` channel of our [xAI API Developer Discord](https://discord.gg/x-ai).

===/developers/docs-mcp===

#### Community

# Docs MCP Server

xAI hosts a [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server that gives AI assistants and agents direct access to the xAI documentation. Instead of copy-pasting docs into a prompt, you can point any MCP-compatible client at the server and let it pull the information it needs. You can use it with any popular IDE or editor.

**Endpoint:**

```
https://docs.x.ai/api/mcp
```

The server uses the **Streamable HTTP** transport and runs in stateless mode — no session management required.

## Quickstart

### Cursor

In Cursor, go to **Settings → MCP** and add a new server:

* **Type:** `url` (Streamable HTTP)
* **URL:** `https://docs.x.ai/api/mcp`

### Zed

In Zed, go to `agent: open settings` -> Model Context Protocol (MCP) Servers, and add the following to a new server configuration:

```json
{
  "xai-docs": {
    "url": "https://docs.x.ai/api/mcp"
  }
}
```

### Windsurf

In Windsurf, go to **Settings → MCP** and add a new server using the same endpoint URL.

### OpenCode

In your OpenCode config, add the following under `mcp`:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "xai-docs": {
      "type": "remote",
      "url": "https://docs.x.ai/api/mcp",
      "enabled": true
    }
  }
}
```

### Any MCP-Compatible Client

Any client that supports the **Streamable HTTP** transport can connect by pointing to the endpoint URL.
For example, using the MCP TypeScript SDK: ```javascript customLanguage="javascriptWithoutSDK" import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'; import { Client } from '@modelcontextprotocol/sdk/client/index.js'; const client = new Client({ name: 'my-app', version: '1.0.0' }); const transport = new StreamableHTTPClientTransport( new URL('https://docs.x.ai/api/mcp'), ); await client.connect(transport); // List all available doc pages const result = await client.callTool({ name: 'list_doc_pages' }); console.log(result); // Get a specific page const page = await client.callTool({ name: 'get_doc_page', arguments: { slug: 'developers/quickstart' }, }); console.log(page); ``` ### Using curl You can also interact with the MCP server directly via HTTP. The server accepts JSON-RPC requests: ```bash customLanguage="bash" # Initialize (optional — the server is stateless) curl -X POST https://docs.x.ai/api/mcp \ -H "Content-Type: application/json" \ -H "Accept: application/json, text/event-stream" \ -d '{ "jsonrpc": "2.0", "method": "initialize", "params": { "capabilities": {}, "clientInfo": { "name": "curl", "version": "1.0.0" }, "protocolVersion": "2025-03-26" }, "id": 1 }' # List available tools curl -X POST https://docs.x.ai/api/mcp \ -H "Content-Type: application/json" \ -H "Accept: application/json, text/event-stream" \ -d '{ "jsonrpc": "2.0", "method": "tools/list", "params": {}, "id": 2 }' # Call a tool curl -X POST https://docs.x.ai/api/mcp \ -H "Content-Type: application/json" \ -H "Accept: application/json, text/event-stream" \ -d '{ "jsonrpc": "2.0", "method": "tools/call", "params": { "name": "search_docs", "arguments": { "query": "rate limits", "max_results": 3 } }, "id": 3 }' ``` ===/developers/faq/accounts=== #### FAQ # Accounts ## How do I create an account for the API? You can create an account at https://accounts.x.ai, or https://console.x.ai. 
To link your X account to your xAI account automatically, choose to sign up with your X account. You can create multiple accounts with the same email using different sign-in methods. When you sign up with a new sign-in method using the same email, we will ask whether you want to create a new account or link to the existing one. We are not able to merge the content, subscriptions, etc. of different accounts.

## How do I update my xAI account email?

Visit [xAI Accounts](https://accounts.x.ai). On the Account page, you can update your email.

## How do I add other sign-in methods?

Once you have signed up for an account, you can add additional sign-in methods at [xAI Accounts](https://accounts.x.ai).

## I've forgotten my Multi-Factor Authentication (MFA) method, can you remove it?

You can generate recovery codes on the [xAI Accounts](https://accounts.x.ai) Security page. For security reasons, we can't remove or reset your MFA method unless you have recovery codes. Please reach out to support@x.ai if you would like to delete the account instead.

## If I already have an account for Grok, can I use the same account for API access?

Yes, the account is shared between Grok and the xAI API. You can manage your sign-in details at https://accounts.x.ai. However, billing is separate for Grok and the xAI API. You can manage your billing for the xAI API on [xAI Console](https://console.x.ai). To manage billing for Grok, visit https://grok.com -> Settings -> Billing, or go directly to Apple/Google if you made the purchase via the Apple App Store or Google Play.

## How do I manage my account?

Visit [xAI Accounts](https://accounts.x.ai) to manage your account. Please note that your xAI account is different from your X account, and xAI cannot assist you with X account issues. Contact X via the [X Help Center](https://help.x.com/) or Premium Support if you encounter any issues with your X account.
## I received an email about someone logging into my xAI account

xAI sends you an email when someone logs into your xAI account. The login location is an approximation based on your IP address; depending on your network setup and ISP, it might not reflect exactly where the login happened. If you think the login was not you, please [reset your password](https://accounts.x.ai/request-reset-password) and [clear your login sessions](https://accounts.x.ai/sessions). We also recommend that all users [add a multi-factor authentication method](https://accounts.x.ai/security).

## How do I delete my xAI account?

We are sorry to see you go! You can visit [xAI Accounts](https://accounts.x.ai/account) to delete your account. You can cancel the deletion within 30 days by logging in to any xAI website and following the prompt to confirm restoring the account. For privacy requests, please go to: https://privacy.x.ai.

===/developers/faq/billing===

#### FAQ

# Billing

## I'm having payment issues with an Indian payment card

Unfortunately, we cannot process Indian payment cards for our API service. We are working toward supporting them, but you might want to consider using a third-party API in the meantime. Payments for the Grok website and apps are handled differently and are not affected.

## When will I be charged?

* Prepaid Credits: If you choose to use prepaid credits, you'll be charged when you buy them. These credits will be assigned to the team you select during purchase.
* Monthly Invoiced Billing: If you set your [invoiced spending limit](/console/billing#monthly-invoiced-billing-and-invoiced-billing-limit) above $0, any usage beyond your prepaid credits will be charged at the end of the month.
* API Usage: When you make API requests, the cost is calculated immediately.
The amount is either deducted from your available prepaid credits or, if your credits are exhausted and your [invoiced spending limit](/console/billing#monthly-invoiced-billing-and-invoiced-billing-limit) is greater than $0, added to your monthly invoice and charged at the end of the month. Check out [Billing](/console/billing) for more information.

## Can you retroactively generate an invoice with new billing information?

We are unable to retroactively generate an invoice. Please ensure your billing information is correct on [xAI Console](https://console.x.ai) under Billing -> Payment.

## Can prepaid API credits be refunded?

Unfortunately, we are not able to offer refunds on any prepaid credit purchase, except in regions where required by law. For details, please visit https://x.ai/legal/terms-of-service-enterprise.

### My prompt token consumption from the API is different from the token count I get from the xAI Console Tokenizer or tokenize text endpoint

The inference endpoints add pre-defined tokens to help us process the request, so these tokens are included in the total prompt token consumption. For more information, see [Estimating consumption with tokenizer on xAI Console or through API](/developers/rate-limits#estimating-consumption-with-tokenizer-on-xai-console-or-through-api).

===/developers/faq/general===

#### FAQ

# Frequently Asked Questions - General

Frequently asked questions from our customers.

### Does the xAI API provide access to live data?

Yes! With the agentic server-side [Web Search](/developers/tools/web-search) and [X Search](/developers/tools/x-search) tools.

### How do I contact Sales?
For customers with bespoke needs or to request custom pricing, please fill out our [Grok for Business form](https://x.ai/grok/business). A member of our team will reach out with next steps. You can also email us at [sales@x.ai](mailto:sales@x.ai). ### Where are your Terms of Service and Privacy Policy? Please refer to our [Legal Resources](https://x.ai/legal) for our Enterprise Terms of Service and Data Processing Addendum. ### Does xAI sell crypto tokens? xAI is not affiliated with any cryptocurrency. We are aware of several scam websites that unlawfully use our name and logo. ===/developers/faq/security=== #### FAQ # Security ## Does xAI train on customers' API requests? xAI never trains on your API inputs or outputs without your explicit permission. API requests and responses are temporarily stored on our servers for 30 days in case they need to be audited for potential abuse or misuse. This data is automatically deleted after 30 days. For teams that require stricter data handling, see [Zero Data Retention (ZDR)](#what-is-zero-data-retention-zdr) below. ## What is Zero Data Retention (ZDR)? Zero Data Retention (ZDR) is an enterprise feature that prevents xAI from storing any API request or response data. ZDR is exclusively available to enterprise accounts. When ZDR is enabled for your team, your prompts, completions, and associated metadata are processed in real time but never persisted to our servers; once a response is delivered, no record of the exchange remains. For more information about ZDR and enterprise plans, please contact [sales@x.ai](mailto:sales@x.ai). ### How it works * **No logging:** API inputs and outputs are not written to any datastore. The 30-day audit retention described above does not apply to ZDR-enabled teams. * **Moderation still runs:** Safety and content moderation checks are performed in real time, but moderation results are not stored. 
* **Response header:** Every API response includes an `x-zero-data-retention` header set to `"true"` or `"false"`, so your application can programmatically confirm that ZDR is active. ### How to enable ZDR ZDR is only available to enterprise accounts. To learn more or enable ZDR for your organization, please reach out to [sales@x.ai](mailto:sales@x.ai). Once enabled, ZDR applies automatically to all API requests made with that team's API keys—no code changes are required. You can verify ZDR is active for your team in the [xAI Console](https://console.x.ai/) team picker, which displays a "Zero Data Retention" label beneath your team name. ### Considerations * **No server-side conversation history:** Because requests are not stored, features that rely on server-side state—such as the Responses API's automatic conversation threading via `previous_response_id`—are unavailable. You must manage conversation context client-side, e.g., by using `use_encrypted_content` for [agentic tool-calling state](/developers/tools/advanced-usage#append-the-encrypted-agentic-tool-calling-states). * **No audit log entries for request content:** Audit logs will still record administrative events (key creation, team changes, etc.), but the content of API requests and responses will not appear. ## Is the xAI API HIPAA compliant? To inquire about a Business Associate Agreement (BAA), please complete our [BAA Questionnaire](https://forms.gle/YAEdX3XUp6MvdEXW9). A member of our team will review your responses and reach out with next steps. ## Is xAI GDPR and SOC II compliant? We are SOC 2 Type 2 compliant. Customers with a signed NDA can refer to our [Trust Center](https://trust.x.ai/) for up-to-date information on our certifications and data governance. ## Do you have Audit Logs? Team admins are able to view an audit log of user interactions. This lists all of the user interactions with our API server. You can view it at [xAI Console -> Audit Log](https://console.x.ai/team/default/audit). 
The admin can also search by Event ID, Description, or User to filter the results shown; for example, filtering by description matching `ListApiKeys`. You can also view the audit log across a range of dates with the time filter.

## How can I securely manage my API keys?

Treat your xAI API keys as sensitive information, like passwords or credit card details. Do not share keys between teammates, to avoid unauthorized access. Store keys securely using environment variables or secret management tools. Avoid committing keys to public repositories or source code. Rotate keys regularly for added security.

If you suspect a compromise, log into the xAI Console first. Ensure you are viewing the correct team, as API keys are tied to specific teams. Navigate to the "API Keys" section via the sidebar. In the API Keys table, click the vertical ellipsis (three dots) next to the key. Select "Disable key" to deactivate it temporarily or "Delete key" to remove it permanently. Then click the "Create API Key" button to generate a new one and update your applications.

xAI partners with GitHub's Secret Scanning program to detect leaked keys. If a leak is found, we disable the key and notify you via email. Monitor your account for unusual activity to stay protected.

===/developers/faq/team-management===

#### FAQ

# Team Management

## What are teams?

Teams are the level at which xAI tracks API usage, processes billing, and issues invoices.

* If you're the team creator and don't need a new team, you can rename your Personal Team and add members instead of creating a new one.
* Each team has **roles**:
  * **Admin**: Can modify the team name and billing details, and manage members.
  * **Member**: Cannot make these changes.
* The team creator is automatically an Admin.

## Which team am I on?

When you sign up for xAI, you're automatically assigned to a **Personal Team**, which you can view in the top bar of [xAI Console](https://console.x.ai).

## How can I manage teams and team members?
### Create a Team

1. Click the dropdown menu in the xAI Console.
2. Select **+ Create Team**.
3. Follow the on-screen instructions. You can edit these details later.

### Rename or Describe a Team

Admins can update the team name and description on the [Settings page](https://console.x.ai/team/default/settings).

### Manage Team Members

Admins can add or remove members by email on the [Users page](https://console.x.ai/team/default/users).

* Assign members as **Admin** or **Member**.
* If a user is removed, their API keys remain with the team.

### Delete a Team

Deleting a team removes its prepaid credits. To permanently delete a team:

1. Go to the [Settings page](https://console.x.ai/team/default/settings).
2. Follow the instructions under **Delete Team**.

## How do I automatically add users to a team with my organization's email domain?

Admins can enable automatic team joining for users with a shared email domain:

1. Go to the [Settings page](https://console.x.ai/team/default/settings).
2. Add the domain under **Verified Domains**.
3. Add a `domain-verification` key to your domain's DNS TXT record to verify ownership.

Users signing up with a verified-domain email will automatically join the team.

===/developers/files/collections/api===

#### Files & Collections

# Using Collections via API

This guide walks you through managing collections programmatically using the xAI SDK and REST API.

## Creating a Management Key

To use the Collections API, you need to create a Management API Key with the `AddFileToCollection` permission. This permission is required for uploading documents to collections.

1. Navigate to the **Management Keys** section in the [xAI Console](https://console.x.ai/team/default/settings/management-keys)
2. Click on **Create Management Key**
3. Select the `AddFileToCollection` permission along with any other permissions you need
4.
If you need to perform operations other than uploading documents (such as creating, updating, or deleting collections), enable the corresponding permissions in the **Collections Endpoint** group 5. Copy and securely store your Management API Key Make sure to copy your Management API Key immediately after creation. You won't be able to see it again. ## Creating a collection ```python customLanguage="pythonXAI" import os from xai_sdk import Client client = Client( api_key=os.getenv("XAI_API_KEY"), management_api_key=os.getenv("XAI_MANAGEMENT_API_KEY"), timeout=3600, ) collection = client.collections.create( name="SEC Filings", ) print(collection) ``` ```javascript customLanguage="javascriptWithoutSDK" const response = await fetch('https://management-api.x.ai/v1/collections', { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${process.env.XAI_MANAGEMENT_API_KEY}`, }, body: JSON.stringify({ collection_name: 'SEC Filings' }), }); const collection = await response.json(); console.log(collection); ``` ```bash curl https://management-api.x.ai/v1/collections \ -X POST \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" \ -d '{"collection_name": "SEC Filings"}' ``` ## Listing collections ```python customLanguage="pythonXAI" # ... Create client collections = client.collections.list() print(collections) ``` ```javascript customLanguage="javascriptWithoutSDK" const response = await fetch('https://management-api.x.ai/v1/collections', { headers: { 'Authorization': `Bearer ${process.env.XAI_MANAGEMENT_API_KEY}`, }, }); const collections = await response.json(); console.log(collections); ``` ```bash curl https://management-api.x.ai/v1/collections \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" ``` ## Viewing collection configuration ```python customLanguage="pythonXAI" # ... 
Create client collection = client.collections.get("collection_dbc087b1-6c99-493d-86c6-b401fee34a9d") print(collection) ``` ```javascript customLanguage="javascriptWithoutSDK" const collectionId = 'collection_dbc087b1-6c99-493d-86c6-b401fee34a9d'; const response = await fetch(`https://management-api.x.ai/v1/collections/${collectionId}`, { headers: { 'Authorization': `Bearer ${process.env.XAI_MANAGEMENT_API_KEY}`, }, }); const collection = await response.json(); console.log(collection); ``` ```bash curl https://management-api.x.ai/v1/collections/collection_dbc087b1-6c99-493d-86c6-b401fee34a9d \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" ``` ## Updating collection configuration ```python customLanguage="pythonXAI" # ... Create client collection = client.collections.update( "collection_dbc087b1-6c99-493d-86c6-b401fee34a9d", name="SEC Filings (New)" ) print(collection) ``` ```javascript customLanguage="javascriptWithoutSDK" const collectionId = 'collection_dbc087b1-6c99-493d-86c6-b401fee34a9d'; const response = await fetch(`https://management-api.x.ai/v1/collections/${collectionId}`, { method: 'PUT', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${process.env.XAI_MANAGEMENT_API_KEY}`, }, body: JSON.stringify({ collection_name: 'SEC Filings (New)' }), }); const collection = await response.json(); console.log(collection); ``` ```bash curl https://management-api.x.ai/v1/collections/collection_dbc087b1-6c99-493d-86c6-b401fee34a9d \ -X PUT \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" \ -d '{"collection_name": "SEC Filings (New)"}' ``` ## Uploading documents Uploading a document to a collection is a two-step process: 1. Upload the file to the xAI API 2. Add the uploaded file to your collection ```python customLanguage="pythonXAI" # ... 
Create client
with open("tesla-20241231.html", "rb") as file:
    file_data = file.read()

document = client.collections.upload_document(
    collection_id="collection_dbc087b1-6c99-493d-86c6-b401fee34a9d",
    name="tesla-20241231.html",
    data=file_data,
)
print(document)
```

```javascript customLanguage="javascriptWithoutSDK"
const collectionId = 'collection_dbc087b1-6c99-493d-86c6-b401fee34a9d';

// Step 1: Upload the file. `file` is a File or Blob, e.g. obtained from
// an <input type="file"> element in the browser.
const formData = new FormData();
formData.append('file', file);
formData.append('purpose', 'assistants');

const uploadResponse = await fetch('https://api.x.ai/v1/files', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${process.env.XAI_API_KEY}` },
  body: formData,
});
const { id: fileId } = await uploadResponse.json();

// Step 2: Add the uploaded file to the collection
await fetch(`https://management-api.x.ai/v1/collections/${collectionId}/documents/${fileId}`, {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${process.env.XAI_MANAGEMENT_API_KEY}` },
});
```

```bash
# Step 1: Upload file
curl https://api.x.ai/v1/files \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -F file=@tesla-20241231.html

# Step 2: Add file to collection (use file_id from step 1)
curl -X POST https://management-api.x.ai/v1/collections/$COLLECTION_ID/documents/$FILE_ID \
  -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY"
```

### Uploading with metadata fields

If your collection has [metadata fields](/developers/files/collections/metadata) defined (the collection must have these fields set in `field_definitions` when created or updated; see the linked metadata page for details), include them using the `fields` parameter:

```python customLanguage="pythonXAI"
# ...
Create client with open("paper.pdf", "rb") as file: file_data = file.read() document = client.collections.upload_document( collection_id="collection_dbc087b1-6c99-493d-86c6-b401fee34a9d", name="paper.pdf", data=file_data, fields={ "author": "Sandra Kim", "year": "2024", "title": "Q3 Revenue Analysis" }, ) print(document) ``` ```bash curl https://management-api.x.ai/v1/collections/collection_dbc087b1-6c99-493d-86c6-b401fee34a9d/documents \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" \ -F "name=paper.pdf" \ -F "data=@paper.pdf" \ -F "content_type=application/pdf" \ -F 'fields={"author": "Sandra Kim", "year": "2024", "title": "Q3 Revenue Analysis"}' ``` ## Searching documents You can also search documents using the Responses API with the `file_search` tool. See the [Collections Search Tool](/developers/tools/collections-search) guide for more details. ```python customLanguage="pythonXAI" # ... Create client response = client.collections.search( query="What were the key revenue drivers based on the SEC filings?", collection_ids=["collection_dbc087b1-6c99-493d-86c6-b401fee34a9d"], ) print(response) ``` ```javascript customLanguage="javascriptWithoutSDK" const response = await fetch('https://api.x.ai/v1/documents/search', { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${process.env.XAI_API_KEY}`, }, body: JSON.stringify({ query: 'What were the key revenue drivers based on the SEC filings?', source: { collection_ids: ['collection_dbc087b1-6c99-493d-86c6-b401fee34a9d'], }, }), }); const results = await response.json(); console.log(results); ``` ```bash curl https://api.x.ai/v1/documents/search \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -d '{ "query": "What were the key revenue drivers based on the SEC filings?", "source": { "collection_ids": ["collection_dbc087b1-6c99-493d-86c6-b401fee34a9d"] } }' ``` ### Search modes There are three search methods available: * **Keyword search** * 
**Semantic search** * **Hybrid search** (combines both keyword and semantic methods) By default, the system uses hybrid search, which generally delivers the best and most comprehensive results. | Mode | Description | Best for | Drawbacks | |------|-------------|----------|-----------| | Keyword | Searches for exact matches of specified words, phrases, or numbers | Precise terms (e.g., account numbers, dates, specific financial figures) | May miss contextually relevant content | | Semantic | Understands meaning and context to find conceptually related content | Discovering general ideas, topics, or intent even when exact words differ | Less precise for specific terms | | Hybrid | Combines keyword and semantic search for broader and more accurate results | Most real-world use cases | Slightly higher latency | The hybrid approach balances precision and recall, making it the recommended default for the majority of queries. An example to set hybrid mode: ```bash curl https://api.x.ai/v1/documents/search \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $XAI_API_KEY" \ -d '{ "query": "What were the key revenue drivers based on the SEC filings?", "source": { "collection_ids": [ "collection_dbc087b1-6c99-493d-86c6-b401fee34a9d" ] }, "retrieval_mode": {"type": "hybrid"} }' ``` You can set `"retrieval_mode": {"type": "keyword"}` for keyword search and `"retrieval_mode": {"type": "semantic"}` for semantic search. ## Deleting a document ```python customLanguage="pythonXAI" # ... 
Create client client.collections.remove_document( collection_id="collection_dbc087b1-6c99-493d-86c6-b401fee34a9d", file_id="file_55a709d4-8edc-4f83-84d9-9f04fe49f832", ) ``` ```javascript customLanguage="javascriptWithoutSDK" const collectionId = 'collection_dbc087b1-6c99-493d-86c6-b401fee34a9d'; const fileId = 'file_55a709d4-8edc-4f83-84d9-9f04fe49f832'; await fetch(`https://management-api.x.ai/v1/collections/${collectionId}/documents/${fileId}`, { method: 'DELETE', headers: { 'Authorization': `Bearer ${process.env.XAI_MANAGEMENT_API_KEY}` }, }); ``` ```bash curl https://management-api.x.ai/v1/collections/collection_dbc087b1-6c99-493d-86c6-b401fee34a9d/documents/file_55a709d4-8edc-4f83-84d9-9f04fe49f832 \ -X DELETE \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" ``` ## Deleting a collection ```python customLanguage="pythonXAI" # ... Create client client.collections.delete(collection_id="collection_dbc087b1-6c99-493d-86c6-b401fee34a9d") ``` ```javascript customLanguage="javascriptWithoutSDK" const collectionId = 'collection_dbc087b1-6c99-493d-86c6-b401fee34a9d'; await fetch(`https://management-api.x.ai/v1/collections/${collectionId}`, { method: 'DELETE', headers: { 'Authorization': `Bearer ${process.env.XAI_MANAGEMENT_API_KEY}` }, }); ``` ```bash curl https://management-api.x.ai/v1/collections/collection_dbc087b1-6c99-493d-86c6-b401fee34a9d \ -X DELETE \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" ``` ## Next Steps ===/developers/files/collections/metadata=== #### Files & Collections # Metadata Fields Metadata fields allow you to attach structured attributes to documents in a collection. 
These fields enable: * **Filtered retrieval** — Narrow search results to documents matching specific criteria (e.g., `author="Sandra Kim"`) * **Contextual embeddings** — Inject metadata into chunks to improve retrieval accuracy (e.g., prepending document title to each chunk) * **Data integrity constraints** — Enforce required fields or uniqueness across documents ## Creating a Collection with Metadata Fields Define metadata fields using `field_definitions` when creating a collection: ```bash curl -X POST "https://management-api.x.ai/v1/collections" \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "collection_name": "research_papers", "field_definitions": [ { "key": "author", "required": true }, { "key": "year", "required": true, "unique": true }, { "key": "title", "inject_into_chunk": true } ] }' ``` ### Field Definition Options | Option | Description | |--------|-------------| | `required` | Document uploads must include this field. Defaults to `false`. | | `unique` | Only one document in the collection can have a given value for this field. Defaults to `false`. | | `inject_into_chunk` | Prepends this field's value to every embedding chunk, improving retrieval by providing context. Defaults to `false`. | ## Uploading Documents with Metadata Include metadata as a JSON object in the `fields` parameter: ```bash curl -X POST "https://management-api.x.ai/v1/collections/{collection_id}/documents" \ -H "Authorization: Bearer $XAI_MANAGEMENT_API_KEY" \ -F "name=paper.pdf" \ -F "data=@paper.pdf" \ -F "content_type=application/pdf" \ -F 'fields={"author": "Sandra Kim", "year": "2024", "title": "Q3 Revenue Analysis"}' ``` ## Filtering Documents in Search Use the `filter` parameter to restrict search results based on metadata values. 
The filter uses AIP-160 syntax: ```bash curl -X POST "https://api.x.ai/v1/documents/search" \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "query": "revenue growth", "source": { "collection_ids": ["collection_xxx"] }, "filter": "author=\"Sandra Kim\" AND year>=2020" }' ``` ### Supported Filter Operators | Operator | Example | Description | |----------|---------|-------------| | `=` | `author="Jane"` | Equals | | `!=` | `status!="draft"` | Not equals | | `<`, `>`, `<=`, `>=` | `year>=2020` | Numeric/lexical comparison | | `AND` | `a="x" AND b="y"` | Both conditions must match | | `OR` | `a="x" OR a="y"` | Either condition matches | `OR` has higher precedence than `AND`. Use parentheses for clarity: `a="x" AND (b="y" OR b="z")`. Wildcard matching (e.g., `author="E*"`) is not supported. All string comparisons are exact matches. Filtering on fields that don't exist in your documents returns no results. Double-check that field names match your collection's `field_definitions`. 
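To show how the pieces fit together, here is a minimal Python sketch that assembles such a search request body with a metadata filter (the `build_search_payload` helper is illustrative, not part of the xAI SDK):

```python
import json

def build_search_payload(query, collection_ids, filter_str=None):
    """Assemble a request body for POST /v1/documents/search.

    `filter_str` is an AIP-160 filter string over the collection's
    metadata fields, e.g. 'author="Sandra Kim" AND year>=2020'.
    """
    payload = {
        "query": query,
        "source": {"collection_ids": list(collection_ids)},
    }
    if filter_str is not None:
        payload["filter"] = filter_str
    return payload

body = build_search_payload(
    "revenue growth",
    ["collection_xxx"],
    filter_str='author="Sandra Kim" AND year>=2020',
)
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent with any HTTP client, exactly as in the curl example above.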
## AIP-160 Filter String Examples ### Basic Examples ```bash # Equality (double or single quotes for strings with spaces) author="Sandra Kim" author='Sandra Kim' # Equality (no quotes needed for simple values) year=2024 status=active # Not equal status!="archived" status!='archived' ``` ### Comparison Operators ```bash # Numeric comparisons year>=2020 year>2019 score<=0.95 price<100 # Combined comparisons (range) year>=2020 AND year<=2024 ``` ### Logical Operators ```bash # AND - both conditions must match author="Sandra Kim" AND year=2024 # OR - either condition matches status="pending" OR status="in_progress" # Combined (OR has higher precedence than AND) department="Engineering" AND status="active" OR status="pending" # Use parentheses for clarity department="Engineering" AND (status="active" OR status="pending") ``` ### Complex Examples ```bash # Multiple conditions author="Sandra Kim" AND year>=2020 AND status!="draft" # Nested logic with parentheses (author="Sandra Kim" OR author="John Doe") AND year>=2020 # Multiple fields with mixed operators category="finance" AND (year=2023 OR year=2024) AND status!="archived" ``` ## Quick Reference | Use Case | Filter String | |----------|---------------| | Exact match | `author="Sandra Kim"` | | Numeric comparison | `year>=2020` | | Not equal | `status!="archived"` | | Multiple conditions | `author="Sandra Kim" AND year=2024` | | Either condition | `status="pending" OR status="draft"` | | Grouped logic | `(status="active" OR status="pending") AND year>=2020` | | Complex filter | `category="finance" AND year>=2020 AND status!="archived"` | ===/developers/files/collections=== #### Files & Collections # Collections Collections offers xAI API users a robust set of tools and methods to seamlessly integrate their enterprise requirements and internal knowledge bases with the xAI API. 
Whether you're building a RAG application or need to search across large document sets, Collections provides the infrastructure to manage and query your content. **Looking for Files?** If you want to attach files directly to chat messages for conversation context, see [Files](/developers/files). Collections are different—they provide persistent document storage with semantic search across many documents. ## Core Concepts There are two entities that users can create within the Collections service: * **File** — A single entity of a user-uploaded file. * **Collection** — A group of files linked together, with an embedding index for efficient retrieval. * When you create a collection you have the option to automatically generate embeddings for any files uploaded to that collection. You can then perform semantic search across files in multiple collections. * A single file can belong to multiple collections. ## What You Can Do With Collections, you can: * **Create collections** to organize your documents * **Upload documents** in various formats (HTML, PDF, text, etc.) 
* **Search semantically** across your documents using natural language queries * **Configure chunking and embeddings** to optimize retrieval * **Manage documents** by listing, updating, and deleting them ## Getting Started Choose how you want to work with Collections: * [Using the Console →](/console/collections) - Create collections and upload documents through the xAI Console interface * [Using the API →](/developers/files/collections/api) - Programmatically manage collections with the SDK and REST API ## Metadata Fields Collections support **metadata fields** — structured attributes you can attach to documents for enhanced retrieval and data integrity: * **Filtered retrieval** — Narrow search results to documents matching specific criteria (e.g., `author="Sandra Kim"`) * **Contextual embeddings** — Inject metadata into chunks to improve retrieval accuracy (e.g., prepending document title to each chunk) * **Data integrity constraints** — Enforce required fields or uniqueness across documents When creating a collection, define metadata fields with options like `required`, `unique`, and `inject_into_chunk` to control how metadata is validated and used during search. [Learn more about metadata fields →](/developers/files/collections/metadata) ## Usage Limits To upload files and add them to a collection, you must have credits in your account. * **Maximum file size**: 100 MB * **Maximum number of files**: 100,000 files uploaded globally * **Maximum total size**: 100 GB Please [contact us](https://x.ai/contact) to increase any of these limits. ## Data Privacy We do not use user data stored in Collections for model training purposes. ## Supported MIME Types While we support any `UTF-8` encoded text file, we also have special file conversion and chunking techniques for certain MIME types. 
The following is a non-exhaustive list of the MIME types we support: * application/csv * application/dart * application/ecmascript * application/epub * application/epub+zip * application/json * application/ms-java * application/msword * application/pdf * application/typescript * application/vnd.adobe.pdf * application/vnd.curl * application/vnd.dart * application/vnd.jupyter * application/vnd.ms-excel * application/vnd.ms-outlook * application/vnd.oasis.opendocument.text * application/vnd.openxmlformats-officedocument.presentationml.presentation * application/vnd.openxmlformats-officedocument.presentationml.slide * application/vnd.openxmlformats-officedocument.presentationml.slideshow * application/vnd.openxmlformats-officedocument.presentationml.template * application/vnd.openxmlformats-officedocument.spreadsheetml.sheet * application/vnd.openxmlformats-officedocument.spreadsheetml.template * application/vnd.openxmlformats-officedocument.wordprocessingml.document * application/x-csh * application/x-epub+zip * application/x-hwp * application/x-hwp-v5 * application/x-latex * application/x-pdf * application/x-php * application/x-powershell * application/x-sh * application/x-shellscript * application/x-tex * application/x-zsh * application/xhtml * application/xml * application/zip * text/cache-manifest * text/calendar * text/css * text/csv * text/html * text/javascript * text/jsx * text/markdown * text/n3 * text/php * text/plain * text/rtf * text/tab-separated-values * text/troff * text/tsv * text/tsx * text/turtle * text/uri-list * text/vcard * text/vtt * text/x-asm * text/x-bibtex * text/x-c * text/x-c++hdr * text/x-c++src * text/x-chdr * text/x-coffeescript * text/x-csh * text/x-csharp * text/x-csrc * text/x-d * text/x-diff * text/x-emacs-lisp * text/x-erlang * text/x-go * text/x-haskell * text/x-java * text/x-java-properties * text/x-java-source * text/x-kotlin * text/x-lisp * text/x-lua * text/x-objcsrc * text/x-pascal * text/x-perl * 
text/x-perl-script * text/x-python * text/x-python-script * text/x-r-markdown * text/x-rst * text/x-ruby-script * text/x-rust * text/x-sass * text/x-scala * text/x-scheme * text/x-script.python * text/x-scss * text/x-sh * text/x-sql * text/x-swift * text/x-tcl * text/x-tex * text/x-vbasic * text/x-vcalendar * text/xml * text/xml-dtd * text/yaml ===/developers/files/managing-files=== #### Files & Collections # Managing Files The Files API provides a complete set of operations for managing your files. If your files are publicly accessible, you can reference them directly by URL in chat conversations — see [Attaching Files](/developers/model-capabilities/files/chat-with-files#attaching-files). For files that aren't publicly accessible, upload them using one of the methods described below. ## Uploading Files You can upload files in several ways: from a file path, raw bytes, BytesIO object, or an open file handle. ### Upload from File Path ```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # Upload a file from disk file = client.files.upload("/path/to/your/document.pdf") print(f"File ID: {file.id}") print(f"Filename: {file.filename}") print(f"Size: {file.size} bytes") print(f"Created at: {file.created_at}") ``` ```pythonOpenAISDK import os from openai import OpenAI client = OpenAI( api_key=os.getenv("XAI_API_KEY"), base_url="https://api.x.ai/v1", ) # Upload a file with open("/path/to/your/document.pdf", "rb") as f: file = client.files.create( file=f, purpose="assistants" ) print(f"File ID: {file.id}") print(f"Filename: {file.filename}") ``` ```pythonRequests import os import requests url = "https://api.x.ai/v1/files" headers = { "Authorization": f"Bearer {os.getenv('XAI_API_KEY')}" } with open("/path/to/your/document.pdf", "rb") as f: files = {"file": f} data = {"purpose": "assistants"} response = requests.post(url, headers=headers, files=files, data=data) file_data = response.json() print(f"File ID: {file_data['id']}") 
print(f"Filename: {file_data['filename']}") ``` ```javascriptOpenAISDK import OpenAI from "openai"; import fs from "fs"; const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1", }); // Upload a file const file = await client.files.create({ file: fs.createReadStream("/path/to/your/document.pdf"), purpose: "assistants", }); console.log("File ID: " + file.id); console.log("Filename: " + file.filename); ``` ```javascriptWithoutSDK import fs from "fs"; const formData = new FormData(); formData.append("file", new Blob([fs.readFileSync("/path/to/your/document.pdf")]), "document.pdf"); formData.append("purpose", "assistants"); const response = await fetch("https://api.x.ai/v1/files", { method: "POST", headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, body: formData, }); const file = await response.json(); console.log("File ID: " + file.id); console.log("Filename: " + file.filename); ``` ```bash curl https://api.x.ai/v1/files \\ -H "Authorization: Bearer $XAI_API_KEY" \\ -F file=@/path/to/your/document.pdf \\ -F purpose=assistants ``` ### Upload from Bytes ```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # Upload file content directly from bytes content = b"This is my document content.\\nIt can span multiple lines." file = client.files.upload(content, filename="document.txt") print(f"File ID: {file.id}") print(f"Filename: {file.filename}") ``` ### Upload from file object ```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # Upload a file directly from disk file = client.files.upload(open("document.pdf", "rb"), filename="document.pdf") print(f"File ID: {file.id}") print(f"Filename: {file.filename}") ``` ## Upload with Progress Tracking Track upload progress for large files using callbacks or progress bars. 
### Custom Progress Callback ```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # Define a custom progress callback def progress_callback(bytes_uploaded: int, total_bytes: int): percentage = (bytes_uploaded / total_bytes) * 100 if total_bytes else 0 mb_uploaded = bytes_uploaded / (1024 * 1024) mb_total = total_bytes / (1024 * 1024) print(f"Progress: {mb_uploaded:.2f}/{mb_total:.2f} MB ({percentage:.1f}%)") # Upload with progress tracking file = client.files.upload( "/path/to/large-file.pdf", on_progress=progress_callback ) print(f"Successfully uploaded: {file.filename}") ``` ### Progress Bar with tqdm ```pythonXAI import os from xai_sdk import Client from tqdm import tqdm client = Client(api_key=os.getenv("XAI_API_KEY")) file_path = "/path/to/large-file.pdf" total_bytes = os.path.getsize(file_path) # Upload with tqdm progress bar with tqdm(total=total_bytes, unit="B", unit_scale=True, desc="Uploading") as pbar: file = client.files.upload( file_path, on_progress=pbar.update ) print(f"Successfully uploaded: {file.filename}") ``` ## Listing Files Retrieve a list of your uploaded files with pagination and sorting options. ### Available Options * **`limit`**: Maximum number of files to return. If not specified, uses server default of 100. * **`order`**: Sort order for the files. Either `"asc"` (ascending) or `"desc"` (descending). * **`sort_by`**: Field to sort by. Options: `"created_at"`, `"filename"`, or `"size"`. * **`pagination_token`**: Token for fetching the next page of results. 
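When a team has more files than fit in one page, the pagination options above can be combined into a loop that follows pagination tokens until the listing is exhausted. This is a sketch, not a documented recipe: it assumes the list response exposes the next-page token as a `pagination_token` attribute (check the SDK reference for the exact field name).

```python
# Sketch: page through every file by following pagination tokens.
# Assumes `files_api.list(...)` accepts the documented options and the
# response carries a `pagination_token` for the next page (falsy when done).

def iter_all_files(files_api, page_size: int = 100):
    token = None
    while True:
        page = files_api.list(limit=page_size, pagination_token=token)
        yield from page.data
        token = getattr(page, "pagination_token", None)
        if not token:
            break

# Usage with an authenticated client:
# from xai_sdk import Client
# client = Client(api_key=os.getenv("XAI_API_KEY"))
# for f in iter_all_files(client.files):
#     print(f.filename, f.size)
```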
```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # List files with pagination and sorting response = client.files.list( limit=10, order="desc", sort_by="created_at" ) for file in response.data: print(f"File: {file.filename} (ID: {file.id}, Size: {file.size} bytes)") ``` ```pythonOpenAISDK import os from openai import OpenAI client = OpenAI( api_key=os.getenv("XAI_API_KEY"), base_url="https://api.x.ai/v1", ) # List files files = client.files.list() for file in files.data: print(f"File: {file.filename} (ID: {file.id})") ``` ```pythonRequests import os import requests url = "https://api.x.ai/v1/files" headers = { "Authorization": f"Bearer {os.getenv('XAI_API_KEY')}" } response = requests.get(url, headers=headers) files = response.json() for file in files.get("data", []): print(f"File: {file['filename']} (ID: {file['id']})") ``` ```javascriptOpenAISDK import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1", }); // List files const files = await client.files.list(); for (const file of files.data) { console.log(\`File: \${file.filename} (ID: \${file.id})\`); } ``` ```javascriptWithoutSDK const response = await fetch("https://api.x.ai/v1/files", { headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, }); const files = await response.json(); for (const file of files.data) { console.log(\`File: \${file.filename} (ID: \${file.id})\`); } ``` ```bash curl https://api.x.ai/v1/files \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ## Getting File Metadata Retrieve detailed information about a specific file. 
```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # Get file metadata by ID file = client.files.get("file-abc123") print(f"Filename: {file.filename}") print(f"Size: {file.size} bytes") print(f"Created: {file.created_at}") print(f"Team ID: {file.team_id}") ``` ```pythonOpenAISDK import os from openai import OpenAI client = OpenAI( api_key=os.getenv("XAI_API_KEY"), base_url="https://api.x.ai/v1", ) # Get file metadata file = client.files.retrieve("file-abc123") print(f"Filename: {file.filename}") print(f"Size: {file.bytes} bytes") ``` ```pythonRequests import os import requests file_id = "file-abc123" url = f"https://api.x.ai/v1/files/{file_id}" headers = { "Authorization": f"Bearer {os.getenv('XAI_API_KEY')}" } response = requests.get(url, headers=headers) file = response.json() print(f"Filename: {file['filename']}") print(f"Size: {file['bytes']} bytes") ``` ```javascriptOpenAISDK import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1", }); // Get file metadata const file = await client.files.retrieve("file-abc123"); console.log("Filename: " + file.filename); console.log("Size: " + file.bytes + " bytes"); ``` ```javascriptWithoutSDK const response = await fetch("https://api.x.ai/v1/files/file-abc123", { headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, }); const file = await response.json(); console.log("Filename: " + file.filename); console.log("Size: " + file.bytes + " bytes"); ``` ```bash curl https://api.x.ai/v1/files/file-abc123 \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ## Getting File Content Download the actual content of a file. 
```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # Get file content content = client.files.content("file-abc123") # Content is returned as bytes print(f"Content length: {len(content)} bytes") print(f"Content preview: {content[:100]}") ``` ```pythonOpenAISDK import os from openai import OpenAI client = OpenAI( api_key=os.getenv("XAI_API_KEY"), base_url="https://api.x.ai/v1", ) # Get file content content = client.files.content("file-abc123") print(f"Content: {content.text}") ``` ```pythonRequests import os import requests file_id = "file-abc123" url = f"https://api.x.ai/v1/files/{file_id}/content" headers = { "Authorization": f"Bearer {os.getenv('XAI_API_KEY')}" } response = requests.get(url, headers=headers) content = response.content print(f"Content length: {len(content)} bytes") ``` ```javascriptOpenAISDK import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1", }); // Get file content const response = await client.files.content("file-abc123"); const content = await response.text(); console.log("Content: " + content); ``` ```javascriptWithoutSDK const response = await fetch("https://api.x.ai/v1/files/file-abc123/content", { headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, }); const content = await response.text(); console.log("Content: " + content); ``` ```bash curl https://api.x.ai/v1/files/file-abc123/content \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ## Deleting Files Remove files when they're no longer needed. 
```pythonXAI import os from xai_sdk import Client client = Client(api_key=os.getenv("XAI_API_KEY")) # Delete a file delete_response = client.files.delete("file-abc123") print(f"Deleted: {delete_response.deleted}") print(f"File ID: {delete_response.id}") ``` ```pythonOpenAISDK import os from openai import OpenAI client = OpenAI( api_key=os.getenv("XAI_API_KEY"), base_url="https://api.x.ai/v1", ) # Delete a file delete_response = client.files.delete("file-abc123") print(f"Deleted: {delete_response.deleted}") print(f"File ID: {delete_response.id}") ``` ```pythonRequests import os import requests file_id = "file-abc123" url = f"https://api.x.ai/v1/files/{file_id}" headers = { "Authorization": f"Bearer {os.getenv('XAI_API_KEY')}" } response = requests.delete(url, headers=headers) result = response.json() print(f"Deleted: {result['deleted']}") print(f"File ID: {result['id']}") ``` ```javascriptOpenAISDK import OpenAI from "openai"; const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1", }); // Delete a file const deleteResponse = await client.files.delete("file-abc123"); console.log("Deleted: " + deleteResponse.deleted); console.log("File ID: " + deleteResponse.id); ``` ```javascriptWithoutSDK const response = await fetch("https://api.x.ai/v1/files/file-abc123", { method: "DELETE", headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, }); const result = await response.json(); console.log("Deleted: " + result.deleted); console.log("File ID: " + result.id); ``` ```bash curl -X DELETE https://api.x.ai/v1/files/file-abc123 \\ -H "Authorization: Bearer $XAI_API_KEY" ``` ## Limitations and Considerations ### File Size Limits * **Maximum file size**: 48 MB per file * **Processing time**: Larger files may take longer to process ### File Retention * **Cleanup**: Delete files when no longer needed to manage storage * **Access**: Files are scoped to your team/organization ### Supported Formats While many text-based formats are 
supported, the system works best with: * Structured documents (with clear sections, headings) * Plain text and markdown * Documents with clear information hierarchy Supported file types include: * Plain text files (.txt) * Markdown files (.md) * Code files (.py, .js, .java, etc.) * CSV files (.csv) * JSON files (.json) * PDF documents (.pdf) * And many other text-based formats ## Next Steps Now that you know how to manage files, learn how to use them in chat conversations: ===/developers/files=== #### Files & Collections # Files Grok can search through and reason over documents you attach to chat messages. You can reference any public file by URL or [upload](/developers/files/managing-files) private files and reference them by ID; either way, the system automatically activates the `attachment_search` tool and transforms your request into an agentic workflow. You can view more information at [Files API Reference](/developers/rest-api-reference/files). **Looking for Collections?** If you need persistent document storage with semantic search across many documents, see [Collections](/developers/files/collections). Files are different—they're for attaching documents to chat conversations for immediate context. ## How Files Work with Chat Behind the scenes, when you attach files to a chat message, the xAI API implicitly adds the `attachment_search` server-side tool to your request. This means: 1. **Automatic Agentic Behavior**: Your chat request becomes an agentic request, where Grok autonomously searches through your documents 2. **Intelligent Document Analysis**: The model can reason over document content, extract relevant information, and synthesize answers 3. **Multi-Document Support**: You can attach multiple files, and Grok will search across all of them This seamless integration allows you to simply attach files and ask questions—the complexity of document search and retrieval is handled automatically by the agentic workflow. 
## Understanding Document Search When you attach files to a chat message, the xAI API automatically activates the `attachment_search` [server-side tool](/developers/tools/overview). This transforms your request into an [agentic workflow](/developers/tools/overview#how-it-works) where Grok: 1. **Analyzes your query** to understand what information you're seeking 2. **Searches the documents** intelligently, finding relevant sections across all attached files 3. **Extracts and synthesizes information** from multiple sources if needed 4. **Provides a comprehensive answer** with the context from your documents ### Agentic Workflow Just like other agentic tools (web search, X search, code execution), document search operates autonomously: * **Multiple searches**: The model may search documents multiple times with different queries to find comprehensive information * **Reasoning**: The model uses its reasoning capabilities to decide what to search for and how to interpret the results * **Streaming visibility**: In streaming mode, you can see when the model is searching your documents via tool call notifications ### Token Usage with Files File-based chats follow similar token patterns to other agentic requests: * **Prompt tokens**: Include the conversation history and internal processing. Document content is processed efficiently * **Reasoning tokens**: Used for planning searches and analyzing document content * **Completion tokens**: The final answer text * **Cached tokens**: Repeated document content benefits from prompt caching for efficiency The actual document content is processed by the server-side tool and doesn't directly appear in the message history, keeping token usage optimized. ### Pricing Document search is billed per tool invocation, in addition to standard token costs. Each time the model searches your documents, it counts as one tool invocation. For complete pricing details, see the [Tools Pricing](/developers/models#tools-pricing) table. 
## Getting Started To use files with Grok, you'll need to: 1. Get file's **public URL** or learn how to upload, list, retrieve, and delete files via the **[Files API](/developers/files/managing-files)**. 2. **[Chat with files](/developers/model-capabilities/files/chat-with-files)** - attach files to chat messages and ask questions about your documents ## Quick Example Here's a quick example of the complete workflow: ```pythonXAI import os from xai_sdk import Client from xai_sdk.chat import user, file client = Client(api_key=os.getenv("XAI_API_KEY")) # 1a. Reference a public file by URL file_url = "https://example-files.online-convert.com/document/txt/example.txt" # 1b. Or upload a file and reference by ID uploaded_file = client.files.upload( b"Employee: Alice Johnson\\nDepartment: Engineering", filename="employee.txt", ) # 2. Chat with files chat = client.chat.create(model="grok-4.20-reasoning") chat.append(user( "Summarize both documents", file(url=file_url), file(uploaded_file.id), )) # 3. Get the answer response = chat.sample() print(response.content) # 4. Clean up uploaded file client.files.delete(uploaded_file.id) ``` ```javascriptWithoutSDK // 1a. Reference a public file by URL const fileUrl = "https://docs.x.ai/assets/api-examples/documents/sales-report.txt"; // 1b. Or upload a file and reference by ID const formData = new FormData(); formData.append("file", new Blob(["Employee: Alice Johnson\\nDepartment: Engineering"], { type: "text/plain" }), "employee.txt"); formData.append("purpose", "assistants"); const uploadRes = await fetch("https://api.x.ai/v1/files", { method: "POST", headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, body: formData, }); const uploadedFile = await uploadRes.json(); // 2. 
Chat with files const chatRes = await fetch("https://api.x.ai/v1/responses", { method: "POST", headers: { "Content-Type": "application/json", Authorization: \`Bearer \${process.env.XAI_API_KEY}\`, }, body: JSON.stringify({ model: "grok-4.20-reasoning", input: [ { role: "user", content: [ { type: "input_text", text: "Summarize both documents" }, { type: "input_file", file_url: fileUrl }, { type: "input_file", file_id: uploadedFile.id }, ], }, ], }), }); // 3. Get the answer const chatData = await chatRes.json(); const lastMessage = chatData.output[chatData.output.length - 1]; const answer = lastMessage?.content?.find((c) => c.type === "output_text")?.text; console.log(answer); // 4. Clean up await fetch(\`https://api.x.ai/v1/files/\${uploadedFile.id}\`, { method: "DELETE", headers: { Authorization: \`Bearer \${process.env.XAI_API_KEY}\` }, }); ``` ## Key Features ### Multiple File Support Attach [multiple documents](/developers/model-capabilities/files/chat-with-files#multiple-file-attachments) to a single query and Grok will search across all of them to find relevant information. ### Multi-Turn Conversations File context persists across [conversation turns](/developers/model-capabilities/files/chat-with-files#multi-turn-conversations-with-files), allowing you to ask follow-up questions without re-attaching files. ### Code Execution Integration Combine files with the [code execution tool](/developers/model-capabilities/files/chat-with-files#combining-files-with-code-execution) to perform advanced data analysis, statistical computations, and transformations on your uploaded data. The model can write and execute Python code that processes your files directly. 
## Limitations * **File size**: Maximum 48 MB per file * **No batch requests**: File attachments with document search are agentic requests and do not support batch mode (`n > 1`) * **Agentic models only**: Requires models that support agentic tool calling (e.g., `grok-4-fast`, `grok-4`, `grok-4.20`) * **Supported file formats**: * Plain text files (.txt) * Markdown files (.md) * Code files (.py, .js, .java, etc.) * CSV files (.csv) * JSON files (.json) * PDF documents (.pdf) * And many other text-based formats ## Next Steps ===/developers/grpc-api-reference=== # gRPC API Reference The xAI gRPC API is a robust, high-performance gRPC interface designed for seamless integration into existing systems. The base URL for all services is `api.x.ai`. All services require authentication with the header `Authorization: Bearer `. Visit [xAI API Protobuf Definitions](https://github.com/xai-org/xai-proto) to view and download our protobuf definitions. The [xAI Python SDK](https://github.com/xai-org/xai-sdk-python) (`xai-sdk`) uses gRPC natively. Install with `pip install xai-sdk`. ## Using buf curl Clone the proto definitions and use [buf curl](https://buf.build/docs/curl/usage) to call the API: ```bash git clone https://github.com/xai-org/xai-proto.git cd xai-proto ``` All `buf curl` examples below assume you run from inside the cloned `xai-proto` directory. *** ===/developers/introduction=== #### Introduction # What is Grok? Grok is a family of Large Language Models (LLMs) developed by [xAI](https://x.ai). Inspired by the Hitchhiker's Guide to the Galaxy, Grok is a maximally truth-seeking AI that provides insightful, unfiltered truths about the universe. xAI offers an API for developers to programmatically interact with our Grok [models](/developers/models). 
The same models power our consumer-facing services such as [Grok.com](https://grok.com), the [iOS](https://apps.apple.com/us/app/grok/id6670324846) and [Android](https://play.google.com/store/apps/details?id=ai.x.grok) apps, as well as the [Grok in X experience](https://grok.x.com). ## What is the xAI API? How is it different from Grok in other services? The xAI API is a toolkit for developers to integrate xAI's Grok models into their own applications; it provides the building blocks to create new AI experiences. To get started building with the xAI API, please head to [The Hitchhiker's Guide to Grok](/developers/quickstart). ## xAI API vs Grok in other services | Category | xAI API | Grok.com | Mobile Apps | Grok in 𝕏 | |-------------------------------|----------------------------------|-----------------------------------|----------------------------|------------------------------------| | **Accessible** | API (api.x.ai) | grok.com + PWA (Android) | App Store / Play Store | X.com + 𝕏 apps | | **Billing** | xAI | xAI / 𝕏 | xAI / 𝕏 | 𝕏 | | **Programming Required** | Yes | No | No | No | | **Description** | Programmatic access for developers | Full-featured web AI assistant | Mobile AI assistant | X-integrated AI (fewer features) | Because these are separate offerings, your purchase on X (e.g. X Premium) won't affect your service status on the xAI API, and vice versa. This documentation is intended for users of the xAI API. ===/developers/management-api-guide=== #### Key Information # Using Management API Some enterprise users may prefer to manage their account details programmatically rather than manually through the xAI Console. For this reason, we have developed a Management API to enable enterprise users to efficiently manage their team details. You can read the endpoint specifications and descriptions at [Management API Reference](/developers/rest-api-reference/management). 
You need a management key, which is separate from your API key, to use the Management API. You can obtain one in the [xAI Console](https://console.x.ai) -> Settings -> Management Keys. The base URL is `https://management-api.x.ai`, which also differs from the inference API. ## Operations related to API Keys You can create, list, update, and delete API keys via the management API. You can also manage the access control lists (ACLs) associated with the API keys. The available ACL types are: * `api-key:model` * `api-key:endpoint` To enable all models and endpoints available to your team, use: * `api-key:model:*` * `api-key:endpoint:*` Or, to restrict the API key to particular endpoints: * `api-key:endpoint:chat` for chat and vision models * `api-key:endpoint:image` for image generation models And to specify which models the API key has access to: * `api-key:model:` ### Create an API key The following example creates an API key with all models and endpoints enabled, limited to 5 queries per second and 100 queries per minute, with no token rate restriction. ```bash curl https://management-api.x.ai/auth/teams/{teamId}/api-keys \\ -X POST \\ -H "Authorization: Bearer " \\ -d '{ "name": "My API key", "acls": ["api-key:model:*", "api-key:endpoint:*"], "qps": 5, "qpm": 100, "tpm": null }' ``` Set `tpm` to an integer to limit the number of tokens produced/consumed per minute. When the token rate limit is triggered, new requests will be rejected and in-flight requests will continue processing. The newly-created API key is returned in the `"apiKey"` field of the response object. The API Key ID is returned as `"apiKeyId"` in the response body as well, which is useful for update and delete operations. 
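For keys with narrower permissions, the ACL list can be assembled programmatically before making the create-key request. A minimal illustrative sketch — the helper is hypothetical and `grok-4` stands in for any model name available to your team; only the ACL string formats follow the patterns documented above.

```python
# Illustrative: build the "acls" list for an API key restricted to
# specific endpoints and an explicit set of models. "grok-4" is a
# placeholder for a model name available to your team.

def scoped_acls(endpoints: list[str], models: list[str]) -> list[str]:
    return [f"api-key:endpoint:{e}" for e in endpoints] + [
        f"api-key:model:{m}" for m in models
    ]

print(scoped_acls(["chat"], ["grok-4"]))
# ['api-key:endpoint:chat', 'api-key:model:grok-4']
```

The resulting list would be passed as the `"acls"` field of the create-key request body.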
### List API keys To retrieve a list of API keys for a team, run: ```bash curl "https://management-api.x.ai/auth/teams/{teamId}/api-keys?pageSize=10&paginationToken=" \\ -H "Authorization: Bearer " ``` You can customize the query parameters such as `pageSize` and `paginationToken`. ### Update an API key You can update an API key after it has been created. For example, to update the `qpm` of an API key: ```bash curl https://management-api.x.ai/auth/api-keys/{apiKeyId} \\ -X PUT \\ -H "Authorization: Bearer " \\ -d '{ "apiKey": { "qpm": 200 }, "fieldMask": "qpm" }' ``` Or to update the `name` of an API key: ```bash curl https://management-api.x.ai/auth/api-keys/{apiKeyId} \\ -X PUT \\ -H "Authorization: Bearer " \\ -d '{ "apiKey": { "name": "Updated API key" }, "fieldMask": "name" }' ``` ### Delete an API key You can also delete an API key with the following: ```bash curl https://management-api.x.ai/auth/api-keys/{apiKeyId} \\ -X DELETE \\ -H "Authorization: Bearer " ``` ### Check propagation status of API key across clusters There could be a slight delay between creating an API key and the key becoming available for use across all clusters. You can check the propagation status of the API key via the API. ```bash curl https://management-api.x.ai/auth/api-keys/{apiKeyId}/propagation \\ -H "Authorization: Bearer " ``` ### List all models available for the team You can list all the available models for a team with our management API as well. The model names in the output can be used when setting the model ACL string on an API key as `api-key:model:` ```bash curl https://management-api.x.ai/auth/teams/{teamId}/models \\ -H "Authorization: Bearer " ``` ## Access Control List (ACL) management We also offer an endpoint to list the possible ACLs for a team. You can then apply the endpoint ACL strings to your API keys. 
To view possible endpoint ACLs for a team's API keys: ```bash curl https://management-api.x.ai/auth/teams/{teamId}/endpoints \\ -H "Authorization: Bearer " ``` ## Validate a management key You can check if your key is a valid management key. If validation succeeds, the endpoint returns meta information about the management key. This endpoint does not require any Access Control List (ACL) permissions. ```bash curl https://management-api.x.ai/auth/management-keys/validation \\ -H "Authorization: Bearer " ``` ## Audit Logs You can retrieve audit logs for your team. Audit events track changes to team settings, API keys, team membership, and other administrative actions. ### List audit events To retrieve audit events for a team: ```bash curl "https://management-api.x.ai/audit/teams/{teamId}/events?pageSize=10" \\ -H "Authorization: Bearer " ``` You can customize the query parameters: * `pageSize` - Number of events per page * `pageToken` - Token for fetching the next page of results * `eventFilter.userId` - Filter events to a specific user * `eventFilter.query` - Full-text search in event descriptions * `eventTimeFrom` - Filter events from a specific time (ISO 8601 format) * `eventTimeTo` - Filter events up to a specific time (ISO 8601 format) To fetch the next page of results, use the `nextPageToken` from the response: ```bash curl "https://management-api.x.ai/audit/teams/{teamId}/events?pageSize=10&pageToken={nextPageToken}" \\ -H "Authorization: Bearer " ``` Example with time filter: ```bash curl "https://management-api.x.ai/audit/teams/{teamId}/events?pageSize=50&eventTimeFrom=2025-01-01T00:00:00Z" \\ -H "Authorization: Bearer " ``` ===/developers/migration/models=== #### Key Information # Migrating to New Models As we release newer, more advanced models, we are focusing resources on supporting customers with these models and will be phasing out older versions. 
You will see a `deprecated` tag next to deprecated model names on the [xAI Console](https://console.x.ai) models page. You should consider moving to a newer model when a model you rely on is deprecated.

We may transition a `deprecated` model to `obsolete` and discontinue serving it across our services. An `obsolete` model is removed from our [Models and Pricing](../models) page as well as from [xAI Console](https://console.x.ai).

## Moving from an older generation model

When you move from an older model generation to a newer one, you usually won't need to make significant changes to how you use the API. In your request body, switch the `"model"` field from the deprecated model to a current model listed on the [xAI Console](https://console.x.ai) models page. The newer models are more performant, but you should check that your prompts and other parameters still work well with the new model, and adjust them if necessary.

## Moving to the latest endpoints

When you set up to use new models, it is also a good idea to ensure you're using the latest endpoints. The latest endpoints offer more stable support for model functionality; endpoints marked `legacy` may not receive updates supporting newer functionality. In general, the following endpoints are recommended:

- Text and image input and text output: [Chat Completions](/developers/rest-api-reference/inference/chat#chat-completions) - `/v1/chat/completions`
- Text input and image output: [Image Generation](/developers/rest-api-reference/inference/images#image-generation) - `/v1/image/generations`
- Tokenization: [Tokenize Text](/developers/rest-api-reference/inference/other#tokenize-text) - `/v1/tokenize-text`

===/developers/model-capabilities/audio/agent-builder===

#### Model Capabilities

# Agent Builder

The [Voice Agent API](/developers/model-capabilities/audio/voice-agent) lets you build real-time voice applications over WebSocket.
Every session requires configuration—instructions, tools, voice settings—sent via `session.update`. The Agents API adds a persistence layer: define your agent once, then reference it by ID in any session or phone call. ## Overview The workflow has three steps: 1. **Create an agent** — store its name, instructions, tools, voice, and knowledge base. 2. **Assign a phone number** — purchase a number and link it to the agent. 3. **Use the agent** — connect to the Voice Agent API with the stored configuration, or have the agent call a phone number. ## Step 1: Create an agent ```bash customLanguage="bash" curl -X POST https://api.x.ai/v1/agents \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Order Support", "instructions": "You are a friendly order support agent for Acme Corp. Help customers track orders, process returns, and answer product questions. Be concise—most callers are on mobile.", "tools": [ { "type": "function", "function": { "name": "lookup_order", "description": "Look up the status of a customer order", "parameters": { "type": "object", "properties": { "order_id": { "type": "string", "description": "The order ID, e.g. 
ORD-12345" } }, "required": ["order_id"] } } }, { "type": "web_search" } ], "voice": { "voice_id": "eve", "vad_threshold": 0.5, "vad_silence_duration_ms": 300 } }' ``` ```python customLanguage="pythonWithoutSDK" import os, requests resp = requests.post( "https://api.x.ai/v1/agents", headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}, json={ "name": "Order Support", "instructions": "You are a friendly order support agent for Acme Corp.", "tools": [ { "type": "function", "function": { "name": "lookup_order", "description": "Look up the status of a customer order", "parameters": { "type": "object", "properties": {"order_id": {"type": "string"}}, "required": ["order_id"], }, }, }, {"type": "web_search"}, ], "voice": {"voice_id": "eve", "vad_threshold": 0.5, "vad_silence_duration_ms": 300}, }, ) agent = resp.json()["agent"] print(f"Created agent: {agent['agent_id']}") ``` The response includes the `agent_id` (e.g., `agent_abc123def456`) which you'll use in subsequent calls. ### Configuring voice The `voice` object controls how the agent sounds and when it detects user speech: | Field | Description | |-------|-------------| | `voice_id` | Which voice to use. Options: `eve`, `ara`, `rex`, `sal`, `leo`. | | `vad_threshold` | How sensitive turn detection is. Lower values (e.g., 0.3) pick up quieter speech; higher values (e.g., 0.8) require louder input. Range: 0.0–1.0. | | `vad_silence_duration_ms` | How long the agent waits after silence before responding. Lower values (e.g., 200) make the agent more responsive; higher values (e.g., 500) give the caller more time to pause mid-thought. Range: 0–10,000. | ### Adding tools Agents support the same tool types as the [Voice Agent API](/developers/model-capabilities/audio/voice-agent): * **`function`** — custom functions your application executes. The agent generates the arguments; you return the result. * **`web_search`** — search the web for current information. * **`x_search`** — search posts on X. 
* **`file_search`** — search documents in knowledge base collections (set `collection_ids` on the agent). * **`mcp`** — connect to an MCP server for external tool access. ## Step 2: Purchase and assign a phone number Purchase a number in a US area code, then assign it to your agent: ```bash customLanguage="bash" # Purchase a number curl -X POST https://api.x.ai/v1/phone-numbers \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "Content-Type: application/json" \ -d '{"area_code": "415", "name": "Support Line"}' # Assign it to the agent (use the phone_number_id from above) curl -X PATCH https://api.x.ai/v1/agents/agent_abc123def456 \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "agent": {}, "field_mask": {"paths": ["name"]}, "phone_number_id": "phone_xyz789" }' ``` ```python customLanguage="pythonWithoutSDK" import os, requests headers = {"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"} # Purchase a number phone = requests.post( "https://api.x.ai/v1/phone-numbers", headers=headers, json={"area_code": "415", "name": "Support Line"}, ).json()["phone_number"] print(f"Purchased: {phone['phone_number']}") # Assign to agent requests.patch( f"https://api.x.ai/v1/agents/{agent['agent_id']}", headers=headers, json={ "agent": {}, "field_mask": {"paths": ["name"]}, "phone_number_id": phone["phone_number_id"], }, ) print(f"Assigned {phone['phone_number']} to {agent['agent_id']}") ``` Once assigned, the agent uses this number as its caller ID for outbound calls. To reassign a number, update the agent with a different `phone_number_id`. To unassign, pass `""`. 
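To confirm an assignment took effect, you can read the agent back and inspect its phone number field. A minimal sketch; whether the returned agent object echoes a `phone_number_id` field is an assumption here:

```python
import os
import requests

def assigned_number_id(agent: dict):
    # Treat "" (unassigned) and a missing field the same way.
    return agent.get("phone_number_id") or None

def fetch_agent(agent_id: str) -> dict:
    resp = requests.get(
        f"https://api.x.ai/v1/agents/{agent_id}",
        headers={"Authorization": f"Bearer {os.environ.get('XAI_API_KEY', '')}"},
    )
    resp.raise_for_status()
    return resp.json()["agent"]

# Usage (network call):
# agent = fetch_agent("agent_abc123def456")
# print(assigned_number_id(agent))
```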
## Step 3: Use the agent ### In a realtime session Pass the agent's stored configuration to the [Voice Agent API](/developers/model-capabilities/audio/voice-agent) via `session.update`: ```python customLanguage="pythonWithoutSDK" import asyncio, json, os, requests, websockets # Fetch agent config headers = {"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"} agent = requests.get( "https://api.x.ai/v1/agents/agent_abc123def456", headers=headers, ).json()["agent"] async def start_session(): async with websockets.connect( "wss://api.x.ai/v1/realtime", additional_headers=headers, ) as ws: # Configure session from agent await ws.send(json.dumps({ "type": "session.update", "session": { "voice": agent["voice"]["voice_id"], "instructions": agent["instructions"], "tools": agent["tools"], "turn_detection": { "type": "server_vad", "threshold": agent["voice"].get("vad_threshold", 0.5), "silence_duration_ms": agent["voice"].get("vad_silence_duration_ms", 300), }, }, })) # Now stream audio... print("Session configured. Ready to stream audio.") asyncio.run(start_session()) ``` This pattern separates configuration (Agents API) from session lifecycle (Voice Agent API). Update the agent once; every new session picks up the latest config. ### Outbound phone call Use the console to place an outbound call. The agent dials the target number using its assigned phone number as caller ID, with its stored instructions, voice, and tools. ## Using the console Everything above can also be done through the xAI Console at [console.x.ai](https://console.x.ai) without writing code. ### Creating an agent 1. Navigate to **Agents** in the console sidebar. 2. Click **Create Agent**. 3. Choose a template (Healthcare, Restaurant, Customer Support, Real Estate, Appointment Booking, Concierge) or select **Create Custom** to start from scratch. 4. Enter a name and instructions, then click **Create**. 
The console opens the agent detail view with four tabs: | Tab | What it does | |-----|-------------| | **Configuration** | Edit instructions, select a voice (`eve`, `ara`, `rex`, `sal`, `leo`), tune VAD threshold and silence duration, and assign a phone number. | | **Tools** | Add function tools with JSON Schema parameters, or enable built-in tools (web search, X search). | | **Knowledge Base** | Upload documents or create text entries for the agent to search via `file_search`. | | **Testing** | Talk to the agent in your browser—type a message or click **Talk** for a live voice session. The testing playground loads the agent's current configuration automatically. | ### Assigning a phone number 1. Go to the **Configuration** tab of your agent. 2. Under **Telephony**, select a phone number from the dropdown (or click **Manage Phone Numbers** to purchase one first). 3. Save changes. To purchase a new number, navigate to **Agents** → **Manage Phone Numbers**, then click **Add Phone Number** and choose a US area code. ### Testing in the browser The **Testing** tab provides a full voice playground: * Click **Talk** to start a conversation using your microphone with server-side VAD (the agent detects when you stop speaking). * Type a message in the text box to send text input instead. * Click **Call Me** to have the agent call your phone—useful for testing the telephony experience end to end. * Click **Clear Chat** to reset the conversation. The playground uses the same Voice Agent API WebSocket (`wss://api.x.ai/v1/realtime`) as your production integration, so what you hear in testing is what your users will experience. ## Testing with the CLI The `agent-mgmt-cli` (`agents`) is a command-line tool for managing and testing agents without the console UI. It is particularly useful for quick iteration, scripting, and CI workflows. ### Setup ```bash customLanguage="bash" export XAI_API_KEY=xai-... 
cargo install --path prod/mc/agent-mgmt-cli ``` ### Agent CRUD ```bash customLanguage="bash" # Create from flags agents create --name "My Agent" --instructions "You are helpful" # Create from a JSON file agents create agent.json # Create from piped stdin echo '{"name": "Bot", "instructions": "Be helpful"}' | agents create # List all agents agents list # Get a specific agent agents get agent_abc123 # Update an agent agents update agent_abc123 --name "New Name" agents update agent_abc123 --instructions "Updated prompt" agents update agent_abc123 --clear-instructions # Update from a JSON file (must contain agent.agentId) agents update update.json # Delete agents delete agent_abc123 ``` ### Running an agent (text) The `run` command fetches the agent's config and sends a one-shot chat completion to `/v1/chat/completions`, using the agent's instructions as the system prompt and its tools: ```bash customLanguage="bash" # Quick test with default model agents run agent_abc123 --message "What's the status of order ORD-42?" # Override the model agents run agent_abc123 --message "Hello" --model grok-3-fast ``` ### Realtime voice session (text I/O) The `voice` command opens a WebSocket to `wss://api.x.ai/v1/realtime`, configures the session with the agent's instructions, tools, and voice, then drops you into a text-based REPL. Type a message, press Enter, and the agent responds with streamed text (audio transcripts in text form): ```bash customLanguage="bash" agents voice agent_abc123 # Connected. Type a message and press Enter. Ctrl+C to quit. ``` This is the fastest way to test an agent's conversational behavior from the terminal without a browser or microphone. 
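For reference, `run` can be approximated directly against the API: fetch the agent, then send its instructions as the system prompt to `/v1/chat/completions`. A sketch of that flow (not the CLI's exact implementation; the response shape assumed below is the standard Chat Completions format):

```python
import os
import requests

def build_run_request(agent: dict, message: str, model: str = "grok-3-fast") -> dict:
    """Mirror what `run` does: the agent's instructions become the system prompt."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": agent.get("instructions", "")},
            {"role": "user", "content": message},
        ],
    }
    if agent.get("tools"):
        body["tools"] = agent["tools"]
    return body

def run_agent(agent_id: str, message: str) -> str:
    headers = {"Authorization": f"Bearer {os.environ.get('XAI_API_KEY', '')}"}
    # Fetch the stored agent configuration
    agent = requests.get(
        f"https://api.x.ai/v1/agents/{agent_id}", headers=headers
    ).json()["agent"]
    # One-shot chat completion using the agent's instructions and tools
    resp = requests.post(
        "https://api.x.ai/v1/chat/completions",
        headers=headers,
        json=build_run_request(agent, message),
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```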
### JSON output Add `--json` before any subcommand for machine-readable output, useful for scripting: ```bash customLanguage="bash" agents --json list agents --json run agent_abc123 --message "Hello" agents --json get agent_abc123 ``` ### Pointing at local dev Override the base URL to test against a local or staging instance: ```bash customLanguage="bash" agents --base-url http://localhost:9978 list ``` ## Updating an agent Use `PATCH` with a `field_mask` to modify specific fields without overwriting the rest: ```bash customLanguage="bash" # Change only the voice — everything else stays the same curl -X PATCH https://api.x.ai/v1/agents/agent_abc123def456 \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "agent": { "voice": {"voice_id": "leo", "vad_threshold": 0.7} }, "field_mask": {"paths": ["voice"]} }' ``` Only fields listed in `field_mask.paths` are modified. Valid paths: `name`, `instructions`, `tools`, `voice`, `collection_ids`. To clear a field, include it in the mask but omit it from the agent object. For example, `"field_mask": {"paths": ["instructions"]}` with an empty `agent` object clears the instructions. ## API reference For full endpoint details, request/response schemas, and error codes, see the [Agents API reference](/developers/rest-api-reference/inference/agents). ===/developers/model-capabilities/audio/ephemeral-tokens=== #### Model Capabilities # Ephemeral Tokens Ephemeral tokens provide secure, short-lived authentication for client-side applications. Use them when connecting to the [Voice Agent API](/developers/model-capabilities/audio/voice-agent) from browsers or mobile apps to avoid exposing your API key. ## How It Works 1. Your **server** requests an ephemeral token from xAI using your API key 2. Your server passes the ephemeral token to the **client** 3. The **client** uses the ephemeral token to authenticate the WebSocket connection 4. 
The token expires automatically after the configured duration.

**Never expose your API key in client-side code.** Always use ephemeral tokens for browser and mobile applications.

## Creating Ephemeral Tokens

You need to set up a server endpoint to fetch the ephemeral token from xAI. The ephemeral token gives the holder scoped access to resources.

**Endpoint:** `POST https://api.x.ai/v1/realtime/client_secrets`

```bash
curl --url https://api.x.ai/v1/realtime/client_secrets \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  --data '{
    "expires_after": { "seconds": 300 }
  }'
# Note: Does not support "session" or "expires_after.anchor" fields
```

```pythonWithoutSDK
# Example ephemeral token endpoint with FastAPI
import os

import httpx
from fastapi import FastAPI

app = FastAPI()

SESSION_REQUEST_URL = "https://api.x.ai/v1/realtime/client_secrets"
XAI_API_KEY = os.getenv("XAI_API_KEY")

@app.post("/session")
async def get_ephemeral_token():
    # Send request to xAI endpoint to retrieve the ephemeral token
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url=SESSION_REQUEST_URL,
            headers={
                "Authorization": f"Bearer {XAI_API_KEY}",
                "Content-Type": "application/json",
            },
            json={"expires_after": {"seconds": 300}},
        )
    # Return the response body from xAI with the ephemeral token
    return response.json()
```

```javascriptWithoutSDK
// Example ephemeral token endpoint with Express
import express from 'express';

const app = express();
const SESSION_REQUEST_URL = "https://api.x.ai/v1/realtime/client_secrets";

app.use(express.json());

app.post("/session", async (req, res) => {
  const r = await fetch(SESSION_REQUEST_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.XAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ expires_after: { seconds: 300 } }),
  });
  const data = await r.json();
  res.json(data);
});

app.listen(8081);
```

## Using Ephemeral Tokens

The ephemeral token can be used in the same
fashion as an API key:

```pythonWithoutSDK
import asyncio
import websockets

base_url = "wss://api.x.ai/v1/realtime"

async def main():
    # Connect with the ephemeral token in the Authorization header
    async with websockets.connect(
        uri=base_url,
        ssl=True,
        additional_headers={"Authorization": f"Bearer {OBTAINED_EPHEMERAL_TOKEN}"},
    ) as websocket:
        # WebSocket connection is now authenticated
        pass

asyncio.run(main())
```

```javascriptWithoutSDK
import WebSocket from "ws";

const baseUrl = "wss://api.x.ai/v1/realtime";

// Connect with the ephemeral token in the Authorization header
const ws = new WebSocket(baseUrl, {
  headers: {
    Authorization: "Bearer " + OBTAINED_EPHEMERAL_TOKEN,
    "Content-Type": "application/json",
  },
});

ws.on("open", () => {
  console.log("Connected with ephemeral token authentication");
});
```

### Browser WebSocket Authentication

If you need to send the ephemeral token from the browser, add it with the `xai-client-secret.` prefix to the `Sec-WebSocket-Protocol` header:

```javascriptWithoutSDK
new WebSocket("wss://api.x.ai/v1/realtime", [`xai-client-secret.${OBTAINED_EPHEMERAL_TOKEN}`]);
```

===/developers/model-capabilities/audio/text-to-speech===

#### Model Capabilities

# Text to Speech

Convert text into spoken audio with a single API call. The API supports 5 expressive voices, inline speech tags for fine-grained delivery control, and output formats from high-fidelity MP3 to telephony-optimized μ-law.

## Quick Start

Generate speech with a single API call:

```bash
curl -X POST https://api.x.ai/v1/tts \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to the xAI Text to Speech API.",
    "voice_id": "eve",
    "language": "en"
  }' \
  --output hello.mp3
```

```python customLanguage="pythonWithoutSDK" import os import requests response = requests.post( "https://api.x.ai/v1/tts", headers={ "Authorization": f"Bearer {os.environ['XAI_API_KEY']}", "Content-Type": "application/json", }, json={ "text": "Hello! 
Welcome to the xAI Text to Speech API.", "voice_id": "eve", "language": "en", }, ) response.raise_for_status() with open("hello.mp3", "wb") as f: f.write(response.content) print(f"Saved {len(response.content):,} bytes to hello.mp3") ``` ```javascript customLanguage="javascriptWithoutSDK" import fs from "fs"; const response = await fetch("https://api.x.ai/v1/tts", { method: "POST", headers: { Authorization: `Bearer ${process.env.XAI_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ text: "Hello! Welcome to the xAI Text to Speech API.", voice_id: "eve", language: "en", }), }); if (!response.ok) throw new Error(`TTS error ${response.status}`); const buffer = Buffer.from(await response.arrayBuffer()); fs.writeFileSync("hello.mp3", buffer); console.log(`Saved ${buffer.length.toLocaleString()} bytes to hello.mp3`); ``` ```swift import Foundation let apiKey = ProcessInfo.processInfo.environment["XAI_API_KEY"]! let url = URL(string: "https://api.x.ai/v1/tts")! var request = URLRequest(url: url) request.httpMethod = "POST" request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization") request.setValue("application/json", forHTTPHeaderField: "Content-Type") request.httpBody = try JSONSerialization.data(withJSONObject: [ "text": "Hello! Welcome to the xAI Text to Speech API.", "voice_id": "eve", "language": "en", ]) let (data, _) = try await URLSession.shared.data(for: request) let fileURL = URL(fileURLWithPath: "hello.mp3") try data.write(to: fileURL) print("Saved \(data.count) bytes to hello.mp3") ``` The response body contains raw audio bytes. Save directly to a file or pipe to an audio player. 
[Try the Playground →](https://console.x.ai/team/default/voice/text-to-speech?campaign=voice-docs-tts) [Live Voice Demos](https://x.ai/api/voice) [Get API Key](https://console.x.ai/team/default/api-keys?campaign=voice-docs-tts)

## Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` | string | ✓ | The text to convert to speech. Maximum **15,000 characters**. Supports [speech tags](#speech-tags). |
| `voice_id` | string | | Voice to use for synthesis. Defaults to `eve`. See [Voices](#voices). |
| `language` | string | ✓ | BCP-47 language code (e.g. `en`, `zh`, `pt-BR`) or `auto` for automatic language detection. See [Supported Languages](#supported-languages). |
| `output_format` | object | | Output format configuration. Defaults to MP3 at 24 kHz / 128 kbps. See [Output Formats](#output-formats). |

### Example with all options

```json
{
  "text": "Hello! This is a high-fidelity text to speech example.",
  "voice_id": "ara",
  "language": "en",
  "output_format": {
    "codec": "mp3",
    "sample_rate": 44100,
    "bit_rate": 192000
  }
}
```

## Voices

Five voices are available, each with a distinct personality. Choose the best fit for your use case:

| Voice | Tone | Description |
|-------|------|-------------|
| **`eve`** | Energetic, upbeat | Default voice - engaging and enthusiastic |
| **`ara`** | Warm, friendly | Balanced and conversational |
| **`rex`** | Confident, clear | Professional and articulate - ideal for business |
| **`sal`** | Smooth, balanced | Versatile voice for a wide range of contexts |
| **`leo`** | Authoritative, strong | Commanding and decisive - great for instructional content |

Voice IDs are **case-insensitive** - `eve`, `Eve`, and `EVE` all work.
[Preview all voices in the playground →](https://console.x.ai/team/default/voice/text-to-speech?campaign=voice-docs-tts) ### Choosing the right voice * **`eve`** - Great default for demos, announcements, and upbeat content * **`ara`** - Ideal for conversational interfaces, customer support, and warm narration * **`rex`** - Best for business presentations, corporate communications, and tutorials * **`sal`** - Versatile choice for balanced delivery across different content types * **`leo`** - Perfect for authoritative narration, instructions, and educational content You can also list voices programmatically with the [List voices](/developers/rest-api-reference/inference/voice#list-voices) endpoint: ```bash curl -s https://api.x.ai/v1/tts/voices \ -H "Authorization: Bearer $XAI_API_KEY" ``` ```python customLanguage="pythonWithoutSDK" import os import requests response = requests.get( "https://api.x.ai/v1/tts/voices", headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"}, ) for voice in response.json()["voices"]: print(f"{voice['voice_id']:5s} {voice['name']}") ``` ```javascript customLanguage="javascriptWithoutSDK" const response = await fetch("https://api.x.ai/v1/tts/voices", { headers: { Authorization: `Bearer ${process.env.XAI_API_KEY}` }, }); const { voices } = await response.json(); voices.forEach((v) => console.log(`${v.voice_id} ${v.name}`)); ``` ```swift import Foundation let apiKey = ProcessInfo.processInfo.environment["XAI_API_KEY"]! let url = URL(string: "https://api.x.ai/v1/tts/voices")! var request = URLRequest(url: url) request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization") let (data, _) = try await URLSession.shared.data(for: request) let json = try JSONSerialization.jsonObject(with: data) as! [String: Any] let voices = json["voices"] as! [[String: Any]] for voice in voices { print("\(voice["voice_id"]!) \(voice["name"]!)") } ``` ## Supported Languages The TTS API supports 20 languages via BCP-47 language codes. 
Use `auto` for automatic language detection, or specify a language code explicitly for consistent results. Language code validation is **case-insensitive** — `en`, `EN`, and `En` all work.

| Language | Language Code |
|----------|---------------|
| Auto-detect | `auto` |
| English | `en` |
| Arabic (Egypt) | `ar-EG` |
| Arabic (Saudi Arabia) | `ar-SA` |
| Arabic (United Arab Emirates) | `ar-AE` |
| Bengali | `bn` |
| Chinese (Simplified) | `zh` |
| French | `fr` |
| German | `de` |
| Hindi | `hi` |
| Indonesian | `id` |
| Italian | `it` |
| Japanese | `ja` |
| Korean | `ko` |
| Portuguese (Brazil) | `pt-BR` |
| Portuguese (Portugal) | `pt-PT` |
| Russian | `ru` |
| Spanish (Mexico) | `es-MX` |
| Spanish (Spain) | `es-ES` |
| Turkish | `tr` |
| Vietnamese | `vi` |

The model is also capable of generating speech in additional languages beyond those listed above, with varying degrees of accuracy.

## Speech Tags

Add inline speech tags to your text for expressive delivery. There are two types of tags:

* **Inline tags** `[tag]` — placed at a specific point in the text to produce a vocal expression (e.g. a laugh or pause)
* **Wrapping tags** `text` — wrap a section of text to change how it is delivered (e.g. whispering, singing)

### Inline Tags

Insert these where the expression should occur:

| Category | Tags |
|----------|------|
| **Pauses** | |
| **Laughter & crying** | |
| **Mouth sounds** | |
| **Breathing** | |

### Wrapping Tags

Wrap text to change delivery style. Use an opening tag and a matching closing tag:

| Category | Tags |
|----------|------|
| **Volume & intensity** | |
| **Pitch & speed** | |
| **Vocal style** | |

### Examples

```bash # Inline tags curl -X POST https://api.x.ai/v1/tts \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "text": "So I walked in and [pause] there it was. 
[laugh] I honestly could not believe it!", "voice_id": "eve", "language": "en" }' \ --output expressive.mp3 # Wrapping tags curl -X POST https://api.x.ai/v1/tts \ -H "Authorization: Bearer $XAI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "text": "I need to tell you something. It is a secret. Pretty cool, right?", "voice_id": "eve", "language": "en" }' \ --output whisper.mp3 ``` ```python customLanguage="pythonWithoutSDK" import os import requests # Inline tags response = requests.post( "https://api.x.ai/v1/tts", headers={ "Authorization": f"Bearer {os.environ['XAI_API_KEY']}", "Content-Type": "application/json", }, json={ "text": "So I walked in and [pause] there it was. [laugh] I honestly could not believe it!", "voice_id": "eve", "language": "en", }, ) response.raise_for_status() with open("expressive.mp3", "wb") as f: f.write(response.content) # Wrapping tags response = requests.post( "https://api.x.ai/v1/tts", headers={ "Authorization": f"Bearer {os.environ['XAI_API_KEY']}", "Content-Type": "application/json", }, json={ "text": "I need to tell you something. It is a secret. Pretty cool, right?", "voice_id": "eve", "language": "en", }, ) response.raise_for_status() with open("whisper.mp3", "wb") as f: f.write(response.content) ``` ```javascript customLanguage="javascriptWithoutSDK" // Inline tags const response = await fetch("https://api.x.ai/v1/tts", { method: "POST", headers: { Authorization: `Bearer ${process.env.XAI_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ text: "So I walked in and [pause] there it was. [laugh] I honestly could not believe it!", voice_id: "eve", language: "en", }), }); // Wrapping tags const whisperResponse = await fetch("https://api.x.ai/v1/tts", { method: "POST", headers: { Authorization: `Bearer ${process.env.XAI_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ text: "I need to tell you something. It is a secret. 
Pretty cool, right?", voice_id: "eve", language: "en", }), }); ``` ```swift import Foundation let apiKey = ProcessInfo.processInfo.environment["XAI_API_KEY"]! let url = URL(string: "https://api.x.ai/v1/tts")! // Inline tags var request = URLRequest(url: url) request.httpMethod = "POST" request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization") request.setValue("application/json", forHTTPHeaderField: "Content-Type") request.httpBody = try JSONSerialization.data(withJSONObject: [ "text": "So I walked in and [pause] there it was. [laugh] I honestly could not believe it!", "voice_id": "eve", "language": "en", ]) let (data, _) = try await URLSession.shared.data(for: request) try data.write(to: URL(fileURLWithPath: "expressive.mp3")) // Wrapping tags request.httpBody = try JSONSerialization.data(withJSONObject: [ "text": "I need to tell you something. It is a secret. Pretty cool, right?", "voice_id": "eve", "language": "en", ]) let (whisperData, _) = try await URLSession.shared.data(for: request) try whisperData.write(to: URL(fileURLWithPath: "whisper.mp3")) ``` **Tips for speech tags:** * Place inline tags where the expression would naturally occur in conversation * Combine tags with punctuation — `"Really? [laugh] That's incredible!"` produces more natural results than stacking tags * Use `[pause]` or `[long-pause]` to add dramatic timing or let a thought land * Wrapping tags work best around complete phrases — `It is a secret.` reads more naturally than wrapping individual words * Combine styles for effect — `Goodnight, sleep well.` ## Output Formats Control the audio codec, sample rate, and bit rate with the `output_format` object. When omitted, the default is **MP3 at 24 kHz / 128 kbps**. 
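The common presets (G.711 μ-law at 8 kHz for telephony, the MP3 default for the web, 44.1 kHz WAV for post-production) can be captured in a small helper. The mapping mirrors the recommendations in this guide; the helper itself is illustrative:

```python
def output_format_for(use_case: str) -> dict:
    """Map a use case to an `output_format` preset recommended in this guide."""
    presets = {
        "telephony": {"codec": "mulaw", "sample_rate": 8000},
        "web": {"codec": "mp3", "sample_rate": 24000, "bit_rate": 128000},
        "post_production": {"codec": "wav", "sample_rate": 44100},
    }
    return presets[use_case]

# Example: output_format fragment for a phone system
print(output_format_for("telephony"))
```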
### Codecs | Codec | Content-Type | Best for | |-------|-------------|----------| | `mp3` | `audio/mpeg` | General use - wide compatibility, good compression | | `wav` | `audio/wav` | Lossless audio - editing, post-production | | `pcm` | `audio/pcm` | Raw audio - real-time processing pipelines | | `mulaw` | `audio/basic` | Telephony (G.711 μ-law) | | `alaw` | `audio/alaw` | Telephony (G.711 A-law) | ### Sample Rates | Rate | Description | |------|-------------| | `8000` | Narrowband - telephony | | `16000` | Wideband - speech recognition | | `22050` | Standard - balanced quality | | `24000` | High quality - **default**, recommended for most use cases | | `44100` | CD quality - media production | | `48000` | Professional - studio-grade audio | ### Bit Rates (MP3 only) | Rate | Quality | |------|---------| | `32000` | Low - smallest file size | | `64000` | Medium - good for speech | | `96000` | Standard - balanced | | `128000` | High - **default**, recommended | | `192000` | Maximum - highest fidelity | ### Example: High-fidelity MP3 ```json { "text": "Crystal clear audio at maximum quality.", "voice_id": "rex", "language": "en", "output_format": { "codec": "mp3", "sample_rate": 44100, "bit_rate": 192000 } } ``` ### Example: Telephony (μ-law) ```json { "text": "Hello, thank you for calling. How can I help you today?", "voice_id": "ara", "language": "en", "output_format": { "codec": "mulaw", "sample_rate": 8000 } } ``` ## Best Practices Tips for getting the highest quality output from the TTS API. ### Writing effective text * **Use natural punctuation.** Commas, periods, and question marks guide pacing and intonation. `"Wait, really?"` sounds more natural than `"Wait really"`. * **Add emotional context.** Exclamation marks and question marks influence delivery - `"That's amazing!"` sounds enthusiastic while `"That's amazing."` is matter-of-fact. 
* **Break long content into paragraphs.** Paragraph breaks create natural pauses and help the model maintain consistent quality across longer text. * **Keep unary requests under 15,000 characters.** For longer content, use the [bidirectional WebSocket endpoint](#streaming-tts-websocket) which has no text length limit, or split into logical segments (by paragraph or sentence) and concatenate the audio output. ### Integrating with AI coding assistants The [Cloud Console playground](https://console.x.ai/team/default/voice/text-to-speech?campaign=voice-docs-tts) includes ready-made **agent instructions** you can copy and paste into tools like Cursor, GitHub Copilot, or Windsurf. The instructions are pre-configured with your current voice and format settings - open the playground, tweak your settings, and copy the prompt to get a tailored integration guide for your coding agent. ### Optimizing for production * **Proxy requests server-side.** Never expose your API key in client-side code. Route TTS requests through your backend. * **Cache generated audio.** If the same text is requested repeatedly, cache the audio bytes to save API calls and reduce latency. * **Match the format to the use case.** Use `mulaw` or `alaw` at 8 kHz for telephony; `mp3` at 24 kHz for web; `wav` at 44.1+ kHz for post-production. * **Respect concurrent session limits.** The streaming WebSocket endpoint allows up to **50 concurrent sessions per team**. For high-throughput services, pool connections or queue requests to stay within this limit. ## Browser Playback To play TTS audio in the browser, proxy the request through your backend and use the Web Audio API or an `