Inference API
Chat
Chat completions
/v1/chat/completions
Create a chat response from text or image chat prompts. This is the endpoint for making requests to chat and image-understanding models.
Request Body
Response Body
choices
array
A list of response choices from the model. The length corresponds to the n parameter in the request body (defaults to 1).
created
integer
The chat completion creation time as a Unix timestamp.
id
string
A unique ID for the chat response.
model
string
Model ID used to create the chat completion.
object
string
The object type, which is always "chat.completion".
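The request above can be sketched as a plain JSON POST. This is a minimal illustration, not an official client: the base URL, API key, and model ID below are placeholders, and only the path, the n parameter, and the response fields come from this reference.

```python
import json
import urllib.request

# Placeholder values -- substitute your real base URL, key, and model ID.
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.example.com"

payload = {
    "model": "my-chat-model",  # hypothetical model ID
    "messages": [
        {"role": "user", "content": "Describe this image in one sentence."},
    ],
    "n": 1,  # number of choices; mirrored by the length of choices in the response
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# Sending the request (requires a valid key and base URL):
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     # body["object"] is "chat.completion"; body["choices"] has n entries
```

The response body then carries the id, model, created timestamp, and choices fields described above.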
Create new response
/v1/responses
Generates a response based on text or image prompts. The response ID can be used to retrieve the response later or to continue the conversation without repeating prior context. New responses will be stored for 30 days and then permanently deleted.
Request Body
input
string | array
required
Content of the input passed to a /v1/responses request.
Response Body
background
boolean
default: false
Whether to process the response asynchronously in the background. Included only for OpenResponses compatibility and not currently used.
created_at
integer
The Unix timestamp (in seconds) for the response creation time.
error
An error object returned when the model fails to generate a response.
frequency_penalty
number
(NOT SUPPORTED in Responses API) Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
id
string
Unique ID of the response.
metadata
Only included for compatibility.
model
string
Model name used to generate the response.
object
string
The object type of this resource. Always set to response.
output
array
The response generated by the model.
parallel_tool_calls
boolean
Whether to allow the model to run parallel tool calls.
presence_penalty
number
(NOT SUPPORTED in Responses API) Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
service_tier
string
default: default
Specifies the processing tier used for serving the request.
status
string
Status of the response. One of completed, in_progress or incomplete.
store
boolean
default: true
Whether to store the input message(s) and model response for later retrieval.
text
object
tool_choice
string | object
Controls how the model chooses tools.
tools
array
A list of tools the model may call, described in JSON Schema. Currently, only functions and web search are supported as tools. A maximum of 128 tools is supported.
top_logprobs
integer
An integer between 0 and 8 specifying the number of most likely tokens to return at each token position.
truncation
string
default: disabled
The truncation strategy to use for the model response.
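A create-response request body can be sketched as follows. The field names (input, store) come from this reference; the model ID is hypothetical, and the exact content shape for array input is an assumption.

```python
import json

# Minimal request body with a string input. `store` defaults to true,
# which keeps the response retrievable for 30 days.
create_request = {
    "model": "my-model",  # hypothetical model ID
    "input": "Summarize the plot of Hamlet in two sentences.",
    "store": True,
}

# `input` may also be an array; the message shape here is an assumed
# example, mixing roles and content parts.
create_request_multimodal = {
    "model": "my-model",
    "input": [
        {"role": "user", "content": "What is shown in this image?"},
    ],
}

body = json.dumps(create_request)
```

POSTing this body to /v1/responses returns a response object whose id can be reused to continue the conversation without resending prior context.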
Retrieve previous response
/v1/responses/{response_id}
Retrieve a previously generated response.
Path parameters
response_id
string
required
The response id returned by a previous create response request.
Response Body
background
boolean
default: false
Whether to process the response asynchronously in the background. Included only for OpenResponses compatibility and not currently used.
created_at
integer
The Unix timestamp (in seconds) for the response creation time.
error
An error object returned when the model fails to generate a response.
frequency_penalty
number
(NOT SUPPORTED in Responses API) Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
id
string
Unique ID of the response.
metadata
Only included for compatibility.
model
string
Model name used to generate the response.
object
string
The object type of this resource. Always set to response.
output
array
The response generated by the model.
parallel_tool_calls
boolean
Whether to allow the model to run parallel tool calls.
presence_penalty
number
(NOT SUPPORTED in Responses API) Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
service_tier
string
default: default
Specifies the processing tier used for serving the request.
status
string
Status of the response. One of completed, in_progress or incomplete.
store
boolean
default: true
Whether to store the input message(s) and model response for later retrieval.
text
object
tool_choice
string | object
Controls how the model chooses tools.
tools
array
A list of tools the model may call, described in JSON Schema. Currently, only functions and web search are supported as tools. A maximum of 128 tools is supported.
top_logprobs
integer
An integer between 0 and 8 specifying the number of most likely tokens to return at each token position.
truncation
string
default: disabled
The truncation strategy to use for the model response.
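Retrieval is a simple GET on the path above. In this sketch the base URL, key, and response ID are placeholders; only the path shape and the status values come from this reference.

```python
import urllib.request

# Placeholder values for illustration only.
BASE_URL = "https://api.example.com"
response_id = "resp_abc123"  # ID returned by a previous create-response call

req = urllib.request.Request(
    f"{BASE_URL}/v1/responses/{response_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    method="GET",
)

# Sending it (requires a valid key and base URL, plus `import json`):
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
#     # data["id"] matches response_id; data["status"] is one of
#     # "completed", "in_progress", or "incomplete"
```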
Delete previous response
/v1/responses/{response_id}
Delete a previously generated response.
Path parameters
response_id
string
required
The response id returned by a previous create response request.
Response Body
deleted
boolean
Whether the response was successfully deleted.
id
string
The ID of the deleted response.
object
string
The deleted object type, which is always response.
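Deletion reuses the same path with, presumably, the HTTP DELETE method (this reference names the operation but not the verb, so the method here is an assumption; the base URL and key are placeholders).

```python
import urllib.request

# Placeholder values for illustration only.
BASE_URL = "https://api.example.com"
response_id = "resp_abc123"  # ID from a previous create-response call

req = urllib.request.Request(
    f"{BASE_URL}/v1/responses/{response_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    method="DELETE",  # assumed verb for the delete operation
)

# Sending it (requires a valid key and base URL, plus `import json`):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     # result["deleted"] is true on success; result["id"] echoes response_id
```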
Get deferred chat completions
/v1/chat/deferred-completion/{request_id}
Tries to fetch the result of a previously started deferred completion. Returns 200 OK with the response body if the request has completed, or 202 Accepted while the request is still pending.
Path parameters
request_id
string
required
The deferred request id returned by a previous deferred chat request.
Response Body
choices
array
A list of response choices from the model. The length corresponds to the n parameter in the request body (defaults to 1).
created
integer
The chat completion creation time as a Unix timestamp.
id
string
A unique ID for the chat response.
model
string
Model ID used to create chat completion.
object
string
The object type, which is always "chat.completion".
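The 200-versus-202 contract above lends itself to a polling loop. This is a sketch under assumptions: the base URL and polling interval are placeholders, while the path and the two status codes come from this reference.

```python
import json
import time
import urllib.request

# Placeholder base URL for illustration only.
BASE_URL = "https://api.example.com"

def poll_deferred(request_id: str, api_key: str, interval: float = 2.0) -> dict:
    """Poll a deferred completion until it finishes.

    A 200 status means the chat completion body is ready and is returned;
    a 202 means the request is still pending, so we sleep and retry.
    """
    url = f"{BASE_URL}/v1/chat/deferred-completion/{request_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        req = urllib.request.Request(url, headers=headers, method="GET")
        with urllib.request.urlopen(req) as resp:
            if resp.status == 200:  # completed: body is a chat completion
                return json.load(resp)
        # urlopen does not raise on 202, so falling through here means
        # the request is still pending processing.
        time.sleep(interval)
```

A fixed interval is the simplest choice; exponential backoff would reduce load for long-running requests.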