Inference API

Chat

Chat completions

/v1/chat/completions

Create a chat response from text and/or image prompts. This is the endpoint for making requests to chat and image-understanding models.

Request Body

Response Body

choices (array): A list of response choices from the model. Its length corresponds to the n parameter in the request body (defaults to 1).

created (integer): The creation time of the chat completion, as a Unix timestamp.

id (string): A unique ID for the chat response.

model (string): The model ID used to create the chat completion.

object (string): The object type, which is always "chat.completion".
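As a minimal sketch of how the fields above fit together (the model name and response values are placeholders, not real IDs), a request body can be assembled and a response body read like this:

```python
import json

def build_chat_request(messages, model="example-chat-model", n=1):
    # Assemble a /v1/chat/completions request body.
    # "example-chat-model" is a placeholder, not a real model ID.
    return {"model": model, "messages": messages, "n": n}

def first_choice_text(response_body):
    # Read the first choice from a body shaped like the
    # chat.completion fields documented above.
    assert response_body["object"] == "chat.completion"
    return response_body["choices"][0]["message"]["content"]

request_json = json.dumps(
    build_chat_request([{"role": "user", "content": "Hello"}])
)

sample_response = {
    "id": "chatcmpl-123",         # unique ID for the chat response
    "object": "chat.completion",  # always "chat.completion"
    "created": 1700000000,        # creation time as a Unix timestamp
    "model": "example-chat-model",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hi!"}}
    ],
}
```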


Create new response

/v1/responses

Generates a response from text or image prompts. The response ID can be used later to retrieve the response or to continue the conversation without resending prior context. New responses are stored for 30 days and then permanently deleted.

Request Body

input (string | array, required): The content of the input passed in a /v1/responses request.

Response Body

background (boolean, default: false): Whether to process the response asynchronously in the background. Included only for OpenResponses compatibility; not currently used.

created_at (integer): The Unix timestamp (in seconds) of the response creation time.

error: An error object returned when the model fails to generate a response.

frequency_penalty (number): (NOT SUPPORTED in Responses API) Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

id (string): The unique ID of the response.

metadata: Included only for compatibility.

model (string): The model name used to generate the response.

object (string): The object type of this resource; always "response".

output (array): The response content generated by the model.

parallel_tool_calls (boolean): Whether the model is allowed to run tool calls in parallel.

presence_penalty (number): (NOT SUPPORTED in Responses API) Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

service_tier (string, default: default): Specifies the processing tier used to serve the request.

status (string): The status of the response. One of completed, in_progress, or incomplete.

store (boolean, default: true): Whether to store the input message(s) and model response for later retrieval.

text (object)

tool_choice (string | object): Controls how the model chooses tools.

tools (array): A list of tools the model may call, described in JSON Schema. Currently only functions and web search are supported as tools; at most 128 tools may be provided.

top_logprobs (integer): An integer between 0 and 8 specifying the number of most likely tokens to return at each token position.

truncation (string, default: disabled): The truncation strategy to use for the model response.
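A sketch of a /v1/responses request body and of reading the documented response fields (the previous_response_id field name is an assumption not documented in this excerpt, used here to illustrate continuing a stored conversation; the ID values are placeholders):

```python
def build_response_request(input_, previous_response_id=None, store=True):
    # Assemble a /v1/responses request body. "input" may be a string or
    # an array, per the reference above. previous_response_id is an
    # assumed field name for continuing a stored conversation without
    # resending prior context.
    body = {"input": input_, "store": store}
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    return body

def response_summary(response_body):
    # Pull the identifying fields from a response-shaped body.
    assert response_body["object"] == "response"
    return response_body["id"], response_body["status"]

first_turn = build_response_request("What is 2 + 2?")
follow_up = build_response_request(
    "And times 3?", previous_response_id="resp_abc123"
)
```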


Retrieve previous response

/v1/responses/{response_id}

Retrieve a previously generated response.

Path parameters

response_id (string, required): The response ID returned by a previous create-response request.

Response Body

The response body fields are identical to those returned by Create new response above.


Delete previous response

/v1/responses/{response_id}

Delete a previously generated response.

Path parameters

response_id (string, required): The response ID returned by a previous create-response request.

Response Body

deleted (boolean): Whether the response was successfully deleted.

id (string): The ID of the deleted response.

object (string): The deleted object type, which is always "response".
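The retrieve and delete endpoints share the same path. A small sketch of building that path and interpreting the delete confirmation body (the base URL and response ID are placeholders):

```python
def response_url(base_url, response_id):
    # Path used by both GET (retrieve) and DELETE (delete) for a
    # previously generated response. base_url is a placeholder.
    return f"{base_url}/v1/responses/{response_id}"

def confirm_deletion(body):
    # Interpret the delete-response body documented above.
    assert body["object"] == "response"
    return body["deleted"]

url = response_url("https://api.example.com", "resp_abc123")
```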


Get deferred chat completions

/v1/chat/deferred-completion/{request_id}

Fetches the result of a previously started deferred completion. Returns 200 Success with the response body if the request has completed, or 202 Accepted while the request is still pending.

Path parameters

request_id (string, required): The deferred request ID returned by a previous deferred chat request.

Response Body

The response body fields are identical to those of Chat completions above.
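The 200/202 contract above lends itself to a simple polling loop. A sketch, with the HTTP GET abstracted behind a fetch callable (any real transport, auth, and base URL are left out as deployment-specific):

```python
import time

def poll_deferred(fetch, request_id, interval_s=1.0, max_tries=3):
    # Poll /v1/chat/deferred-completion/{request_id} until the result is
    # ready. `fetch` stands in for an HTTP GET and must return a
    # (status_code, body) pair: 200 means completed, 202 means pending.
    for attempt in range(max_tries):
        status, body = fetch(request_id)
        if status == 200:
            return body  # completed: body carries the chat.completion fields
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        if attempt < max_tries - 1:
            time.sleep(interval_s)
    return None  # still pending after max_tries polls
```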

