Image understanding

Use xAI's image understanding models to analyze visuals.


The vision model accepts both text and image inputs. You can pass images in one of two ways: as base64-encoded strings or as web URLs. The API accepts several images in a single request, in either form, and analyzes them together when responding.

Base64-encoded images are passed directly in the request, inside the user message.

Here is an example of how to load a local image, encode it in base64, and use it as part of your conversation:

python

import base64
import os

from openai import OpenAI

MODEL_NAME = "grok-vision-beta"
XAI_API_KEY = os.getenv("XAI_API_KEY")
image_path = "..."

client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://api.x.ai/v1",
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
    return encoded_string

# Encode the local image as a base64 string
base64_image = encode_image(image_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "high",
                },
            },
            {
                "type": "text",
                "text": "What is in this image?",
            },
        ],
    },
]

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    stream=True,
    temperature=0.01,
)

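for chunk in stream:
    # Some chunks (for example the final one) carry no text; skip them rather than print "None".
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)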
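
The example above hardcodes `image/jpeg` in the data URL. If your local files are not all JPEGs, one option is to guess the MIME type from the file extension. The sketch below is only an illustration: `to_data_url` and its `default_mime` parameter are hypothetical names, not part of the xAI API, and which image formats the API accepts is not stated here, so check that separately.

```python
import base64
import mimetypes


def to_data_url(image_path, default_mime="image/jpeg"):
    """Hypothetical helper: build a base64 data URL for a local image file."""
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        # Assumption: fall back to JPEG when the extension is unknown.
        mime = default_mime
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value of an `image_url` content part, in place of the f-string in the example above.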

The model also supports web URLs as image inputs. The API fetches the image from the public URL and handles it as part of the chat. Integrating with URLs is as simple as:

python

import os
from openai import OpenAI

MODEL_NAME = "grok-vision-beta"
XAI_API_KEY = os.getenv("XAI_API_KEY")
image_url = "..."

client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://api.x.ai/v1",
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                    "detail": "high",
                },
            },
            {
                "type": "text",
                "text": "What's funny about this image?",
            },
        ],
    },
]

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    stream=True,
    temperature=0.01,
)

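for chunk in stream:
    # Some chunks (for example the final one) carry no text; skip them rather than print "None".
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)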
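
As noted at the top of this page, the API can take several images in one request. A minimal sketch of how such a message might be assembled follows. It assumes multiple images are sent as multiple `image_url` parts in a single user message, which mirrors the single-image examples but is not spelled out on this page. `build_multi_image_message` is a hypothetical helper, and the `example.com` URLs are placeholders.

```python
def build_multi_image_message(prompt, image_sources, detail="high"):
    """Hypothetical helper: one user message with several images and a text prompt."""
    # Each source may be a public web URL or a base64 data URL string.
    content = [
        {"type": "image_url", "image_url": {"url": source, "detail": detail}}
        for source in image_sources
    ]
    # The text prompt follows the images, as in the single-image examples.
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}


messages = [
    build_multi_image_message(
        "What do these images have in common?",
        [
            "https://example.com/first.jpg",   # placeholder URL
            "https://example.com/second.jpg",  # placeholder URL
        ],
    )
]
```

The resulting `messages` list can be passed to `client.chat.completions.create` exactly as in the examples above.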