Image understanding
Use xAI's image understanding models to analyze visuals.
Introduction
The vision model accepts both text and image inputs. You can pass images in one of two ways: as base64-encoded strings or as web URLs. A single request can include several images in either form, and the model analyzes them together when responding.
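To make the multi-image case concrete, here is a minimal sketch that assembles one user message containing two images and a prompt. It assumes that each image is sent as its own `image_url` content part, in the same shape used by the single-image examples on this page. The helper names `image_part` and `build_user_message`, the example URL, and the dummy bytes are hypothetical and only serve the illustration.

```python
import base64


def image_part(url: str, detail: str = "high") -> dict:
    """Build one image content part from a web URL or a base64 data URL."""
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


def build_user_message(prompt: str, image_urls: list) -> dict:
    """Combine several images and one text prompt into a single user message."""
    content = [image_part(url) for url in image_urls]
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}


# Placeholder inputs: a made-up public URL and dummy bytes standing in for a real JPEG.
web_url = "https://example.com/photo.jpg"
dummy_b64 = base64.b64encode(b"not a real image").decode("utf-8")
data_url = f"data:image/jpeg;base64,{dummy_b64}"

message = build_user_message(
    "What do these two images have in common?",
    [web_url, data_url],
)
```

The resulting `message` can be placed in the `messages` list of a chat completion request in the same way as a single-image message.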
Base64 string input
Pass the base64-encoded image directly in the request, inside a user message.
The following example loads a local image, encodes it in base64, and includes it in the conversation:
python
import base64
import os

from openai import OpenAI

MODEL_NAME = "grok-vision-beta"
XAI_API_KEY = os.getenv("XAI_API_KEY")
image_path = "..."

client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://api.x.ai/v1",
)


def encode_image(image_path):
    """Read a local image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Getting the base64 string
base64_image = encode_image(image_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    # Data URL; the MIME type should match the image format.
                    "url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "high",
                },
            },
            {
                "type": "text",
                "text": "What is in this image?",
            },
        ],
    },
]

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    stream=True,
    temperature=0.01,
)

# Print the reply as it streams; delta.content can be None on some chunks.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
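The example above hard-codes `image/jpeg` in the data URL. If your local files are not all JPEGs, one option is to infer the MIME type from the file extension. The sketch below uses Python's standard `mimetypes` module; the helper name `file_to_data_url` is hypothetical, and whether a given image format is accepted by the API is an assumption you should check separately.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(path: str) -> str:
    """Read a local image and return it as a base64 data URL.

    The MIME type is guessed from the file extension only; the file
    contents are not inspected.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `url` value of an `image_url` content part in place of the hand-built f-string.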
Web URL input
The model also supports web URLs as image inputs. The API fetches the image from the public URL and handles it as part of the chat. Integrating with URLs is as simple as:
python
import os

from openai import OpenAI

MODEL_NAME = "grok-vision-beta"
XAI_API_KEY = os.getenv("XAI_API_KEY")
image_url = "..."

client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://api.x.ai/v1",
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    # Publicly reachable URL; the API fetches the image itself.
                    "url": image_url,
                    "detail": "high",
                },
            },
            {
                "type": "text",
                "text": "What's funny about this image?",
            },
        ],
    },
]

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    stream=True,
    temperature=0.01,
)

# Print the reply as it streams; delta.content can be None on some chunks.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
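Both examples print the reply as it streams. If you would rather keep the full answer as a single string, the chunks can be accumulated instead. The following is a minimal sketch, assuming the chunks expose `choices[0].delta.content` as they do in the loops above. `collect_text` and `fake_chunk` are hypothetical helpers, and `fake_chunk` exists only so the snippet runs without calling the API.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(stream: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Some chunks may carry no choices; skip them.
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            # Skip None or empty deltas.
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, for offline use only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [fake_chunk("A cat"), fake_chunk(None), fake_chunk(" on a sofa.")]
full_text = collect_text(demo_stream)
```

With a real request, pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` to `collect_text` in place of `demo_stream`.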