
Image Overview


The xAI image APIs support image generation, image editing, multi-image editing, and image understanding workflows with Grok models.

  • Image Generation (text to image): Generate new images from text prompts with Grok Imagine models.
  • Image Editing (image + prompt): Edit a source image with natural language while preserving relevant context.
  • Multi-Image Editing (up to 3 images): Combine multiple visual references in a single image edit.

Pricing

Image generation uses flat per-image pricing rather than the token-based pricing used for text models. Each generated image incurs a fixed fee regardless of prompt length. Image edits are billed for both the input image and the generated output image. For full pricing details on the grok-imagine-image model, see the model page.
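The billing rule above can be sketched as simple arithmetic. This is a rough estimate helper, not an official calculator: the function name, its parameters, and the prices in the example are hypothetical placeholders, and it assumes input images on edits may carry a different rate from output images. Check the model page for actual rates.

```python
from decimal import Decimal


def estimate_image_cost(
    generated_images: int,
    edit_input_images: int,
    edit_output_images: int,
    output_price: Decimal,
    input_price: Decimal,
) -> Decimal:
    """Estimate spend under flat per-image pricing.

    Prompt length is deliberately absent: the fee does not depend on it.
    """
    if min(generated_images, edit_input_images, edit_output_images) < 0:
        raise ValueError("image counts must be non-negative")
    # Every image the API produces, generated or edited, is billed once.
    output_cost = (generated_images + edit_output_images) * output_price
    # Edits are additionally billed for the images supplied as input.
    input_cost = edit_input_images * input_price
    return output_cost + input_cost


# Placeholder prices for illustration only; these are NOT real xAI rates.
example = estimate_image_cost(4, 2, 2, Decimal("0.02"), Decimal("0.002"))
print(example)  # 0.124
```

Decimal is used instead of float so that small per-image fees add up without rounding drift.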

Limitations

  • Maximum images per request: 10
  • URL expiration: Generated URLs are temporary
  • Content moderation: Images are subject to content policy review
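When a job needs more images than one request allows, the work can be split into several requests. The sketch below assumes the limit of 10 applies to the number of images produced per request; the helper name and its parameters are hypothetical and not part of the xAI SDK.

```python
def split_into_requests(total_images: int, max_per_request: int = 10) -> list[int]:
    """Return how many images to ask for in each request.

    Example: 25 images with a cap of 10 becomes [10, 10, 5].
    """
    if total_images < 0:
        raise ValueError("total_images must be non-negative")
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    # Full-size requests first, then one smaller request for the remainder.
    full_requests, remainder = divmod(total_images, max_per_request)
    counts = [max_per_request] * full_requests
    if remainder:
        counts.append(remainder)
    return counts


print(split_into_requests(25))  # [10, 10, 5]
```

Each entry can then serve as the output count for one request. Because generated URLs are temporary, it is usually worth saving each result before moving on to the next batch.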

Image Generation

Generate new images from text prompts with Grok Imagine models. Configure output count, aspect ratio, resolution, and response format for your use case.

import xai_sdk

client = xai_sdk.Client()

response = client.image.sample(
    prompt="A collage of London landmarks in a stenciled street-art style",
    model="grok-imagine-image",
)

print(response.url)

Image Editing

Edit a source image with natural language. Provide a public image URL or a base64-encoded data URI, then describe the change you want Grok Imagine to apply.

import base64
import xai_sdk

client = xai_sdk.Client()

# Load image from file and encode as base64
with open("photo.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.image.sample(
    prompt="Render this as a pencil sketch with detailed shading",
    model="grok-imagine-image",
    image_url=f"data:image/png;base64,{image_data}",
)

print(response.url)
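The example above hardcodes the image/png media type. For files of mixed formats, a small helper can derive the media type from the file extension before building the data URI. This is a sketch using only the Python standard library; the function names are hypothetical, and which image formats the API accepts is not stated here, so treat the supported set as an assumption to verify.

```python
import base64
import mimetypes
from pathlib import Path


def guess_image_mime(path: str) -> str:
    """Guess the media type from the file extension; reject non-images."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image media type for {path!r}")
    return mime


def encode_data_uri(data: bytes, mime: str) -> str:
    """Build a base64 data URI from raw bytes and a media type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read an image file and return it as a data URI."""
    return encode_data_uri(Path(path).read_bytes(), guess_image_mime(path))
```

The returned string can be passed wherever the example above passes its f-string, for instance image_url=file_to_data_uri("photo.png").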

Multi-Image Editing

Use up to three source images in a single edit. Multi-image editing is useful for combining subjects, transferring styles across references, and composing scenes from several visual inputs.

import xai_sdk

client = xai_sdk.Client()

response = client.image.sample(
    prompt="Show all subjects sitting together on the grass in a sunny park. Casual relaxed mood, natural daylight, warm tones. No additional people or animals.",
    model="grok-imagine-image",
    image_urls=[
        "https://docs.x.ai/assets/api-examples/images/image-merge/man.jpg",
        "https://docs.x.ai/assets/api-examples/images/image-merge/puppy.jpg",
        "https://docs.x.ai/assets/api-examples/images/image-merge/woman.jpg",
    ],
    aspect_ratio="3:2",
)

print(response.url)
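Because an edit accepts at most three source images, checking the list on the client side can catch mistakes before a request is sent. The helper below is a hypothetical sketch, not part of the xAI SDK. It assumes each entry is either an http(s) URL or a data URI, mirroring the inputs described for single-image editing.

```python
from urllib.parse import urlparse

MAX_SOURCE_IMAGES = 3  # limit stated for multi-image editing
ALLOWED_SCHEMES = {"http", "https", "data"}  # assumption, see lead-in


def check_source_images(image_urls: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the input looks fine."""
    problems = []
    if not 1 <= len(image_urls) <= MAX_SOURCE_IMAGES:
        problems.append(
            f"expected 1 to {MAX_SOURCE_IMAGES} images, got {len(image_urls)}"
        )
    for index, url in enumerate(image_urls):
        # urlparse reports "data" for data URIs and "" for bare file paths.
        scheme = urlparse(url).scheme.lower()
        if scheme not in ALLOWED_SCHEMES:
            problems.append(f"image {index}: unsupported scheme {scheme!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.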

Image Understanding

Send images as input to Grok models and generate responses that use visual context. Use this for multimodal chat, visual question answering, screenshots, documents, and other image analysis tasks.

import os
from xai_sdk import Client
from xai_sdk.chat import user, image

client = Client(
    api_key=os.getenv("XAI_API_KEY"),
    management_api_key=os.getenv("XAI_MANAGEMENT_API_KEY"),
    timeout=3600,
)

image_url = "https://science.nasa.gov/wp-content/uploads/2023/09/web-first-images-release.png"
chat = client.chat.create(model="grok-4.3")
chat.append(
    user(
        "What's in this image?",
        image(image_url=image_url, detail="high"),
    )
)

response = chat.sample()
print(response)

# The response ID that can be used to continue the conversation later
print(response.id)


Last updated: May 6, 2026