Image Overview
The xAI image APIs support image generation, image editing, multi-image editing, and image understanding workflows with Grok models.
Image Generation
Text to image: Generate new images from text prompts with Grok Imagine models.
Image Editing
Image + prompt: Edit a source image with natural language while preserving relevant context.
Multi-Image Editing
Up to 3 images: Combine multiple visual references in a single image edit.
Pricing
Image generation uses flat per-image pricing rather than token-based pricing like text models. Each generated image incurs a fixed fee regardless of prompt length. Image edits are billed for both the input image and the generated output image. For full pricing details on the grok-imagine-image model, see the model page.
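The billing rule above can be sketched as simple arithmetic. This is a minimal illustration, not an official calculator: the two rates are placeholder values (not real xAI prices), the function name `estimate_image_cost` is hypothetical, and it assumes each input image in an edit is billed at the same per-image input rate.

```python
from decimal import Decimal

# Placeholder rates for illustration only; these are NOT real xAI prices.
# Look up the actual values on the model page.
PRICE_PER_OUTPUT_IMAGE = Decimal("0.02")
PRICE_PER_INPUT_IMAGE = Decimal("0.002")


def estimate_image_cost(
    output_images: int,
    input_images: int = 0,
    output_price: Decimal = PRICE_PER_OUTPUT_IMAGE,
    input_price: Decimal = PRICE_PER_INPUT_IMAGE,
) -> Decimal:
    """Estimate cost under flat per-image pricing.

    Prompt length is deliberately not a parameter, because the fee does not
    depend on it. Edits add a charge for each input image (an assumption for
    multi-image edits).
    """
    if output_images < 0 or input_images < 0:
        raise ValueError("image counts must be non-negative")
    return output_images * output_price + input_images * input_price
```

With these placeholder rates, `estimate_image_cost(4)` covers four generated images, and `estimate_image_cost(1, input_images=1)` covers a single-image edit.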
Limitations
- Maximum images per request: 10
- URL expiration: Generated image URLs are temporary, so download or store any image you need to keep
- Content moderation: Images are subject to content policy review
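If you need more images than a single request allows, one option is to split the total into several requests. The sketch below only plans the batch sizes and makes no API calls. The helper name `plan_batches` is hypothetical, and it assumes the 10-image limit applies to the number of images generated per request.

```python
MAX_IMAGES_PER_REQUEST = 10  # limit taken from the list above


def plan_batches(
    total_images: int, max_per_request: int = MAX_IMAGES_PER_REQUEST
) -> list[int]:
    """Split a desired image count into per-request counts within the limit."""
    if total_images < 0:
        raise ValueError("total_images must be non-negative")
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    full_batches, remainder = divmod(total_images, max_per_request)
    # Full batches first, then one smaller batch for whatever is left over.
    return [max_per_request] * full_batches + ([remainder] if remainder else [])
```

For example, `plan_batches(25)` yields three request sizes: two full batches and one batch with the remaining five images.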
Image Generation
Generate new images from text prompts with Grok Imagine models. Configure output count, aspect ratio, resolution, and response format for your use case.
import xai_sdk

client = xai_sdk.Client()

response = client.image.sample(
    prompt="A collage of London landmarks in a stenciled street-art style",
    model="grok-imagine-image",
)

print(response.url)
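Because generated URLs are temporary, you may want to save the image as soon as the request returns. The following is a minimal sketch using only the Python standard library. The helper name `save_image` and the destination path are hypothetical, and it assumes the URL can be fetched with a plain GET request.

```python
import shutil
import urllib.request
from pathlib import Path


def save_image(url: str, destination: str, timeout: float = 30.0) -> Path:
    """Download the image at `url` and write it to `destination`."""
    target = Path(destination)
    # Create the output directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as source, open(target, "wb") as out:
        # Stream the response to disk instead of loading it all into memory.
        shutil.copyfileobj(source, out)
    return target
```

A call such as `save_image(response.url, "outputs/london.png")` would then keep a local copy after the URL expires.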
Image Editing
Edit a source image with natural language. Provide a public image URL or base64-encoded data URI, then describe the change you want Grok Imagine to apply.
import base64

import xai_sdk

client = xai_sdk.Client()

# Load image from file and encode as base64
with open("photo.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.image.sample(
    prompt="Render this as a pencil sketch with detailed shading",
    model="grok-imagine-image",
    # Pass the encoded image as a data URI
    image_url=f"data:image/png;base64,{image_data}",
)

print(response.url)
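The example above hardcodes the `image/png` media type. If your source files come in several formats, a small helper can build the data URI and pick the media type from the file extension. This is a sketch using only the standard library; `to_data_uri` is a hypothetical name, and which image formats the API accepts is not stated here.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    # Guess the media type from the file extension (e.g. .png -> image/png).
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The result can be passed wherever the example above passes its `image_url` string, for instance `image_url=to_data_uri("photo.png")`.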
Multi-Image Editing
Use up to three source images in a single edit. Multi-image editing is useful for combining subjects, transferring styles across references, and composing scenes from several visual inputs.
import xai_sdk

client = xai_sdk.Client()

response = client.image.sample(
    prompt="Show all subjects sitting together on the grass in a sunny park. Casual relaxed mood, natural daylight, warm tones. No additional people or animals.",
    model="grok-imagine-image",
    image_urls=[
        "https://docs.x.ai/assets/api-examples/images/image-merge/man.jpg",
        "https://docs.x.ai/assets/api-examples/images/image-merge/puppy.jpg",
        "https://docs.x.ai/assets/api-examples/images/image-merge/woman.jpg",
    ],
    aspect_ratio="3:2",
)

print(response.url)
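Since an edit accepts at most three source images, it can help to check the count locally before sending a request. The sketch below builds the keyword arguments for an edit call and rejects invalid input early. `build_edit_kwargs` is a hypothetical helper. The `image_url` and `image_urls` parameter names are taken from the examples on this page, and using `image_url` for exactly one source is an assumption based on them.

```python
MAX_SOURCE_IMAGES = 3  # "up to three source images", per the text above


def build_edit_kwargs(
    prompt: str, sources: list[str], model: str = "grok-imagine-image"
) -> dict:
    """Build keyword arguments for an image edit, validating the source count."""
    if not sources:
        raise ValueError("at least one source image is required")
    if len(sources) > MAX_SOURCE_IMAGES:
        raise ValueError(
            f"at most {MAX_SOURCE_IMAGES} source images are allowed, got {len(sources)}"
        )
    kwargs = {"prompt": prompt, "model": model}
    if len(sources) == 1:
        # Single-image edit: mirrors the image_url example on this page.
        kwargs["image_url"] = sources[0]
    else:
        # Multi-image edit: mirrors the image_urls example on this page.
        kwargs["image_urls"] = list(sources)
    return kwargs
```

The returned dictionary could then be unpacked into the call, as in `client.image.sample(**build_edit_kwargs(prompt, sources))`.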
Image Understanding
Send images as input to Grok models and generate responses that use visual context. Use this for multimodal chat, visual question answering, screenshots, documents, and other image analysis tasks.
import os

from xai_sdk import Client
from xai_sdk.chat import user, image

client = Client(
    api_key=os.getenv("XAI_API_KEY"),
    management_api_key=os.getenv("XAI_MANAGEMENT_API_KEY"),
    timeout=3600,
)

image_url = "https://science.nasa.gov/wp-content/uploads/2023/09/web-first-images-release.png"

chat = client.chat.create(model="grok-4.3")
chat.append(
    user(
        "What's in this image?",
        image(image_url=image_url, detail="high"),
    )
)

response = chat.sample()
print(response)

# The response ID that can be used to continue the conversation later
print(response.id)
Last updated: May 6, 2026