Imagine Overview
The Imagine API lets you generate and edit images and videos with Grok Imagine models. Use it for image generation, image editing with up to 3 reference images, video generation from text or still images, video editing, and more.
Image Editing
Edit images with natural language. Supports up to 3 reference images per request.
Image Generation
Generate images from text prompts with configurable aspect ratio, resolution, and count.
Image-to-Video
Animate a still image with a text prompt. The source image becomes the first frame.
Pricing
Image generation uses flat per-image pricing: each generated image incurs a fixed fee, regardless of prompt length. Image edits are billed for both the input image and the generated output image. Video generation is priced per second, and the total depends on both duration and resolution. For full pricing details, see the models page.
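The billing rules above reduce to simple arithmetic, sketched below. Every rate here is a hypothetical placeholder rather than an actual xAI price, and the function names and resolution keys are assumptions made for illustration. Billing each input image separately on a multi-image edit is also an assumption. See the models page for real figures.

```python
from decimal import Decimal

# HYPOTHETICAL placeholder rates for illustration only.
# Real prices are listed on the models page.
IMAGE_OUTPUT_FEE = Decimal("0.10")  # per generated image (assumed value)
IMAGE_INPUT_FEE = Decimal("0.02")   # per input image on an edit (assumed value)
VIDEO_RATE_PER_SECOND = {           # assumed resolution keys and values
    "480p": Decimal("0.05"),
    "720p": Decimal("0.08"),
}


def image_generation_cost(count: int) -> Decimal:
    """Flat per-image pricing: prompt length never enters the formula."""
    return IMAGE_OUTPUT_FEE * count


def image_edit_cost(input_images: int = 1, output_images: int = 1) -> Decimal:
    """Edits are billed for the input image(s) plus the generated output."""
    return IMAGE_INPUT_FEE * input_images + IMAGE_OUTPUT_FEE * output_images


def video_cost(duration_seconds: int, resolution: str) -> Decimal:
    """Per-second pricing: the rate depends on resolution, the total on duration."""
    return VIDEO_RATE_PER_SECOND[resolution] * duration_seconds
```

With these placeholder rates, a 12-second clip costs twelve times the per-second rate for its resolution, so lowering either the duration or the resolution reduces the total.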
Image Editing
Edit a source image with natural language. Provide a public image URL or base64-encoded data URI, then describe the change you want Grok Imagine to apply. Multi-image editing supports up to 3 source images in a single request for combining subjects, transferring styles, and composing scenes.
import base64

import xai_sdk

client = xai_sdk.Client()

# Load image from file and encode as base64
with open("photo.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

# Pass the encoded image as a data URI along with the edit instruction
response = client.image.sample(
    prompt="Render this as a pencil sketch with detailed shading",
    model="grok-imagine-image-quality",
    image_url=f"data:image/png;base64,{image_data}",
)
print(response.url)
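When preparing several source images, it can help to factor the data URI encoding and the 3-image limit into small helpers. The following is a minimal sketch using only the Python standard library. The helper names (`to_data_uri`, `file_to_data_uri`, `check_source_images`) are hypothetical and not part of the xAI SDK, and how the resulting list is passed to a multi-image edit request is not shown here.

```python
import base64
import mimetypes
from pathlib import Path

MAX_SOURCE_IMAGES = 3  # multi-image editing limit stated in this guide


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read an image file and encode it, guessing the MIME type from its name."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    return to_data_uri(Path(path).read_bytes(), mime_type)


def check_source_images(sources: list[str]) -> list[str]:
    """Reject an empty list or more sources than one edit request accepts."""
    if not 1 <= len(sources) <= MAX_SOURCE_IMAGES:
        raise ValueError(
            f"Expected 1 to {MAX_SOURCE_IMAGES} source images, got {len(sources)}"
        )
    return sources
```

Validating the count locally avoids sending a request that would exceed the documented limit.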
Image Generation
Generate new images from text prompts with Grok Imagine models. Configure output count (up to 10 images per request), aspect ratio, resolution, and response format.
import xai_sdk

client = xai_sdk.Client()

response = client.image.sample(
    prompt="A collage of London landmarks in a stenciled street-art style",
    model="grok-imagine-image-quality",
)
print(response.url)
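Because a single request returns at most 10 images, larger jobs have to be split across several requests. The sketch below shows one way to compute the per-request counts. The helper name `split_into_batches` is hypothetical, and the sketch deliberately stops short of calling the API, since the name of the count parameter is not shown in this overview.

```python
MAX_IMAGES_PER_REQUEST = 10  # per-request limit stated in this guide


def split_into_batches(total: int, limit: int = MAX_IMAGES_PER_REQUEST) -> list[int]:
    """Split a desired image count into per-request counts of at most `limit`."""
    if total < 1:
        raise ValueError("total must be at least 1")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    full_batches, remainder = divmod(total, limit)
    # Full batches first, then one smaller batch for whatever is left over.
    return [limit] * full_batches + ([remainder] if remainder else [])
```

For example, asking for 25 images yields three requests of 10, 10, and 5 images.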
Image-to-Video
Animate a still image with a text prompt. The source image becomes the starting point for the generated video. Video requests are asynchronous: start a request, poll with the returned request ID, and use the completed video URL when ready. The xAI SDK and AI SDK handle polling for you.
import os

import xai_sdk

client = xai_sdk.Client(api_key=os.getenv("XAI_API_KEY"))

response = client.video.generate(
    prompt="Make the water crash down and slowly pan out the camera",
    model="grok-imagine-video",
    image_url="https://docs.x.ai/assets/api-examples/video/waterfall-still.png",
    duration=12,
)
print(response.url)
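The SDK call above hides the asynchronous flow. For clients that poll manually, the start, poll, and fetch pattern can be sketched as a generic loop. Everything about the response shape here is an assumption: the `status` values (`"pending"`, `"done"`, `"failed"`), the `url` key, and the `get_status` callable are hypothetical stand-ins for whatever the real status endpoint returns.

```python
import time
from typing import Callable


def wait_for_video(
    request_id: str,
    get_status: Callable[[str], dict],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until the video is ready, then return its URL.

    `get_status` is a caller-supplied function (hypothetical) that takes the
    request ID and returns a dict with a "status" key and, once done, a "url".
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = get_status(request_id)
        if result["status"] == "done":
            return result["url"]
        if result["status"] == "failed":
            raise RuntimeError(f"Video request {request_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Video request {request_id} did not finish in time")
        sleep(interval_seconds)  # wait before asking again
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise with a fake backend.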
More Capabilities
Beyond the top use cases above, the Imagine API supports several additional workflows:
- Multi-Image Editing — Combine up to 3 source images in a single edit for compositing subjects, transferring styles, and building scenes from multiple references.
- Video Generation — Generate videos from text prompts with configurable duration (up to 15s), aspect ratio, and resolution.
- Video Editing — Modify an existing video with a text prompt while preserving the rest of the scene.
- Reference-to-Video — Guide a generated video with one or more reference images that influence the output without forcing the first frame.
- Video Extension — Continue an existing video from its last frame, combining the original and extension into one clip.
Enterprise Compliance & Security
The Imagine API is built for production workloads with strict security and compliance requirements. Generated media is subject to content policy review and is not used for training.
Last updated: May 12, 2026