Model Capabilities

Reference-to-Video

Provide one or more reference images to incorporate specific people, objects, clothing, or other visual elements into the generated video. The model uses the reference images as a visual guide, producing a video that features the content from those images. This is useful for virtual try-on, product placement, and character-consistent storytelling.

Unlike image-to-video, where the source image becomes the starting frame, reference images influence what appears in the video without locking in the first frame.

Provide each reference image as either a public HTTPS URL or a base64-encoded data URI. In the xAI Python SDK, pass the images in the reference_image_urls list. In the AI SDK, set providerOptions.xai.mode to "reference-to-video" and pass the images with providerOptions.xai.referenceImageUrls.
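For a local file, the data URI form can be built with the Python standard library. The following is a minimal sketch, assuming the MIME type can be inferred from the file extension. image_to_data_uri is a hypothetical helper name and is not part of the xAI SDK.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Return a base64-encoded data URI for a local image file.

    Hypothetical helper for illustration; not part of the xAI SDK.
    """
    # Infer the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")

    # Base64-encode the raw bytes and wrap them in the data URI scheme.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be placed in the list of reference images alongside ordinary HTTPS URLs.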

  • A non-empty prompt is required when using reference images.
  • A maximum of 7 reference images can be provided per request.
  • The maximum duration allowed when using reference images is 10 seconds.
  • Reference images cannot be combined with image-to-video or video editing. Only one mode can be active per request, determined by the parameters on the request.

import os
import xai_sdk

# Read the API key from the environment rather than hard-coding it.
client = xai_sdk.Client(api_key=os.getenv("XAI_API_KEY"))

response = client.video.generate(
    # The prompt is required; <IMAGE_N> tags refer to the reference images.
    prompt="slow zoom in on the white fashion runway stage. then, the model from <IMAGE_1> walks in from the back of the shot from the white opening, and gracefully walks out onto the front of the white stage platform. they wear the shirt from <IMAGE_2> and black flared jeans. they look dramatically at the camera. high quality slow motion shot. fun, playful. skin pores. highly detailed faces. perfect shot. they reach the end of the runway and look at the camera as the camera slowly zooms. subtle smile.",
    model="grok-imagine-video",
    # Up to 7 reference images, each a public HTTPS URL or a base64 data URI.
    reference_image_urls=[
        "<IMAGE_URL_1>",
        "<IMAGE_URL_2>",
        "<IMAGE_URL_3>",
    ],
    duration=10,  # seconds; 10 is the maximum when using reference images
    aspect_ratio="16:9",
    resolution="720p",
)

# URL of the generated video.
print(response.url)
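The limits listed above can also be checked on the client before a request is sent. The following is a rough sketch, not part of the xAI SDK: validate_reference_request is a hypothetical helper, and the image_url and video_url parameter names for the image-to-video and video editing modes are assumptions. Treating a whitespace-only prompt as empty is likewise an assumption.

```python
from typing import Optional, Sequence

MAX_REFERENCE_IMAGES = 7             # per-request limit stated above
MAX_REFERENCE_DURATION_SECONDS = 10  # duration limit stated above


def validate_reference_request(
    prompt: str,
    reference_image_urls: Sequence[str],
    duration: Optional[int] = None,
    image_url: Optional[str] = None,  # assumed image-to-video parameter
    video_url: Optional[str] = None,  # assumed video editing parameter
) -> None:
    """Raise ValueError if the request breaks a documented limit."""
    # Assumption: a whitespace-only prompt counts as empty.
    if not prompt or not prompt.strip():
        raise ValueError("a non-empty prompt is required with reference images")
    if not reference_image_urls:
        raise ValueError("at least one reference image is required")
    if len(reference_image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    if duration is not None and duration > MAX_REFERENCE_DURATION_SECONDS:
        raise ValueError(
            f"duration cannot exceed {MAX_REFERENCE_DURATION_SECONDS} seconds "
            "with reference images"
        )
    # Only one mode can be active per request.
    if image_url is not None or video_url is not None:
        raise ValueError(
            "reference images cannot be combined with image-to-video or video editing"
        )
    # Each image must be an HTTPS URL or a data URI.
    for url in reference_image_urls:
        if not url.startswith(("https://", "data:")):
            raise ValueError(f"expected an HTTPS URL or data URI, got {url[:40]!r}")
```

Calling the helper immediately before client.video.generate surfaces these mistakes locally. The API remains the authority on what it accepts.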


Last updated: April 2, 2026