xAI launched Grok Imagine Video 1.5 on June 4, with Elon Musk confirming the official release by sharing an AI-generated Iliad trailer. The model now sits at #1 on the Artificial Analysis Image-to-Video Arena leaderboard, ahead of ByteDance Seedance 2.0, Alibaba HappyHorse 1.0, and Google Veo. It produces 720p clips up to 15 seconds with native synchronized audio in a single inference pass.

How to integrate Grok Imagine 1.5 into your video pipeline

The model is exposed through the standard xAI API at api.x.ai using the identifier grok-imagine-video-1.5-2026-05-30, and the developer docs show endpoints for five distinct workflows: image-to-video, text-to-video, video editing, multi-image editing, and reference-to-video. Output is H.264 MP4 at 24fps in either 480p or 720p across seven aspect ratios. For a typical creator pipeline, that means you can drop a still frame from Midjourney, Reve, or Nano Banana into the API, describe the camera move and pacing in a text prompt, and get a 5-second 720p clip back in roughly 20 to 30 seconds. That is two to three times faster than competing image-to-video models. The native audio stack handles lip-synced dialogue, sound effects, and ambient music in the same call, so you avoid the second round-trip to ElevenLabs or Suno that most current video pipelines require.
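As a rough illustration of what a single image-to-video call could look like, here is a minimal Python sketch that builds the HTTP request without sending it. The host, model identifier, and resolution options come from the description above. The endpoint path, the JSON field names, and the Bearer-token header are assumptions made for illustration, not taken from the xAI docs, so check them against the developer documentation before use.

```python
import json
import urllib.request

# Values taken from the article.
API_HOST = "https://api.x.ai"
MODEL_ID = "grok-imagine-video-1.5-2026-05-30"
RESOLUTIONS = {"480p", "720p"}

# ASSUMPTION: this path is a placeholder, not the documented endpoint.
HYPOTHETICAL_PATH = "/v1/videos/generations"


def build_image_to_video_request(api_key, image_url, prompt,
                                 duration_s=5, resolution="720p"):
    """Build (but do not send) an image-to-video request object."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")

    # ASSUMPTION: every field name except "model" is hypothetical.
    payload = {
        "model": MODEL_ID,
        "image_url": image_url,
        "prompt": prompt,          # camera move and pacing go here
        "duration": duration_s,
        "resolution": resolution,
    }
    return urllib.request.Request(
        API_HOST + HYPOTHETICAL_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # ASSUMPTION: Bearer-token auth, as is common for HTTP APIs.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the path and field names are confirmed, the request object can be passed to urllib.request.urlopen, or the same payload can be sent with whatever HTTP client the pipeline already uses.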

Why it matters

The video generation field has been waiting for a serious image-to-video contender since OpenAI shut down Sora earlier this spring. The shutdown left a documented production gap that hit projects already in flight at Cannes. Runway's Aleph 2.0 release filled part of that gap on the editing side, but creators who wanted single-shot image-to-video with bundled audio were stuck stitching together outputs from three or four vendors. Grok Imagine 1.5 collapses that stack into one API call, and the +52 Elo jump over version 1.0 on the Arena leaderboard suggests the underlying model is now competitive on per-shot quality, not just on price.

Key details

The preview launched on May 30, with broad rollout following on June 4. Per the basenor.com developer breakdown, the model accepts JPG, JPEG, PNG, WEBP, GIF, and AVIF inputs, and clip length has a hard ceiling of 15 seconds, up from the 10-second cap in version 1.0. Pricing has not been published on the public docs page, and the broader consumer rollout to X Premium subscribers is still in progress. The model card lists video extension and reference-guided generation as supported, which lines up with the Arena ranking against Veo and Seedance on multi-shot consistency tests.
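The input formats and the 15-second ceiling listed above lend themselves to a local pre-flight check before any request is sent. The sketch below is one way to do that in Python. The function name preflight is hypothetical, and the check only looks at the file extension, not the actual file contents.

```python
from pathlib import Path

# Accepted input formats and clip ceiling, as listed in the article.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".avif"}
MAX_CLIP_SECONDS = 15


def preflight(image_path, duration_s):
    """Return a list of problems; an empty list means the job looks valid."""
    problems = []

    # Compare the extension case-insensitively (".PNG" is fine).
    ext = Path(image_path).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported input format: {ext or '(none)'}")

    if duration_s <= 0:
        problems.append("duration must be positive")
    elif duration_s > MAX_CLIP_SECONDS:
        problems.append(
            f"duration {duration_s}s exceeds the {MAX_CLIP_SECONDS}s ceiling"
        )
    return problems
```

Returning a list rather than raising on the first failure lets a batch job report every problem with a queued shot in one pass.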

What to do next

If you already have an xAI API key, swap a single image-to-video call into your existing pipeline and benchmark the latency and audio quality against your current Veo or Seedance flow. If you are still picking a stack for a Q3 production, our 2026 AI video generators comparison covers the tradeoffs across the active vendors. Watch the X Premium consumer tier rollout for pricing signals, since the preview API pricing will likely follow the consumer tier launch.
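For the latency half of that benchmark, a small timing harness is enough. The Python sketch below times any zero-argument callable over several runs and summarizes the results. The names benchmark, fake_new_call, and fake_baseline_call are hypothetical, and the two stand-in functions only sleep; swap in the real calls for each vendor. Audio quality still needs a human listen, since nothing here measures it.

```python
import statistics
import time


def benchmark(generate, runs=3):
    """Time generate() over several runs and summarize in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()  # one full request/response round-trip
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


# Stand-ins for real API calls; replace with your own client code.
def fake_new_call():
    time.sleep(0.02)


def fake_baseline_call():
    time.sleep(0.05)


if __name__ == "__main__":
    for name, fn in [("new", fake_new_call), ("baseline", fake_baseline_call)]:
        print(name, benchmark(fn))
```

Using the same source image and prompt for every vendor, and comparing medians rather than single runs, keeps one slow request from skewing the result.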