Grok Imagine Video 1.5, the new image-to-video model xAI pushed to wide release on June 17, 2026, does something the video-generation race has avoided so far: it competes at the top of the quality leaderboard while charging a fraction of the going rate. It generates clips with synchronized audio for about $4.20 per minute, about 86 percent below the roughly $30-per-minute tier of OpenAI's Sora 2 that xAI uses as its comparison point. For anyone who has been priced out of AI video, that gap is the story.
What Shipped
Grok Imagine Video 1.5 takes a still image plus a short motion prompt and animates it into a clip of up to 15 seconds at 480p or 720p. The headline feature is native audio: sound effects, ambience, and dialogue are generated in the same pass as the visuals rather than bolted on afterward, so speech and on-screen action stay in sync. xAI says the model claimed the top spot on the public image-to-video leaderboard with a 52-point Elo jump over the previous version.
The model is generally available now through the Grok Imagine web app, the iOS and Android apps, and the xAI API under the id grok-imagine-video-1.5. A faster variant, branded Video 1.5 Fast, renders a six-second 720p clip in about 25 seconds, close to twice the speed of the prior generation. That combination of price, speed, and bundled audio is what makes this release worth a closer look than a routine version bump.

Grok Imagine 1.5 vs Sora 2, Kling 3, and Runway
The case for Grok Imagine Video 1.5 is sharpest when you line it up against the models creators already pay for. The table below compares the four on the dimensions that decide a real production budget. Pricing for Sora 2 reflects the roughly $30-per-minute tier that xAI used as its 86-percent reference point; Kling and Runway sell credits rather than flat per-minute rates, so their effective cost shifts with resolution and length.
| Model | Price | Max clip | Resolution | Native audio | Standout |
|---|---|---|---|---|---|
| Grok Imagine Video 1.5 | ~$4.20/min | 15 sec | 480p / 720p | Yes (one pass) | Top leaderboard rank at lowest price |
| OpenAI Sora 2 | ~$30/min | Longer than 15 sec | Up to 1080p | Yes | Highest fidelity and length |
| Kling 3.0 | Credit-based | Longer than 15 sec | Up to 4K | Yes | 4K output, multilingual audio |
| Runway Gen-4 | Credit-based | Longer than 15 sec | Up to 1080p | Yes | Editing and pipeline integrations |
The honest read is that Grok does not win on raw ceiling. Sora 2 still goes longer and sharper, and Kling holds the resolution crown at 4K. What Grok wins is cost per iteration. At 720p and 15 seconds, it covers the bulk of social-first work, and at $4.20 against roughly $30 per minute you can run about seven variations for what one Sora clip of the same length costs. For our earlier breakdown of how these per-second economics stack up across the field, see our AI video cost-per-second analysis.
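The per-clip arithmetic behind that comparison can be sketched in a few lines of Python. This is a back-of-envelope check, not xAI's or OpenAI's billing logic: it assumes flat per-minute pricing billed pro rata by the second, it uses the approximate $4.20 and $30 rates quoted in this article, and the function names are made up for illustration.

```python
from fractions import Fraction

# Approximate per-minute rates quoted in this article (USD), not official price sheets.
GROK_PER_MIN = Fraction("4.20")
SORA_PER_MIN = Fraction("30")


def clip_cost(rate_per_min: Fraction, seconds: int) -> Fraction:
    """Cost of one clip, assuming linear per-second billing (an assumption)."""
    return rate_per_min * seconds / 60


def discount(cheap: Fraction, expensive: Fraction) -> Fraction:
    """Fractional saving of the cheaper rate relative to the pricier one."""
    return 1 - cheap / expensive


if __name__ == "__main__":
    grok = clip_cost(GROK_PER_MIN, 15)
    sora = clip_cost(SORA_PER_MIN, 15)
    print(f"15 s Grok clip: ${float(grok):.2f}")
    print(f"15 s Sora clip: ${float(sora):.2f}")
    print(f"Discount: {float(discount(GROK_PER_MIN, SORA_PER_MIN)):.0%}")
    print(f"Grok clips per Sora clip: {float(sora / grok):.1f}")
```

Under these assumptions a 15-second clip comes to about $1.05 on Grok versus $7.50 at the Sora reference rate, a ratio of roughly seven to one, and the discount works out to 86 percent.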
What the Price Drop Means for Creators
Cheap iteration changes the workflow, not just the invoice. AI video is a numbers game: the difference between a usable shot and a discard is often the seed, the motion phrasing, or a half-second of timing, and you find the keeper by generating many takes. At $30 a minute, creators ration prompts and settle early. At $4.20, the same budget buys roughly seven times as many takes, enough to actually direct a scene.
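To make the rationing point concrete, here is a rough sketch of how many takes a fixed budget buys at each rate. It assumes every take is billed at full clip length with linear per-second pricing, and no free tier or refunds for failed renders. The $25 budget and the `takes_for_budget` helper are illustrative, not anything xAI publishes.

```python
def takes_for_budget(budget_cents: int, rate_cents_per_min: int, clip_seconds: int) -> int:
    """Whole takes a budget covers, assuming linear per-second billing (an assumption)."""
    # Stay in integers (cents scaled by 60) to avoid float rounding at the boundary.
    cost_per_take_x60 = rate_cents_per_min * clip_seconds
    return (budget_cents * 60) // cost_per_take_x60


if __name__ == "__main__":
    budget = 2500  # hypothetical $25 test budget, in cents
    rates = [("Grok Imagine Video 1.5", 420), ("Sora 2 reference tier", 3000)]
    for name, rate in rates:
        print(f"{name}: {takes_for_budget(budget, rate, 15)} fifteen-second takes")
```

With these inputs the budget covers 23 fifteen-second takes at the Grok rate and 3 at the Sora reference rate, which is the gap between settling on an early take and having room to compare many.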

The bundled audio compounds the savings. A typical short-form pipeline pairs a video generator with a separate sound or voice tool, which adds both cost and a syncing step. Grok folds ambience, effects, and dialogue into the same render, so a meme clip, a product teaser, or a quick explainer can leave the model as a finished beat instead of a silent plate waiting for post. That is a meaningful shave on turnaround for solo creators and small social teams. It also lowers the stakes of experimentation: when a take is nearly free, creators try riskier camera moves and stranger prompts instead of replaying the one composition they know will render cleanly, and that is usually where the memorable shots come from.
Where It Falls Short
This is not a Sora killer for premium work. The 480p and 720p ceiling rules out large-screen or client-grade delivery where 1080p or 4K is the floor, and the 15-second cap keeps it in short-form territory. Output quality also leans hard on the input image: a soft or busy starting frame animates into mush, so the model rewards creators who already generate clean stills. And as with every fast-moving xAI release, content controls and watermarking are still settling, which matters if you publish on platforms with synthetic-media disclosure rules.

For creators weighing the broader field rather than a single model, our Kling 3 vs Runway vs Sora comparison covers how the premium tier trades off length, resolution, and control.
How to Try It in 20 Minutes
You can pressure-test Grok Imagine Video 1.5 on a real task in one sitting. First, generate or pick a clean, high-resolution still as your starting frame; a crisp subject on an uncluttered background animates best. Second, open Grok Imagine on web or mobile and upload the frame. Third, write a motion prompt that describes the camera and the action separately, for example "slow push-in, the subject turns toward the light, soft room tone." Fourth, generate three to five takes with the Fast variant and compare timing and audio sync. Fifth, once a take lands, regenerate it on the standard model for the cleaner pass.
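The prompt structure from the third step can be kept consistent across takes with a small helper. This is a sketch only: the `MotionPrompt` class and its fields are hypothetical conveniences for composing prompt text, not part of Grok Imagine or the xAI API, and the comma-separated layout simply mirrors the example prompt above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionPrompt:
    """Hypothetical container that keeps camera, action, and audio cues separate."""
    camera: str
    action: str
    audio: str = ""

    def render(self) -> str:
        # Join only the non-empty parts, camera first, matching the example's order.
        parts = [self.camera, self.action, self.audio]
        return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    base = MotionPrompt("slow push-in", "the subject turns toward the light", "soft room tone")
    print(base.render())
    # Vary one element per take so differences between takes are attributable.
    variant = MotionPrompt("slow pull-back", base.action, base.audio)
    print(variant.render())
```

Changing only the camera or only the action between takes makes it easier to tell which phrasing caused a difference in timing or audio sync.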
Developers building this into a pipeline can call the same model through the xAI API as grok-imagine-video-1.5; our coverage of the Grok Imagine API and public file URLs walks through the request and output handling.
Frequently Asked Questions
How much does Grok Imagine Video 1.5 cost?
It runs about $4.20 per minute of generated video, which xAI frames as roughly 86 percent below the comparable Sora 2 tier. Audio is included in that price rather than billed separately.
Does Grok Imagine Video 1.5 generate sound?
Yes. Sound effects, ambient noise, and dialogue are produced in the same pass as the video, so speech and action stay synchronized without a separate audio tool.
What resolution and length can it output?
Clips run up to 15 seconds at 480p or 720p. There is no 1080p or 4K option yet, which keeps it aimed at short-form and social work rather than large-screen delivery.
How is it different from Sora 2 or Kling?
Sora 2 goes longer and up to 1080p, and Kling 3.0 outputs up to 4K, so both have a higher quality ceiling. Grok's advantage is price and speed: top leaderboard placement at a fraction of the cost, with audio bundled in.
Where can I use it?
Grok Imagine Video 1.5 is live on the Grok Imagine web app, the iOS and Android apps, and the xAI API under the model id grok-imagine-video-1.5. The Video 1.5 Fast variant trades a little quality for render speed close to twice that of the prior generation.
Is the output safe to publish commercially?
Treat that cautiously. Watermarking and content controls on new xAI releases are still maturing, and some platforms require synthetic-media disclosure, so check both xAI's terms and your destination platform's rules before publishing client work.