PrismAudio: AI Video-to-Stereo Audio in 0.63s

PrismAudio, developed by Alibaba's FunAudioLLM team and accepted at ICLR 2026, generates spatial stereo audio directly from silent AI video files. The tool processes uploads in an average of 0.63 seconds and positions sounds left, right, near, and far based on visual content, a capability no other AI audio tool currently offers.

For the broader landscape, see our complete guide to AI video generation in 2026.

What Happened

Alibaba's FunAudioLLM research team has released PrismAudio, a video-to-audio generation tool built on research accepted at ICLR 2026, one of the top machine learning conferences. Unlike existing tools such as MMAudio, which output mono audio only, PrismAudio generates true stereo with spatial positioning derived from the video frame itself.

The tool supports MP4, MOV, AVI, WebM, and MKV uploads and was designed specifically to complement AI video generators including Sora, Veo3, Kling, Runway, and Pika. A free tier is available with no credit card required. Commercial use starts at $19 per month.

Why It Matters

AI video generation has a persistent audio problem. Tools like Sora and Kling produce visually convincing footage but ship with no sound. Creators have been patching this gap manually, layering stock audio or running separate AI tools that output flat mono tracks.

Stereo spatial audio changes the experience significantly. A drone flying left to right in a video now produces audio that follows that path. Footsteps approaching the camera grow louder. These are the details that make video feel professional, and until now they required either a sound designer or hours of manual work in a DAW.

The speed benchmark matters too. At 0.63 seconds average processing time, PrismAudio fits into a fast production pipeline rather than adding a bottleneck. For creators running multiple video exports per session, that difference compounds quickly.

For deeper context on where AI audio tools are heading, see the full AI Music and Audio Tools guide for 2026.

Key Details

Developer: Alibaba FunAudioLLM team
Research: Accepted at ICLR 2026
Output: Spatial stereo (left, right, near, far positioning)
Processing speed: 0.63 seconds average, roughly 2x faster than competitors
Supported formats: MP4, MOV, AVI, WebM, MKV
Compatibility: Sora, Veo3, Kling, Runway, Pika exports
Free tier: Available, no credit card required, personal use only
Starter plan: $19/month for commercial use

PrismAudio handles multi-layered audio scenes. A single clip with ambient background noise, a foreground subject speaking, and a distant sound effect produces three separate audio elements, each placed in the stereo field and blended into one track. That level of audio layering from a single upload is a meaningful step beyond what tools like MMAudio currently offer.

This development fits a broader trend of research labs building production-ready tools on top of peer-reviewed work. Sony's audio foundation model follows a similar pattern. See the Sony Woosh sound effects AI writeup for comparison.

What to Do Next

The free tier at prismaudio.net is the fastest way to evaluate the tool. Upload a recent export from Kling or Runway and compare the stereo output against a mono-only tool on the same clip. The spatial difference is most obvious on headphones.
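If you want a number to go with the headphone test, the comparison above can be sketched in a few lines of Python. This is a minimal sketch, not part of PrismAudio: the function name stereo_side_ratio is hypothetical, and it assumes you have already extracted each tool's audio to a 16-bit PCM WAV file. It measures how much of the signal's energy lives in the difference between the left and right channels, so a mono track stored in a stereo container scores 0.0.

```python
import array
import sys
import wave


def stereo_side_ratio(path):
    """Return side-channel energy as a fraction of mid plus side energy.

    0.0 means left and right are identical (mono in a stereo container);
    larger values mean more difference between the two channels.
    Assumption: the input is a 16-bit PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    if channels == 1:
        return 0.0  # a true mono file has no left/right difference
    if channels != 2:
        raise ValueError("expected mono or stereo audio")

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Stereo frames are interleaved: L, R, L, R, ...
    left = samples[0::2]
    right = samples[1::2]
    mid_energy = sum((l + r) ** 2 for l, r in zip(left, right))
    side_energy = sum((l - r) ** 2 for l, r in zip(left, right))
    total = mid_energy + side_energy
    return side_energy / total if total else 0.0
```

Run it on the audio from both tools for the same clip and compare the two values. The ratio only says that the channels differ, not that the positioning matches the picture, so treat it as a sanity check alongside listening rather than a quality score.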

If you are producing content commercially, the $19/month Starter plan covers commercial licensing. The free tier is limited to personal, non-commercial use.
