PrismAudio, developed by Alibaba's FunAudioLLM team and accepted at ICLR 2026, generates spatial stereo audio directly from silent AI-generated video. The tool processes uploads in an average of 0.63 seconds and positions sounds left, right, near, and far based on visual content, a capability no other AI audio tool currently offers.
What Happened
Alibaba's FunAudioLLM research team has released PrismAudio, a video-to-audio generation tool built on research accepted at ICLR 2026, one of the top machine learning conferences. Unlike existing tools such as MMAudio, which output mono audio only, PrismAudio generates true stereo with spatial positioning derived from the video frame itself.
The tool supports MP4, MOV, AVI, WebM, and MKV uploads and was designed specifically to complement AI video generators including Sora, Veo3, Kling, Runway, and Pika. A free tier is available with no credit card required. Commercial use starts at $19 per month.
Why It Matters
AI video generation has a persistent audio problem. Tools like Sora and Kling produce visually convincing footage, but exports often arrive with no sound. Creators have been patching this gap manually, layering stock audio or running separate AI tools that output flat mono tracks.
Stereo spatial audio changes the experience significantly. A drone flying left to right in a video now produces audio that follows that path. Footsteps approaching the camera grow louder. These are the details that make video feel professional, and until now they required either a sound designer or hours of manual work in a DAW.
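To make "audio that follows that path" concrete, here is a minimal Python sketch of constant-power panning, a standard mixing technique for placing a mono source in a stereo field. It is an illustration only and not PrismAudio's method, which is not described here; the function name `pan_gains` and the position convention (-1.0 for full left, 1.0 for full right) are assumptions made for this example.

```python
import math

def pan_gains(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    Hypothetical helper for illustration: -1.0 is full left, 0.0 is center,
    1.0 is full right. Uses the constant-power (sine/cosine) pan law.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    # Map [-1, 1] onto an angle in [0, pi/2].
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

# A source moving left to right across five steps.
for step in range(5):
    position = -1.0 + step * 0.5
    left, right = pan_gains(position)
    print(f"pos={position:+.1f}  L={left:.3f}  R={right:.3f}")
```

With this pan law the squared gains always sum to 1, so the source keeps roughly the same perceived loudness as it crosses the frame.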
Speed matters too. With an average processing time of 0.63 seconds, PrismAudio fits into a fast production pipeline instead of becoming a bottleneck. For creators running multiple video exports per session, the time saved adds up quickly.
For deeper context on where AI audio tools are heading, see the full AI Music and Audio Tools guide for 2026.
Key Details
- Developer: Alibaba FunAudioLLM team
- Research: Accepted at ICLR 2026
- Output: Spatial stereo (left, right, near, far positioning)
- Processing speed: 0.63 seconds average, roughly 2x faster than competitors
- Supported formats: MP4, MOV, AVI, WebM, MKV
- Compatibility: Sora, Veo3, Kling, Runway, Pika exports
- Free tier: Available, no credit card required, personal use only
- Starter plan: $19/month for commercial use
PrismAudio handles multi-layered audio scenes: a single clip with ambient background noise, a foreground subject speaking, and a distant sound effect produces three separate audio elements blended into the stereo field. That level of audio layering from a single upload is a meaningful step beyond what tools like MMAudio currently offer.
This development fits a broader trend of research labs building production-ready tools on top of peer-reviewed work. Sony's audio foundation model follows a similar pattern. See the Sony Woosh sound effects AI writeup for comparison.
What to Do Next
The free tier at prismaudio.net is the fastest way to evaluate the tool. Upload a recent export from Kling or Runway and compare the stereo output against a mono-only tool on the same clip. The spatial difference is most obvious on headphones.
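One rough way to put a number on that comparison is to measure how much of a track's energy sits in the difference between the two channels. The sketch below is an illustration under stated assumptions, not a PrismAudio feature: `side_energy_ratio` is a hypothetical helper, and it expects the left and right channels as equal-length lists of samples that you have already decoded from the audio file. A result of 0.0 means the channels are identical (effectively mono); larger values mean more left/right separation.

```python
def side_energy_ratio(left: list[float], right: list[float]) -> float:
    """Fraction of total energy in the side (L - R) signal, from 0.0 to 1.0."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    # Mid is what the channels share; side is where they differ.
    mid_energy = sum(((l + r) / 2.0) ** 2 for l, r in zip(left, right))
    side_energy = sum(((l - r) / 2.0) ** 2 for l, r in zip(left, right))
    total = mid_energy + side_energy
    return side_energy / total if total > 0.0 else 0.0

# Identical channels: no side energy.
mono_like = side_energy_ratio([0.5, -0.2, 0.1], [0.5, -0.2, 0.1])
# Sound only in the left channel: energy splits evenly between mid and side.
hard_left = side_energy_ratio([0.5, -0.2, 0.1], [0.0, 0.0, 0.0])
print(mono_like, hard_left)
```

Run on the same clip, a mono-only tool should score at or near 0.0. This metric only captures left/right separation; it says nothing about near/far cues, so it complements a headphone listen and does not replace it.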
If you are producing content commercially, the $19/month Starter plan covers commercial licensing. The free tier is limited to personal, non-commercial use.