ElevenLabs released Dubbing v2 on May 28, 2026, replacing the company's first-generation dubbing pipeline with a model that conditions directly on the original audio waveform rather than the transcript. The change targets the longest-standing complaint about AI dubbing: dubbed lines lose the speaker's pacing, emphasis, and emotional register, leaving translated content sounding flat and disconnected from the visuals. Dubbing v2 supports 90+ languages and is live in the ElevenCreative dubbing app today.
Try It: Dub a YouTube Video in 30 Minutes
Open ElevenCreative on the ElevenLabs platform, upload an MP4 or paste a YouTube URL, pick your target language, and let Dubbing v2 process the file. During the 7-day rollout, the Free plan includes a 1-minute trial, Starter gets 15 minutes, and Creator gets 30 minutes. Review the auto-generated track in the editor before exporting. Because translation is sync-aware, you should need to nudge timing manually far less often than v1 demanded.
If you publish multi-language YouTube content, the practical workflow is to dub three test clips, A/B them against your existing v1 outputs or a human-translated version, and then decide whether to re-dub your back catalog or apply v2 only to new content.
Why It Matters
Because the model conditions on the audio itself, it can carry tone across languages instead of inferring it from text. A whisper stays a whisper, a sarcastic line keeps its lift, and the dubbed timeline matches the original's pacing rather than running long or short. The closest precedent is Music v2, which ElevenLabs shipped earlier this week and which likewise leans on conditioning rather than generation from scratch to keep musical context intact across transitions.
For working creators, this changes the trade-off between cleaning up AI output and hiring people. v1-era dubbing required either heavy post-edit cleanup or human voice talent for any content where performance mattered. v2 closes enough of that gap that a one-person YouTube channel can plausibly localize into the top 10 markets without hiring per-language voice actors.
Key Details
Dubbing v2 is shipping to ElevenLabs' three creator tiers first (Free, Starter, and Creator+, with 1-, 15-, and 30-minute trial caps respectively during rollout), with API access flagged as "coming soon" and no date attached. Studios that need professional human translation combined with AI voice casting can route through ElevenProductions, the white-glove service the company runs for film and TV clients. The 90+ language coverage means the long-tail languages that v1 handled awkwardly (Vietnamese, Thai, Hungarian, Hebrew) now share the same model architecture as the headline language pairs.
Sync-aware alignment is the second non-obvious improvement. v1 dubs frequently ran 5 to 15% over or under the source duration, which meant manual time-stretching in post. v2 adapts the translation itself to fit the original cadence, so the dub lines up with the video without re-editing. That eliminates one of the most common post-production tasks in localization.
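To make the 5 to 15% drift figure concrete, here is a minimal Python sketch that measures how far a dub's length departs from its source and what playback-speed change would force it back into sync. That speed change is the manual fix v2 is meant to make unnecessary. The clip names and durations are hypothetical. The 2% tolerance is an assumed threshold, not an ElevenLabs value. You would read real durations from your exported files with whatever media tool you already use.

```python
from dataclasses import dataclass


@dataclass
class ClipTiming:
    """Durations in seconds for one test clip (hypothetical measurements)."""
    name: str
    source_s: float
    dub_s: float


def drift_pct(clip: ClipTiming) -> float:
    """Signed duration drift of the dub relative to the source, in percent.

    Positive means the dub runs long; negative means it runs short.
    """
    return (clip.dub_s - clip.source_s) / clip.source_s * 100.0


def tempo_factor(clip: ClipTiming) -> float:
    """Playback-speed multiplier that would make the dub match the source length.

    Above 1.0 the dub must be sped up; below 1.0 it must be slowed down.
    """
    return clip.dub_s / clip.source_s


def needs_stretch(clip: ClipTiming, tolerance_pct: float = 2.0) -> bool:
    """Flag clips whose absolute drift exceeds an assumed tolerance."""
    return abs(drift_pct(clip)) > tolerance_pct


if __name__ == "__main__":
    # Illustrative numbers only: one dub that runs 10% long,
    # and one that lands within a second of its source.
    clips = [
        ClipTiming("intro", source_s=60.0, dub_s=66.0),
        ClipTiming("tutorial", source_s=120.0, dub_s=119.0),
    ]
    for c in clips:
        print(f"{c.name}: drift {drift_pct(c):+.1f}%, "
              f"tempo x{tempo_factor(c):.3f}, stretch needed: {needs_stretch(c)}")
```

Running this over the three test clips from an A/B comparison gives a quick, objective check on timing for v1 versus v2 output. Whether the emotional delivery survived still requires listening.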
What to Do Next
Run one test minute on a clip where v1 produced an emotionally flat dub and compare the two. If v2 preserves the emotion, queue your back catalog and plan for API access once it goes live (likely in the next pricing window, though ElevenLabs has not given a date). For any creator running multi-language channels, this upgrade is worth re-pricing the localization budget around.