Alibaba's HappyHorse-1.0 became available on fal.ai on April 26, giving creators API access to the AI video model that currently ranks first on the Artificial Analysis Video Arena leaderboard for both text-to-video and image-to-video. It generates 1080p clips with synchronized native audio in a single pass.

For the broader landscape, see our complete guide to AI video generation in 2026.

What Happened

HappyHorse-1.0, built by Alibaba's ATH AI Innovation Unit (Taotian Group), launched on fal.ai on April 26 at 9 PM PST. Since its initial appearance the model had been available only through the third-party happyhorse.app, so this launch is the first time creators can access it through a standard API with per-second billing and no platform lock-in.

Alibaba revealed the model's origin on April 10, after it surfaced anonymously on the Artificial Analysis leaderboard around April 7 and quickly reached #1 in blind human preference voting.

Why It Matters

The leaderboard rank matters because Artificial Analysis uses blind voting -- voters do not see which model they are evaluating. HappyHorse-1.0 holds a 107-point Elo lead over the second-ranked model in text-to-video (without audio), which translates to users preferring its output roughly 65% of the time in head-to-head comparisons. That kind of gap in a blind test is a credible quality signal.
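The 65% figure follows from the standard Elo expected-score formula: a rating gap of Δ implies a win probability of 1 / (1 + 10^(-Δ/400)). A quick sketch of that arithmetic:

```python
# Convert an Elo rating gap into an expected head-to-head win rate
# using the standard Elo expected-score formula.
def elo_win_probability(rating_gap: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

# The 107-point T2V lead cited above:
print(f"{elo_win_probability(107):.1%}")  # ~64.9%, i.e. roughly 65%
```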

The architectural differentiator is joint audio-video generation. Most competing models run audio as a separate post-processing step, which introduces timing drift. HappyHorse generates dialogue, ambient sound, and Foley effects in the same forward pass as the video, which is why it achieves lower word error rates than LTX 2.3 and OVI 1.1 on multilingual lip-sync benchmarks.

Practically speaking, this matters for talking-head content, multilingual marketing, and any use case where audio-visual sync failure is a visible problem. It does not have the longest max duration (capped at 10 seconds in the UI) or the lowest per-second cost, but for output quality and audio sync, it currently leads the field.

Key Details

  • Leaderboard: #1 text-to-video and image-to-video on Artificial Analysis Video Arena (Elo 1,360 T2V, 1,403 I2V)
  • Architecture: 15B-parameter unified Transformer, joint audio-video single-pass generation
  • Resolution: 720p and 1080p; aspect ratios 16:9, 9:16, 1:1, 4:3
  • Duration: 3-15 seconds supported; UI currently limits to 5 or 10 seconds
  • Audio: Native dialogue, ambient, and Foley -- multilingual lip-sync in English, Mandarin, Cantonese, Japanese, Korean, German, French
  • fal.ai pricing: $0.14/second at 720p, $0.28/second at 1080p
  • API: Also available via Alibaba Cloud's Bailian platform as of April 27

What to Do Next

Try HappyHorse-1.0 at fal.ai/happyhorse-1.0 -- no subscription required, pay per second of generated video. For best results with audio, enable audio generation and include language and sound direction in your prompt (for example: "English dialogue, street ambient sound bed"). The model's strongest use cases are talking-head video, animated stills with synced speech, and multilingual localization where other models produce drift.
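A minimal request through fal.ai's Python client might look like the sketch below. The endpoint ID, argument names (`resolution`, `duration`, `generate_audio`), and response shape are assumptions inferred from the feature list above, not documented values; check the model page for the actual schema.

```python
# Sketch of a HappyHorse-1.0 request via the fal.ai Python client
# (pip install fal-client). Endpoint ID and argument names are
# assumptions -- verify against the model page before use.
import os

def build_arguments(prompt: str, resolution: str = "1080p",
                    duration_seconds: int = 5, audio: bool = True) -> dict:
    # Per the prompt guidance above, include language and sound
    # direction directly in the prompt text.
    return {
        "prompt": prompt,
        "resolution": resolution,      # "720p" or "1080p"
        "duration": duration_seconds,  # UI currently allows 5 or 10
        "generate_audio": audio,       # hypothetical flag name
    }

if __name__ == "__main__" and os.environ.get("RUN_FAL_DEMO"):
    import fal_client  # requires FAL_KEY in the environment
    result = fal_client.subscribe(
        "fal-ai/happyhorse-1.0",  # assumed endpoint ID
        arguments=build_arguments(
            "A street vendor greets customers. English dialogue, "
            "street ambient sound bed."
        ),
    )
    print(result)
```

Billing is per second of generated video, so shorter `duration` values are the cheap way to iterate on a prompt before committing to a full clip.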

At $0.28/second for 1080p, a 5-second clip costs $1.40. That is higher than Kling 3.0 Pro ($0.55-0.85 for 5 seconds) but on par with Seedance 2.0 for equivalent resolution. Run a 5-second test before committing to longer clips or batch production.
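The per-clip cost arithmetic is simple enough to script; a small helper using the fal.ai rates quoted above:

```python
# Estimate fal.ai generation cost from the published per-second rates.
RATE_PER_SECOND = {"720p": 0.14, "1080p": 0.28}

def clip_cost(seconds: float, resolution: str = "1080p") -> float:
    return round(seconds * RATE_PER_SECOND[resolution], 2)

print(clip_cost(5, "1080p"))   # 1.4 -- the $1.40 figure above
print(clip_cost(10, "720p"))   # 1.4 -- a 10s 720p clip costs the same
```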