Seven of the top ten text-to-video models on HuggingFace are Wan-family variants, with combined monthly downloads exceeding 500,000. Open-source video generation is no longer catching up to proprietary tools. It has taken the lead.
This analysis draws from HuggingFace model data, trending research papers, and current pricing across every major platform to map the full AI video generation landscape as of March 2026.
Key Findings
1. Wan 2.2 Dominates the Open-Source Video Stack
Alibaba's Wan 2.2 has become the foundation layer of open-source video generation. Across HuggingFace, variants of the model occupy the top spots in text-to-video downloads.
| Model | Downloads | Likes | Type |
|---|---|---|---|
| Wan2.2-T2V-A14B-Diffusers | 130,303 | 118 | Text-to-Video |
| Wan2.2-T2V-A14B-GGUF | 113,254 | 233 | GGUF Quantized |
| Wan2.1-T2V-1.3B-Diffusers | 99,231 | 113 | Lightweight T2V |
| Wan2.2-TI2V-5B-Diffusers | 43,867 | 114 | Text+Image-to-Video |
| CogVideoX-5B | 33,977 | 665 | Text-to-Video |
| Wan2.1-T2V-14B | 30,220 | 1,459 | Full-size T2V |
The Wan ecosystem extends well beyond Alibaba's official releases. Kijai's ComfyUI WanVideoWrapper provides GGUF support and memory-optimized inference. Alibaba PAI's Reward LoRAs (26,614 downloads) fine-tune output quality. FastVideo's full-attention variant (22,308 downloads) trades memory for speed. And all of it ships under Apache 2.0, meaning commercial use requires no additional licensing.
2. The Pricing War Has Cratered Costs
Every major proprietary platform has restructured its pricing since late 2025. The result: generating a 10-second HD video now costs between $0.25 and $5.00 depending on the platform and quality tier. Entry-level plans start as low as $8/month.
| Platform | Entry Plan | Pro Plan | API Cost/Second | Max Resolution |
|---|---|---|---|---|
| Runway | $12/mo | $28/mo | N/A (credit-based) | 4K (Gen-4.5) |
| Pika | $8/mo | $28/mo | N/A (credit-based) | 1080p |
| Kling | $10/mo | $37/mo | ~$0.035/sec (Standard) | 4K 60fps (3.0) |
| Hailuo (MiniMax) | $9.99/mo | $94.99/mo (Unlimited) | $0.045/sec (768p) | 768p |
| Sora 2 | $20/mo (ChatGPT Plus) | $200/mo (Pro) | $0.10-$0.50/sec | 1080p |
| Veo 3.1 (Google) | $7.99/mo (AI Plus) | $249.99/mo (Ultra) | $0.15-$0.40/sec | 1080p |
| Adobe Firefly | $9.99/mo | $19.99/mo | Credit-based | 4K (via Topaz) |
Google's entry at $7.99/month for Veo 3.1 Fast is the most aggressive consumer pricing in the market. Kling remains the best value for API users, while Sora 2 commands a significant premium that reflects both brand and quality positioning.
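To make the per-second rates above concrete, the short sketch below converts them into the cost of a single 10-second clip. The rates are taken directly from the pricing table; the ranges reflect quality tiers, and none of this accounts for credit-plan discounts.

```python
# Per-second API rates (USD/second) from the pricing table above.
# (low, high) spans the quality tiers listed; not official rate cards.
rates = {
    "Kling (Standard)": (0.035, 0.035),
    "Hailuo (768p)":    (0.045, 0.045),
    "Sora 2":           (0.10, 0.50),
    "Veo 3.1":          (0.15, 0.40),
}

CLIP_SECONDS = 10  # length of the example clip

for platform, (low, high) in rates.items():
    lo_cost = low * CLIP_SECONDS
    hi_cost = high * CLIP_SECONDS
    if low == high:
        print(f"{platform}: ${lo_cost:.2f} per 10s clip")
    else:
        print(f"{platform}: ${lo_cost:.2f}-${hi_cost:.2f} per 10s clip")
```

The spread is striking: the same 10-second clip runs about $0.35 on Kling's Standard tier but up to $5.00 on Sora 2's top tier, a roughly 14x difference for nominally similar output lengths.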
3. 4K and Native Audio Are Now Baseline
Kling 3.0 (launched February 4, 2026) ships native 4K at 60fps with 16-bit HDR. LTX-2.3 from Lightricks matches 4K at 50fps in a fully open-source package. Runway Gen-4.5 brings 4K to its creative toolset. The race to 4K is over. The next battlefield is temporal consistency at longer durations.
Native audio generation has moved from novelty to expectation. Kling 3.0's "Omni Native Audio" generates synchronized sound alongside video pixels. Veo 3.1 ships audio by default. LTX-2.3 handles video and audio in a single forward pass. Sora 2 introduced synchronized dialogue and sound effects in September 2025. Any platform launching without audio in 2026 is already behind.
4. Helios Rewrites the Performance Playbook
Helios, released March 4, 2026 by a team from Peking University, ByteDance, and Canva, is a 14-billion-parameter autoregressive diffusion model that generates video at 19.5 FPS on a single H100 GPU. That speed matches models one-tenth its size. It produces minute-long videos (1,452 frames at 24 FPS) without KV-cache, quantization, or sparse attention tricks.
With Group Offloading, the model runs on as little as 6 GB of VRAM, bringing 14B-class quality to consumer hardware. Released under Apache 2.0, Helios hit #2 Paper of the Day on HuggingFace and collected over 1,100 GitHub stars in its first week. Its architecture suggests that the "bigger models are slower models" assumption no longer holds.
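The 6 GB figure is easier to appreciate with some back-of-envelope arithmetic. The sketch below uses assumed values (bf16 weights, a 40-block transformer, 4 blocks resident per group); the block count and group size are illustrative, not Helios's published architecture. Only the parameter count, frame count, and FPS figures come from the section above.

```python
# Rough arithmetic behind group offloading and Helios's throughput.
# NUM_BLOCKS and GROUP_SIZE are illustrative assumptions.
PARAMS = 14e9          # 14B parameters
BYTES_PER_PARAM = 2    # bf16
NUM_BLOCKS = 40        # assumed transformer block count
GROUP_SIZE = 4         # assumed blocks resident on the GPU at once

full_weights_gb = PARAMS * BYTES_PER_PARAM / 1e9         # all weights
resident_gb = full_weights_gb * GROUP_SIZE / NUM_BLOCKS  # on-GPU slice

# Throughput figures from the section above.
clip_seconds = 1452 / 24    # clip length at 24 FPS playback
gen_seconds = 1452 / 19.5   # wall-clock time at 19.5 FPS generation

print(f"full bf16 weights: {full_weights_gb:.1f} GB")
print(f"resident weights:  {resident_gb:.1f} GB")
print(f"{clip_seconds:.1f}s clip generated in ~{gen_seconds:.1f}s")
```

Under these assumptions, a 28 GB model keeps under 3 GB of weights resident at a time; activations, the VAE, and the text encoder add a few GB on top, which is broadly consistent with the reported 6 GB floor. And at 19.5 FPS, a minute of video generates in only slightly more than real time.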
5. ByteDance's Seedance 2.0: Power Meets Controversy
Seedance 2.0 from ByteDance is technically remarkable: native 2K output (2048x1080), four-modality input (text, image, audio, video), 12-file multimodal reference support, and director-level camera control. It represents the category moving toward unified multimodal video systems.
But the launch has been turbulent. Senators Marsha Blackburn and Peter Welch called for an immediate shutdown after users generated deepfake-quality videos of real people and copyrighted characters. ByteDance paused the global rollout on March 15 to address safety and copyright concerns. The technical capability is undeniable, but the guardrail question may define whether the platform reaches a broad audience.
6. The Wan Ecosystem Is Building a Full Pipeline
Wan is not just a model anymore. It is an ecosystem. The progression from Wan 2.1 to Wan 2.2 to Wan 2.6 shows a deliberate strategy: open-source the base model, let the community build tooling, then layer on premium capabilities.
Wan 2.6, released December 2025, introduced multi-shot generation (up to 15 seconds at 1080p), character consistency across scenes (150 reference frames), multi-character support (up to 3 simultaneous references), and embedded text generation without post-processing. The model uses a Mixture-of-Experts architecture with 14 billion parameters, trained on 1.5 billion videos and 10 billion images.
ComfyUI's native Wan 2.2 support has become the de facto creative interface. Community LoRA collections for animation styles, brand identities, and cinematic looks are expanding rapidly. The open license (Apache 2.0) means studios can fine-tune without legal complexity.
7. Research Points Toward Video World Models
The trending papers on HuggingFace signal where the field is headed. "Video-CoE: Reinforcing Video Event Prediction via Chain of Events" (83 upvotes) treats video generation as causal reasoning. "MosaicMem: Hybrid Spatial Memory for Controllable Video World Models" (69 upvotes) introduces spatial memory that maintains scene consistency across extended generations. "Stereo World Model: Camera-Guided Stereo Video Generation" (8 upvotes) pushes toward spatially aware stereo output.
These papers share a thesis: video generation is converging with world simulation. The models that win long-term will not just generate pixels that look right, but will understand the causal physics of what they are rendering.
Trend Analysis
Rising
- Open-source 4K models: LTX-2.3 and Helios prove that 4K and real-time generation are achievable without proprietary infrastructure.
- Native audio-video generation: Generating synchronized sound alongside video is becoming standard across both open and proprietary platforms.
- Multi-shot and long-form generation: Kling 3.0 supports 6 camera cuts per generation. Helios generates minute-long clips. The 5-second ceiling is gone.
- Consumer hardware access: GGUF quantization and group offloading are bringing 14B-class models to GPUs with 6-8 GB VRAM.
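The consumer-hardware point follows from simple size arithmetic. The bits-per-weight figures below are approximate community averages for common GGUF quantization types, not exact numbers for any specific Wan checkpoint.

```python
# Approximate in-memory size of a 14B model at common GGUF
# quantization levels. Bits-per-weight values are rough averages
# per quant type, not exact for any particular checkpoint.
PARAMS = 14e9  # 14B parameters

bits_per_weight = {
    "F16":    16.0,
    "Q8_0":    8.5,
    "Q5_K_M":  5.7,
    "Q4_K_M":  4.8,
}

for qtype, bpw in bits_per_weight.items():
    size_gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{qtype:7s} ~{size_gb:5.1f} GB")
```

A Q4-class quant of a 14B model lands around 8 GB, which is why GGUF builds combined with partial offloading fit the 6-8 GB cards mentioned above.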
Stable
- Wan-family dominance in open source: Seven months after Wan 2.2's release, no open-source competitor has displaced it at the top of the download charts.
- Credit-based pricing: Most platforms still use credit systems rather than pure per-second billing, keeping cost comparisons deliberately opaque.
- ComfyUI as the creative hub: ComfyUI has solidified its position as the default workflow tool for local video generation.
Emerging
- Video world models: Research is moving from "generate what looks right" to "generate what is physically correct." Expect production models to ship with physics reasoning within 12 months.
- Regulatory friction for Chinese models: Seedance 2.0's global pause and political pressure on ByteDance suggest that geopolitical dynamics will increasingly shape which models reach which markets.
- Unified multimodal generation: Seedance 2.0's four-modality input (text/image/audio/video) and Wan 2.6's multi-character consistency hint at a future where "video generation" and "video editing" merge into one tool.
Predictions
- Wan 2.7 or 3.0 will ship native 4K and audio by Q3 2026. The Wan 2.7 announcement already signals a major all-around upgrade. Alibaba's pattern of rapid iteration makes this nearly certain.
- At least one major platform will offer unlimited HD video generation for under $20/month by mid-2026. Google's $7.99 entry and Adobe's unlimited generation promotion are setting the floor. The economics of inference optimization will push prices lower.
- Helios-style autoregressive architectures will replace diffusion transformers as the default approach for real-time applications by late 2026. The speed advantage (19.5 FPS at 14B parameters) is too significant to ignore for interactive use cases.
- Deepfake regulation will force at least two platforms to implement mandatory provenance watermarking by Q4 2026. The Seedance 2.0 controversy accelerated regulatory attention. C2PA-style content credentials are the likely standard.
- Open-source models will match proprietary quality on standard benchmarks by year-end. LTX-2.3 and Helios are already competitive. The gap closes further with each release.
What This Means for Creators
If you are experimenting with AI video for the first time, start with Pika ($8/month) or Google AI Plus ($7.99/month for Veo 3.1 Fast). Both offer approachable interfaces and enough credits to learn without overspending.
If you need production-quality output with creative control, Runway Gen-4.5 remains the strongest toolset for professional workflows. Its motion brushes, prompt adherence, and scene consistency are best-in-class for directed work.
If you want maximum control and zero recurring costs, set up a local Wan 2.2 pipeline through ComfyUI. The GGUF variants run on consumer GPUs, LoRA support enables custom styles, and the Apache 2.0 license permits commercial use without extra fees or approvals.
If speed matters more than anything, watch Helios closely. Real-time generation at 14B quality is a genuine breakthrough, and the 6 GB VRAM minimum via Group Offloading makes it accessible on mid-range hardware.
For API-first workflows, Kling's API at ~$0.035/second is the best value for batch processing. Hailuo's $0.045/second is a close second with the added benefit of MiniMax's fast iteration on new models like Hailuo 2.3.
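As an illustration of what those rates mean at batch scale, here is a hypothetical monthly workload priced at the per-second list rates quoted above, with no volume discounts assumed.

```python
# Monthly spend for a hypothetical batch workload at the per-second
# list rates quoted above (no volume discounts assumed).
KLING_RATE = 0.035    # USD/second, Standard tier
HAILUO_RATE = 0.045   # USD/second, 768p

clips_per_month = 1_000
seconds_per_clip = 10
total_seconds = clips_per_month * seconds_per_clip

kling_cost = KLING_RATE * total_seconds
hailuo_cost = HAILUO_RATE * total_seconds

print(f"Kling:  ${kling_cost:,.2f}/month")   # $350.00
print(f"Hailuo: ${hailuo_cost:,.2f}/month")  # $450.00
```

At a thousand 10-second clips a month, the one-cent-per-second gap between the two platforms compounds to a $100 monthly difference, which is why the per-second rate matters far more for batch pipelines than for occasional creative use.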
The single most important shift: you no longer need to choose between open and proprietary. The Wan ecosystem and models like Helios and LTX-2.3 have closed the gap enough that the choice is about workflow preference, not quality compromise.
Full Data Summary
| Platform | Type | Max Resolution | Max Duration | Native Audio | Starting Price |
|---|---|---|---|---|---|
| Runway Gen-4.5 | Proprietary | 4K | 10s | No | $12/mo |
| Pika 2.5 | Proprietary | 1080p | 10s | No | $8/mo |
| Kling 3.0 | Proprietary | 4K 60fps | 15s | Yes | $10/mo |
| Hailuo 2.3 | Proprietary | 768p | 10s | No | $9.99/mo |
| Sora 2 | Proprietary | 1080p | 20s | Yes | $20/mo |
| Veo 3.1 | Proprietary | 1080p | 8s | Yes | $7.99/mo |
| Seedance 2.0 | Proprietary | 2K | 10s | Yes | Paused |
| Adobe Firefly | Proprietary | 4K (Topaz) | 5s | No | $9.99/mo |
| Wan 2.2/2.6 | Open Source | 1080p | 15s | Via 2.6 | Free |
| Helios | Open Source | 1080p | 60s | No | Free |
| LTX-2.3 | Open Source | 4K | 20s | Yes | Free |
| CogVideoX | Open Source | 1360x768 | 10s | No | Free |
This research was produced by Creative AI News.
Subscribe for free to get the weekly digest every Tuesday.