Tencent has publicly released OmniWeaving, a unified video generation model built by the HunyuanVideo team that handles seven distinct tasks within a single architecture. The model weights, inference code, and a new benchmark are all available on HuggingFace and GitHub as of April 3, 2026.

What Happened

OmniWeaving combines a multimodal large language model (8.3B parameters) with a diffusion transformer (7B parameters) and a visual tokenizer to process interleaved text, image, and video inputs. The system supports text-to-video, image-to-video, key-frame interpolation, reference-driven generation, video editing, multi-image composition, and reasoning-augmented generation.
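A single-model, multi-task design like this implies one request interface that routes all seven capabilities. The sketch below is purely illustrative: the class, field, and task names are assumptions for exposition, not OmniWeaving's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical unified request covering the seven task types described
# in the article. Names are illustrative, not the real OmniWeaving interface.
TASKS = {
    "text2video", "image2video", "keyframe_interpolation",
    "reference_generation", "video_editing",
    "multi_image_composition", "reasoning_generation",
}

@dataclass
class GenerationRequest:
    task: str
    prompt: str
    images: List[str] = field(default_factory=list)  # image conditions, if any
    video: Optional[str] = None                      # input clip for editing/interpolation

    def __post_init__(self):
        # One validator because one model handles every task.
        if self.task not in TASKS:
            raise ValueError(f"unknown task: {self.task}")
        if self.task in {"video_editing", "keyframe_interpolation"} and self.video is None:
            raise ValueError(f"{self.task} requires an input video")

req = GenerationRequest(task="image2video",
                        prompt="slow pan across the skyline",
                        images=["frame0.png"])
print(req.task)  # image2video
```

The point of the sketch is the workflow simplification the article describes: one entry point replaces a chain of per-task tools.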

The reasoning mode is the standout feature. Before generating video, the language model produces intermediate reasoning steps to interpret ambiguous or complex prompts. Tencent calls this "thinking mode," and it lets the system disambiguate user intent before committing to a generation path.
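The two-stage flow can be sketched as a reasoning pass followed by a generation pass. All function bodies below are stand-ins for exposition; OmniWeaving's real interfaces are not shown here.

```python
# Minimal sketch of a two-stage "thinking mode" pipeline: the language model
# first expands an ambiguous prompt into explicit reasoning steps, and only
# the resolved plan conditions the video generator. The decomposition here
# is faked; a real MLLM would produce it.

def reason(prompt: str) -> list:
    """Stand-in for the MLLM's intermediate reasoning pass."""
    return [
        f"identify the subject in: {prompt}",
        "choose a camera motion consistent with the intent",
        "fix scene lighting and pacing before generation",
    ]

def generate_video(plan: list) -> str:
    """Stand-in for the diffusion transformer's generation pass."""
    return f"<video conditioned on {len(plan)} reasoning steps>"

def thinking_mode(prompt: str) -> str:
    plan = reason(prompt)        # stage 1: disambiguate user intent
    return generate_video(plan)  # stage 2: commit to a generation path

print(thinking_mode("a cat, but make it cinematic"))
# <video conditioned on 3 reasoning steps>
```

The design choice this illustrates: ambiguity is resolved in text space, where the language model is strong, before any expensive diffusion steps run.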

Why It Matters

Most video generation models specialize in one or two tasks. Text-to-video and image-to-video are common, but editing, interpolation, and compositional generation typically require separate tools or pipelines. OmniWeaving collapses all of these into a single model, which simplifies workflows for creators who currently chain multiple tools together.

The reasoning layer adds a capability that few video models offer. Where tools like Google Veo 3.1 and Wan2.7 (typically run through ComfyUI) excel at single-task generation, OmniWeaving can interpret complex multi-step instructions by reasoning through them first. The team reports state-of-the-art performance among open unified models on their IntelligentVBench benchmark.

Key Details

  • Architecture: MLLM (8.3B) + MMDiT (7B) + VAE, built on HunyuanVideo-1.5
  • Tasks: 7 unified capabilities from text-to-video to reasoning-augmented generation
  • Thinking mode: MLLM generates reasoning steps before video generation begins
  • Hidden States DeepStacking: Extracts multi-layer features for finer compositional control
  • Benchmark: IntelligentVBench, a new evaluation suite for unified video generation, released alongside the model
  • Resources: Project page with demos, arXiv paper, model weights on HuggingFace
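The "Hidden States DeepStacking" bullet describes conditioning generation on features drawn from several MLLM layers rather than the final layer alone. The sketch below shows the general idea with toy data; the layer selection and concatenation scheme are assumptions, not Tencent's published recipe.

```python
# Illustrative multi-layer feature stacking: concatenating hidden states
# from several depths gives the generator both low-level and high-level
# representations, which is the stated aim of finer compositional control.
# Layer indices and the stacking scheme here are assumptions.

def deep_stack(hidden_states, layers=(4, 8, 12)):
    """hidden_states: list indexed by layer; each entry is a feature vector."""
    stacked = []
    for layer in layers:
        stacked.extend(hidden_states[layer])  # concatenate selected layers
    return stacked

# Toy example: 13 layers, 2-dimensional features per layer.
states = [[float(l), float(l) + 0.5] for l in range(13)]
features = deep_stack(states)
print(len(features))  # 6  (3 layers x 2 dims)
```

In a real model the vectors would be high-dimensional tensors and the stack would feed the diffusion transformer's conditioning pathway; the toy shows only the shape of the operation.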

What to Do Next

Video creators working with local generation pipelines should evaluate OmniWeaving for multi-task workflows. The model requires multi-GPU inference (the repo recommends 8 GPUs via torchrun), so it is best suited for teams or cloud setups rather than single-GPU workstations. For lighter use cases, pairing a single-task model like Wan2.7 with Netflix VOID for object removal may be more practical. The IntelligentVBench benchmark is worth watching as a new standard for evaluating unified video generation systems.
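For teams evaluating the multi-GPU path, an 8-GPU torchrun launch generally looks like the fragment below. `torchrun` and `--nproc_per_node` are standard PyTorch CLI; the entry-point script name and its flags are hypothetical placeholders, so check the repo's README for the actual invocation.

```shell
# Hypothetical launch line for 8-GPU inference. Script name and flags
# are assumptions; only torchrun and --nproc_per_node are standard.
torchrun --nproc_per_node=8 \
    generate.py \
    --task text2video \
    --prompt "a sailboat crossing a storm front" \
    --output out.mp4
```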