Skywork AI released Matrix-Game-3.0, an open-source interactive video generation model that runs at 720p and 40 frames per second in real time. The 5-billion-parameter model uses a memory-augmented architecture to maintain visual consistency across minute-long video sequences, making it the first open-source model to reach real-time interactive video generation at this quality level.
For the broader landscape, see our complete guide to AI video generation in 2026.
What Happened
Skywork AI published Matrix-Game-3.0 on HuggingFace along with full source code on GitHub. The model is built on the Wan2.2-TI2V-5B base model and uses a Diffusion Transformer (DiT) architecture enhanced with a memory buffer that stores prediction residuals for self-correction during generation.
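The exact buffering scheme lives in the repo, but the core idea of residual-based self-correction can be illustrated with a minimal Python sketch. Every name below is hypothetical, not the repo's actual classes: the buffer records how far each raw prediction drifted from its corrected version, then nudges later predictions by the average recent drift.

```python
from collections import deque

import torch


# Hypothetical sketch of a residual memory buffer for self-correction;
# the real Matrix-Game-3.0 implementation is in the GitHub repo.
class ResidualBuffer:
    def __init__(self, maxlen: int = 8):
        self.residuals = deque(maxlen=maxlen)

    def record(self, predicted: torch.Tensor, corrected: torch.Tensor) -> None:
        # Store how far the raw prediction drifted from the corrected latent.
        self.residuals.append(corrected - predicted)

    def apply(self, predicted: torch.Tensor) -> torch.Tensor:
        # Nudge the next prediction by the average recent drift.
        if not self.residuals:
            return predicted
        return predicted + torch.stack(list(self.residuals)).mean(dim=0)
```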
A larger 28B Mixture-of-Experts variant (2×14B) is also available for higher-quality output at the cost of speed. Both models support INT8 quantization for deployment on consumer hardware.
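As a general illustration of the technique, not the repo's own quantization path, stock PyTorch can apply dynamic INT8 quantization to a transformer's Linear layers. Note that `torch.ao.quantization.quantize_dynamic` targets CPU inference, so the repo presumably ships a GPU-side scheme of its own:

```python
import torch


def quantize_linear_int8(model: torch.nn.Module) -> torch.nn.Module:
    # Replace Linear layers with dynamically quantized INT8 equivalents.
    # CPU-only in stock PyTorch; shown purely to illustrate the idea.
    return torch.ao.quantization.quantize_dynamic(
        model.eval(),         # quantize for inference
        {torch.nn.Linear},    # target the transformer's Linear layers
        dtype=torch.qint8,
    )
```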
Why It Matters
Until now, real-time interactive video generation has been limited to closed-source research demos. Matrix-Game-3.0 gives creators and developers an open-source model that generates video in response to user input at playable framerates, opening the door to AI-powered game prototyping, interactive storytelling, and live visual effects without licensing fees or API costs.
The memory-augmented approach solves a persistent problem in video generation: maintaining consistency across long sequences. Where most models degrade after a few seconds, Matrix-Game-3.0 uses camera-aware memory and frame re-injection to stay coherent for over a minute of continuous generation.
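A rough sketch of what frame re-injection looks like in an autoregressive rollout follows, with a hypothetical `denoise_chunk` callable standing in for the model's real conditioning interface (camera-aware memory is omitted for brevity):

```python
import torch


def rollout(denoise_chunk, first_frame, actions, num_chunks, k=4):
    """Assumes `denoise_chunk(context, action) -> (chunk_len, C, H, W)`."""
    frames = [first_frame]
    for i in range(num_chunks):
        # Re-inject the K most recent frames as conditioning so each new
        # chunk stays anchored to content the model already generated.
        context = torch.stack(frames[-k:])
        frames.extend(denoise_chunk(context, actions[i]).unbind(0))
    return torch.stack(frames)
```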
Key Details
- Resolution: 720p (704×1280) at 40 fps real-time generation
- Model size: 5B parameters (base), 28B MoE (quality variant)
- Architecture: Diffusion Transformer with error buffer and Distribution Matching Distillation
- Training data: a mix of Unreal Engine synthetic data, automatically captured AAA game footage, and real-world video augmentation
- Input format: Takes image + text prompt + user actions for interactive control
- Frame output: 57 initial frames, then 40 additional frames per autoregressive iteration (see the arithmetic sketch after this list)
- Optimization: FlashAttention, INT8 quantization, multi-segment autoregressive distillation
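The frame-output numbers imply a simple schedule: total frames after N iterations is 57 + 40·N. A quick check of what that means at 40 fps:

```python
import math

FPS, INITIAL, PER_ITER = 40, 57, 40  # from the spec list above


def frames_after(iterations: int) -> int:
    """Total frames after N autoregressive iterations."""
    return INITIAL + PER_ITER * iterations


def iterations_for(seconds: float) -> int:
    """Iterations needed to cover a clip of the given length at 40 fps."""
    return max(0, math.ceil((seconds * FPS - INITIAL) / PER_ITER))


print(frames_after(10))    # 457 frames, about 11.4 s of video
print(iterations_for(60))  # 59 iterations for a one-minute clip
```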
What to Do Next
Developers can clone the GitHub repository and run inference with the provided scripts. The model requires a multi-GPU setup for real-time performance, though INT8 quantization brings it within reach of high-end consumer cards. Game developers and VFX artists exploring AI-driven interactive content should evaluate this alongside other recent open-source video models like Helios and motion generation tools like NVIDIA Kimodo.
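For example, the weights can be fetched with `huggingface_hub` before running the repo's inference scripts. The `repo_id` below is an assumption; confirm the exact name (and the 28B MoE variant) on the model card:

```python
from huggingface_hub import snapshot_download

# Assumed repo_id; check the HuggingFace model card for the actual name.
local_dir = snapshot_download(repo_id="Skywork/Matrix-Game-3.0")
print(f"Model weights downloaded to {local_dir}")
```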