Netflix has released VOID (Video Object and Interaction Deletion), its first public AI model. The open-source tool removes objects from video while preserving physically plausible interactions, solving a problem that existing video inpainting methods handle poorly.
What Happened
Netflix published the VOID model on HuggingFace along with an accompanying research paper and full source code on GitHub. The model is built on CogVideoX-Fun-V1.5-5b and fine-tuned for video inpainting with a novel quadmask conditioning system.
VOID processes video at 384x672 resolution and handles up to 197 frames. It uses a two-pass system: Pass 1 runs base inpainting, while the optional Pass 2 adds optical flow-warped noise for better temporal consistency on longer clips.
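Flow-warped noise presumably means carrying each frame's noise sample along the estimated motion field so that corresponding pixels are denoised from correlated noise. The paper's exact procedure isn't reproduced here; a toy sketch of warping a noise map by a flow field, using nearest-neighbor backward warping, might look like:

```python
import numpy as np

def warp_noise(noise, flow):
    """Warp a 2-D noise map by a per-pixel flow field (dy, dx).

    Nearest-neighbor backward warping: each output pixel samples the
    noise value the flow says it came from; out-of-bounds samples are
    clamped to the border. Real pipelines use subpixel flow and proper
    resampling - this is only an illustration of the idea.
    """
    h, w = noise.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys - flow[..., 0].round().astype(int), 0, h - 1)
    src_x = np.clip(xs - flow[..., 1].round().astype(int), 0, w - 1)
    return noise[src_y, src_x]

rng = np.random.default_rng(0)
noise = rng.standard_normal((8, 8))
flow = np.zeros((8, 8, 2))
flow[..., 1] = 1.0  # uniform one-pixel shift to the right
warped = warp_noise(noise, flow)
```

Reusing warped noise across frames keeps the denoiser's per-pixel inputs consistent under motion, which is why it helps temporal stability on longer clips.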
Why It Matters
Current video object removal tools can erase objects and fix appearance artifacts such as shadows and reflections. But when the removed object was physically interacting with other elements, for example a hand holding an item or one object pushing another, existing models produce implausible results. VOID addresses this directly.
The model uses a quadmask that encodes four distinct regions: the object to remove, overlap zones, affected regions where physics will change (objects that should fall or shift), and the background to keep. A vision-language model identifies these regions automatically during inference.
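The precise quadmask format isn't specified in this summary; a minimal sketch of combining the four regions into a single label map, assuming integer labels and an object-over-overlap-over-affected precedence (both assumptions, not the documented encoding), might look like:

```python
import numpy as np

# Hypothetical label values; the actual VOID encoding is not specified here.
BACKGROUND, OBJECT, OVERLAP, AFFECTED = 0, 1, 2, 3

def build_quadmask(object_mask, overlap_mask, affected_mask):
    """Combine three boolean masks into one uint8 label map.

    Later assignments take precedence, so object wins over overlap,
    which wins over affected; pixels in no mask stay background.
    """
    quadmask = np.full(object_mask.shape, BACKGROUND, dtype=np.uint8)
    quadmask[affected_mask] = AFFECTED
    quadmask[overlap_mask] = OVERLAP
    quadmask[object_mask] = OBJECT
    return quadmask

h, w = 4, 4
obj = np.zeros((h, w), dtype=bool); obj[0, 0] = True
ovl = np.zeros((h, w), dtype=bool); ovl[1, 1] = True
aff = np.zeros((h, w), dtype=bool); aff[2, 2] = True
quadmask = build_quadmask(obj, ovl, aff)
```

In the actual pipeline these regions come from the vision-language model at inference time rather than being drawn by hand.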
Key Details
- Architecture: 3D Transformer based on CogVideoX-Fun-V1.5-5b-InP (5 billion parameters)
- Training data: Paired counterfactual videos from HUMOTO (human-object interactions via Blender physics simulation) and Kubric (object-only interactions)
- Infrastructure: Trained on 8x A100 80GB GPUs with DeepSpeed ZeRO Stage 2
- Precision: BF16 with FP8 quantization support
- License: open source, with both weights and code publicly available
An interactive Gradio demo is available on HuggingFace Spaces for testing without local setup.
What to Do Next
Video editors and VFX artists can try the VOID project page demos to evaluate the model against their workflows. The full pipeline requires a GPU with at least 40GB VRAM for inference, though FP8 quantization can reduce memory requirements. Studios already using CogVideoX-based pipelines can integrate VOID directly.
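A back-of-envelope check of why FP8 helps: for a 5-billion-parameter model, weights alone account for roughly 10 GB at BF16 (2 bytes per parameter) versus roughly 5 GB at FP8 (1 byte per parameter). Activations, latents, and the VAE add substantially more, which is why the full-precision pipeline still wants 40GB-class GPUs.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """Approximate weight-only memory in GB (decimal, weights only -
    activations and intermediate latents are not included)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

bf16_gb = weight_memory_gb(5, 2)  # BF16: 2 bytes/param -> 10.0 GB
fp8_gb = weight_memory_gb(5, 1)   # FP8:  1 byte/param  ->  5.0 GB
```

This is only the floor for model weights; actual peak usage depends on resolution, frame count, and attention implementation.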