Netflix's VOID AI removes objects from video and rewrites reality with physics-aware editing

Reviewed by Nidhi Govil

Netflix has released VOID, an open-source AI model that removes objects from video while accounting for physics and causality. Unlike traditional editing tools, VOID rewrites the scene around the missing element: erasing a car from a crash also removes the debris and fire, and deleting a person from a pool dive leaves the water undisturbed. The model could reduce the need for costly reshoots at studios.

Netflix AI Transforms Video Editing With Physics-Aware Object Removal

Netflix has released an open-source AI model called VOID that changes how objects can be removed from video footage [1]. Short for Video Object and Interaction Deletion, the tool does not just erase unwanted elements: it models physics and causality to rewrite entire scenes as if the deleted object had never existed [2]. VOID is a departure from generative AI video tools like Sora and Runway, which focus on creating new content from text prompts rather than modifying existing footage.

Source: Tom's Guide

How Video Object and Interaction Deletion Rewrites Reality

Conventional editing tools can remove objects from video, but they struggle when the deleted element interacts significantly with the rest of the scene, for example by colliding with or supporting another object [3]. VOID addresses this by treating an edit as a chain reaction and regenerating the motion that would plausibly follow. In demonstrations available on GitHub, removing a person holding a guitar causes the instrument to fall to the ground, while erasing one car from a head-on collision eliminates the resulting fire, debris, and damage as if the accident had never occurred [1]. The system analyzes cause-and-effect relationships, then reconstructs the sequence in a physics-aware way so that behavior stays believable throughout the edited footage.
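The "chain reaction" idea can be illustrated with a toy dependency graph. This is only a conceptual sketch, not VOID's method: the model infers such relationships from pixels, whereas the hand-written `CAUSES` table and the `downstream_effects` helper below are hypothetical names invented for this example.

```python
from collections import deque

# Hypothetical cause -> effect edges for the head-on collision example.
# Illustrative only; VOID works on video frames, not on a symbolic graph.
CAUSES = {
    "car_a": ["collision"],
    "car_b": ["collision"],
    "collision": ["fire", "debris", "damage_to_car_b"],
}

def downstream_effects(removed, causes):
    """Return every effect reachable from the removed object (breadth-first)."""
    seen = set()
    queue = deque([removed])
    while queue:
        node = queue.popleft()
        for effect in causes.get(node, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen

# Removing one car invalidates the collision and everything it caused,
# while the other car itself is not in the affected set and stays in the scene.
affected = downstream_effects("car_a", CAUSES)
print(sorted(affected))
```

In this simplified graph, removing any single cause cancels its effects, which matches the collision example but would not hold for every real interaction.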

Source: TechSpot

Training Methods Enable Counterfactual Scene Generation

To achieve this, Netflix trained the model on thousands of paired examples of counterfactual object removals generated with Kubric and Humoto [1]. During inference, a vision-language model identifies the parts of the scene affected by the removed object, and that output guides a diffusion model as it fills the gaps with counterfactual content. This approach lets VOID apply learned rules about physical interactions rather than simply copying patterns from existing footage [3]. The model is built on a 5-billion-parameter version of CogVideoX and uses a custom "quadmask" system to determine which aspects of the physics need recalculation [2].
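The article does not say how the quadmask is encoded. As a rough illustration, assuming "quad" refers to four region labels, the sketch below combines a mask of the object to delete with a mask of the region a vision-language step flagged as affected. The label values and the `build_quadmask` helper are assumptions made for this example, not VOID's actual format or API.

```python
# Hypothetical label values; the article does not specify VOID's real encoding.
KEEP, AFFECTED, OVERLAP, REMOVE = 0, 1, 2, 3

def build_quadmask(object_mask, affected_mask):
    """Combine two binary masks (lists of rows of 0/1) into one four-valued mask."""
    quad = []
    for obj_row, aff_row in zip(object_mask, affected_mask):
        row = []
        for obj, aff in zip(obj_row, aff_row):
            if obj and aff:
                row.append(OVERLAP)    # object pixel that also lies in the affected region
            elif obj:
                row.append(REMOVE)     # object pixel to erase
            elif aff:
                row.append(AFFECTED)   # background that must be regenerated
            else:
                row.append(KEEP)       # untouched background
        quad.append(row)
    return quad

# One tiny 3x4 frame: the object to delete, and the region flagged as affected.
object_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
affected_mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
quadmask = build_quadmask(object_mask, affected_mask)
for row in quadmask:
    print(row)
```

A single multi-valued mask of this kind would let a diffusion model treat "erase this" and "re-simulate this" pixels differently while leaving everything else alone.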

Open-Source Release Signals Industry Shift in Post-Production

Netflix made the model available on Hugging Face under an Apache 2.0 license, so anyone can download and use it [2]. Running VOID requires substantial computing power, however: at least 40GB of VRAM, on GPUs such as the NVIDIA A100 or H100. In a survey of 25 people, VOID was preferred over competing tools such as ProPainter, Rose, DiffuEraser, and Generative Omnimatte nearly 65 percent of the time [1]. For studios, the appeal is cost: edits of this kind could replace some expensive reshoots. The infamous "Game of Thrones" Starbucks cup incident, which required frame-by-frame digital surgery, is the kind of mistake that could be fixed far more easily in post-production [2].
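To put the 40GB VRAM requirement in context, a back-of-the-envelope calculation shows how much of it the 5-billion-parameter backbone's weights alone would occupy. The numeric formats below are assumptions, since the article does not say which precision VOID runs in, and the remainder of the budget (activations over many video frames, auxiliary models) is not broken down in the source.

```python
PARAMS = 5_000_000_000   # 5-billion-parameter CogVideoX backbone (from the article)
VRAM_REQUIRED_GB = 40    # minimum VRAM cited in the article

# Bytes per parameter for two common formats (assumed; not stated for VOID).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

def weight_memory_gb(params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9

for dtype, size in BYTES_PER_PARAM.items():
    weights = weight_memory_gb(PARAMS, size)
    share = weights / VRAM_REQUIRED_GB
    print(f"{dtype}: {weights:.0f} GB of weights, {share:.0%} of the 40 GB minimum")
```

Under either assumption the weights account for at most half of the stated minimum, which suggests that most of the memory goes to something other than storing parameters.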

Implications for Trust and the Era of Editable Reality

VOID remains a research system, detailed in a 19-page arXiv paper, rather than a commercial product, but its capabilities raise important questions about video authenticity [3]. If objects can be removed from video while the result stays physically consistent, visual evidence may no longer serve as reliable proof. As the technology scales to denser scenes and longer sequences, the line between captured reality and edited footage will blur further. Studios should watch for integration into professional workflows, while audiences may need to reconsider how they evaluate video authenticity in what experts are calling the era of editable reality [2].

© 2026 TheOutpost.AI All rights reserved