3 Sources
[1]
Netflix's Void AI can remove objects from video and show how scenes evolve without them
What just happened? Top-tier video editing suites can seamlessly remove objects from scenes, even generating realistic shadows and reflections for the freshly removed elements. However, these tools fall short when the deleted object involves significant interactions, such as collisions. In such cases, existing solutions often struggle to produce plausible results.
Netflix is addressing this shortcoming with a new video object removal framework called Void. Short for Video Object and Interaction Deletion, the model can effectively delete an object from a scene and adjust for its absence. For example, erasing a car crash from a scene will also modify the remaining elements accordingly as if the accident never happened. This means that flying debris, fire, and damage to nearby props will be removed as if the crash never occurred. Similarly, in a scene involving someone cannonballing into a pool, removing the person would leave the pool water naturally undisturbed.
To train the model, its creators used Kubric and Humoto to generate a new paired dataset of counterfactual object removals. During the inference stage, a vision-language model is used to identify parts of the scene impacted by a removed object which serves as a guide for the diffusion model to fill in the blanks with the counterfactual data.
Void sounds like a powerful video editing tool that could afford producers lots of flexibility long after filming has wrapped up. Not having to reshoot a scene would save an immense amount of time and money - assuming of course that the effect is seamless and doesn't look like AI slop.
Interested parties can learn more about Void over on GitHub. Its creators - Saman Motamed, William Harvey, Luc Van Gool, Benjamin Klein, Ta-Ying Cheng, and Zhuoning Yuan - have also published a 19-page pre-print (PDF) on the subject.
As The Register highlights, the model isn't exclusive to Netflix. The streaming giant has also made it available on Hugging Face, meaning anyone can install and use it. And while Void isn't the first of its kind, it might be the best currently available. In a survey of 25 individuals cited by The Register, Void was reportedly preferred over rivals like ProPainter, Rose, DiffuEraser, and Generative Omnimatte nearly 65 percent of the time.
[2]
Netflix's new AI doesn't create videos -- it rewrites reality (and it's open source)
Netflix challenges Sora with its new open source AI that transforms real footage
I've spent a lot of time testing every AI video tool that hits the market, from OpenAI's Sora to the latest Runway updates. Usually, the pitch is the same: "Type a prompt, get a movie." But Netflix just quietly released a research model called VOID, and it's doing something completely different. Instead of building new worlds and scenes from scratch, it rewrites the one you've already filmed -- and it's so good at it, you might never trust a "real" video again.
What is Netflix VOID?
VOID stands for Video Object and Interaction Deletion. At first glance, it looks like a high-end version of the "Magic Eraser" on your Pixel 8 or Galaxy S24. You select an object, and it disappears. But here's where it gets wild: VOID understands physics and causality. In other words, while most editing tools just "patch" the hole left behind with background textures, VOID actually rewrites the logic of the scene to account for the missing object. Several tests on GitHub highlight what the AI can do:
* The Guitar Test: In a research demo, a person holding a guitar is deleted. In any other tool, the guitar would just float or vanish. VOID realizes the guitar is no longer supported, so it generates frames where it falls naturally to the ground.
* The Crash Test: Remove one car from a head-on collision, and VOID doesn't leave a ghost-impact of fire and smoke. It "re-imagines" the path of the remaining car as if the accident never happened -- turning a wreck into a peaceful drive down an empty road.
Why this is the "end of the reshoot"
For a company like Netflix, this underscores a massive cost-saving trick in the movie industry. Think about the infamous "Game of Thrones" Starbucks cup moment. Usually, fixing that requires expensive frame-by-frame digital surgery.
With VOID, a producer could simply remove the unwanted object and let the AI realistically simulate what should happen next -- whether that's water splashing, dust settling or nothing at all. It goes beyond small fixes, too. Instead of bringing a 100-person crew back for a reshoot, the AI could correct mistakes after filming wraps. It could even change a story detail by removing a key object and recalculating the scene so everything still looks natural.
Can you try it?
The most surprising part of this release is that Netflix open-sourced it. You can find the model right now on Hugging Face (under an Apache 2.0 license). However, don't expect to run this on your MacBook Air. VOID is a beast. It requires a GPU with at least 40GB of VRAM (think NVIDIA A100 or H100) to run inference comfortably. Plus, it's built on a 5-billion parameter version of CogVideoX and uses a proprietary "quadmask" system to tell the AI which parts of the physics need to be recalculated.
The takeaway
The "visual receipt" used to be the ultimate proof. Now it's starting to lose its power. Netflix has introduced a tool that can rewrite real footage so seamlessly it looks completely real. At the same time, AI "slop" is getting more convincing than ever -- flooding the internet with content that feels authentic but isn't. The result looks like a world where seeing something no longer means you can trust it. We've officially entered the era of editable reality.
[3]
Netflix's VOID AI removes objects while preserving real-world motion
The system analyzes interactions, then regenerates footage so actions still make sense.
Netflix is detailing an AI video tool that goes beyond simple cleanup. Its system, called VOID, cuts elements from footage while keeping everything else behaving in a way that still feels grounded. That marks a shift for AI video editing. Existing tools can erase unwanted elements, but they often leave behind movement that feels off, like objects floating or actions stopping without cause. VOID focuses on what happens after an edit, rebuilding the sequence so the outcome still follows believable cause and effect.
The research shows the model can adjust interactions in response to changes, so if a supporting object is removed, the remaining elements react naturally instead of freezing or glitching. It effectively rewrites the physical logic of a shot to match the new setup. For editors and studios, that points to cleaner fixes in post-production without breaking immersion, especially in shots where multiple elements interact.
How VOID rewrites a shot
VOID treats edits as chain reactions. It maps out what could be affected once something is taken out, then reconstructs the sequence so the action still tracks logically. The model starts by identifying impacted regions, including where shadows, collisions, or support might change. It then builds a structured map of those shifts and generates a new version of the footage that reflects them. A second refinement pass smooths movement and keeps objects from warping as they follow updated paths.
Why physics-aware editing matters
What stands out is how VOID handles cause and effect. The model was trained on thousands of simulated sequences, which helps it understand how objects respond when conditions change. In one example, removing part of a domino chain doesn't just erase tiles; it stops the reaction entirely because there's nothing left to carry the motion forward.
In another case, removing a person interacting with objects doesn't freeze the shot; the remaining behavior continues as expected. VOID applies learned rules about cause and effect instead of copying patterns from past footage.
What to watch next
VOID is still a research system, with details shared in an arXiv paper rather than a product release. There's no timeline yet for when this kind of editing will reach consumer tools or professional software. Still, the direction is clear. As AI video workflows expand, tools that understand physical interactions will become more important for high-quality edits, especially in film and TV where small inconsistencies break immersion quickly.
The next step is scaling to more complex scenarios. That includes denser setups, more objects, and longer sequences where multiple interactions overlap. If that progress holds, physics-aware editing could push video tools toward full sequence reconstruction that holds up under closer scrutiny.
Netflix released VOID, an open-source AI model that removes objects from video while accounting for physics and causality. Unlike traditional editing tools, VOID rewrites scenes to account for missing elements: erasing a car crash also removes the debris and fire, and deleting a person cannonballing into a pool leaves the water undisturbed. The model could reduce the need for costly reshoots at studios.
Netflix has released an open-source AI model called VOID that changes how objects can be removed from video footage [1]. Short for Video Object and Interaction Deletion, the tool doesn't just erase unwanted elements; it accounts for physics and causality, rewriting the scene as if the deleted object had never been there [2]. VOID is a departure from generative AI video tools like Sora and Runway, which create new content from text prompts rather than modifying existing footage.
While conventional editing tools can remove objects from video, they struggle when the deleted element is involved in significant interactions, such as collisions or physical support [3]. VOID addresses this by treating edits as chain reactions and regenerating the motion that follows. In demonstrations available on GitHub, removing a person holding a guitar causes the instrument to fall to the ground [2], while erasing one car from a head-on collision also removes the resulting fire, debris, and damage, as if the accident never occurred [1][2]. The system maps which regions the removal affects, then regenerates the sequence so that the remaining action still follows believable cause and effect [3].
To train the model, its creators used Kubric and Humoto to generate a paired dataset of counterfactual object removals [1], comprising thousands of simulated sequences [3]. During inference, a vision-language model identifies the parts of the scene affected by the removed object, and that output guides a diffusion model as it fills in the counterfactual content [1]. This approach lets VOID apply learned rules about cause and effect rather than copy patterns from existing footage [3]. The model is built on a 5-billion-parameter version of CogVideoX and uses what Tom's Guide describes as a proprietary "quadmask" system to indicate which parts of the physics need to be recalculated [2].
Netflix made the model available on Hugging Face under an Apache 2.0 license, so anyone can download and use it [2]. Running VOID takes substantial hardware, however: a GPU with at least 40GB of VRAM, such as an NVIDIA A100 or H100, for comfortable inference [2]. In a survey of 25 individuals cited by The Register, VOID was preferred over competing tools like ProPainter, Rose, DiffuEraser, and Generative Omnimatte nearly 65 percent of the time [1]. For studios, the appeal is the potential cost saving from avoiding reshoots. A mistake like the infamous "Game of Thrones" Starbucks cup, which would usually require expensive frame-by-frame digital surgery, could instead be fixed by removing the object in post-production [2].
VOID remains a research system, detailed in a 19-page pre-print rather than released as a commercial product [1][3], but its capabilities raise questions about video authenticity. If objects can be removed from video while the remaining footage stays physically plausible, visual evidence becomes less reliable as proof [2]. The next step is scaling to more complex scenarios, with denser setups and longer sequences [3]; as that happens, the line between captured and edited footage will blur further. Studios should watch for integration into professional workflows, while audiences may need to reconsider how they evaluate video authenticity in what Tom's Guide calls the era of editable reality [2].
Summarized by Navi