10 Sources
[1]
Meta's V-JEPA 2 model teaches AI to understand its surroundings | TechCrunch
Meta on Wednesday unveiled its new V-JEPA 2 AI model, a "world model" that is designed to help AI agents understand the world around them. V-JEPA 2 is an extension of the V-JEPA model that Meta released last year, which was trained on over one million hours of video. This training data is supposed to help robots or other AI agents operate in the physical world, understanding and predicting how concepts like gravity will impact what happens next in a sequence. These are the kinds of common-sense connections that small children and animals make as their brains develop -- when you play fetch with a dog, for example, the dog will (hopefully) understand how bouncing a ball on the ground will cause it to rebound upward, or how it should run toward where it thinks the ball will land, and not where the ball is at that precise moment. Meta depicts an example in which a robot is confronted with the point-of-view of holding a plate and a spatula while walking toward a stove with cooked eggs. The AI can predict that a very likely next action would be to use the spatula to move the eggs to the plate. According to Meta, V-JEPA 2 is 30x faster than Nvidia's Cosmos model, which also tries to enhance intelligence related to the physical world. However, Meta may be evaluating its own models according to different benchmarks than Nvidia. "We believe world models will usher a new era for robotics, enabling real world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data," explained Meta's Chief AI Scientist Yann LeCun in a video.
[2]
Meta Says Its New AI Model Understands Physical Rules Like Gravity
A new generative AI model Meta released this week could change how machines understand the physical world, opening up opportunities for smarter robots and more, the company said. The new open-source model, called Video Joint Embedding Predictive Architecture 2, or V-JEPA 2, is designed to help artificial intelligence understand things like gravity and object permanence, Meta said. "By sharing this work, we aim to give researchers and developers access to the best models and benchmarks to help accelerate research and progress," the company said in a blog post, "ultimately leading to better and more capable AI systems that will help enhance people's lives." Current models that allow AI to interact with the physical world rely on labeled data or video to mimic reality, but this approach emphasizes the logic of the physical world, including how objects move and interact. The model could allow AI to understand concepts like the fact that a ball rolling off a table will fall. Meta said the model could be useful for devices like autonomous vehicles and robots by ensuring they don't need to be trained on every possible situation. The company called it a step toward AI that can adapt like humans can. One struggle in the space of physical AI has been the need for significant amounts of training data, which takes time, money and resources. At SXSW earlier this year, experts said synthetic data -- training data created by AI -- could help prepare a more traditional learning model for unexpected situations. (In Austin, the example used was the emergence of bats from the city's famed Congress Avenue Bridge.) Meta said its new model simplifies the process and makes it more efficient for real-world applications because it doesn't rely on all of that training data. The next steps for world models include training models that are capable of learning, reasoning and planning across different time and space scales, making them better at breaking down complicated tasks. Multimodal models, which can use other senses like audio and touch in addition to vision, will also help future AI models understand the real world.
[3]
Meta Says Its New AI Model Can Understand the Physical World
Meta says a new generative AI model it released Wednesday could change how machines understand the physical world, opening up opportunities for smarter robots and more. The new open-source model, called V-JEPA 2 for Video Joint Embedding Predictive Architecture 2, is designed to help AI understand things like gravity and object permanence, Meta said. Current models that allow AI to interact with the physical world rely on labeled data or video to mimic reality, but this approach emphasizes the logic of the physical world, including how objects move and interact. The model could allow AI to understand concepts like the fact that a ball rolling off a table will fall. Meta said the model could be useful for devices like autonomous vehicles and robots by ensuring they don't need to be trained on every possible situation. The company called it a step toward AI that can adapt like humans can. One struggle in the space of physical AI has been the need for significant amounts of training data, which takes time, money and resources. At SXSW earlier this year, experts said synthetic data -- training data created by AI -- could help prepare a more traditional learning model for unexpected situations. (In Austin, the example used was the emergence of bats from the city's famed Congress Avenue Bridge.) Meta said its new model simplifies the process and makes it more efficient for real-world applications because it doesn't rely on all of that training data.
[4]
Meta launches AI 'world model' to advance robotics, self-driving cars
Meta on Wednesday announced it's rolling out a new AI "world model" that can better understand the 3D environment and movements of physical objects. The tech giant, which owns popular social media apps Facebook and Instagram, said its new open-source AI model V-JEPA 2 can understand, predict and plan in the physical world. Known as world models, these systems take inspiration from the logic of the physical world to build an internal simulation of reality, allowing AI to learn, plan, and make decisions in a more human-like manner. For example, in the case of Meta's new model, V-JEPA 2 can recognize that a ball rolling off a table will fall, or that an object hidden out of view hasn't just vanished. Artificial intelligence has been a key focus for Meta CEO Mark Zuckerberg as the company faces competition from players like OpenAI, Microsoft and Google. Meta is set to invest $14 billion into artificial intelligence firm Scale AI and hire its CEO Alexandr Wang to bolster its AI strategy, people familiar with the matter tell CNBC.
[5]
Meta's new AI helps robots learn real-world logic from raw video
"V-JEPA 2 represents meaningful progress toward our ultimate goal of developing advanced machine intelligence (AMI)," Meta stated in its official announcement. Unlike traditional AI models that require extensive annotations, V-JEPA 2 extracts patterns from raw video. This allows it to generalize across different contexts and handle new situations with greater ease. Meta has already tested the model on lab-based robots. These machines used V-JEPA 2 to pick up unfamiliar objects, reach for targets, and place items in new locations. This marks a step forward in enabling robots to function in unpredictable environments. The company sees major potential for V-JEPA 2 in autonomous machines like delivery robots and self-driving vehicles. These systems need to quickly interpret physical surroundings in order to avoid obstacles and make real-time decisions. With world models like V-JEPA 2, machines can start anticipating the outcomes of their actions in much the same way humans do. Meta joins other tech leaders in pushing world models forward. Google's DeepMind has been developing its own version, Genie, which can simulate entire 3D environments.
[6]
Meta's new world model lets robots manipulate objects in environments they've never encountered before
While large language models (LLMs) have mastered text (and other modalities to some extent), they lack the physical "common sense" to operate in dynamic, real-world environments. This has limited the deployment of AI in areas like manufacturing and logistics, where understanding cause and effect is critical. Meta's latest model, V-JEPA 2, takes a step toward bridging this gap by learning a world model from video and physical interactions. V-JEPA 2 can help create AI applications that require predicting outcomes and planning actions in unpredictable environments with many edge cases. This approach can provide a clear path toward more capable robots and advanced automation in physical environments.
How a 'world model' learns to plan
Humans develop physical intuition early in life by observing their surroundings. If you see a ball thrown, you instinctively know its trajectory and can predict where it will land. V-JEPA 2 learns a similar "world model," which is an AI system's internal simulation of how the physical world operates. The model is built on three core capabilities that are essential for enterprise applications: understanding what is happening in a scene, predicting how the scene will change based on an action, and planning a sequence of actions to achieve a specific goal. As Meta states in its blog, its "long-term vision is that world models will enable AI agents to plan and reason in the physical world."
The model's architecture, called the Video Joint Embedding Predictive Architecture (V-JEPA), consists of two key parts. An "encoder" watches a video clip and condenses it into a compact numerical summary, known as an embedding. This embedding captures the essential information about the objects and their relationships in the scene. A second component, the "predictor," then takes this summary and imagines how the scene will evolve, generating a prediction of what the next summary will look like. This architecture is the latest evolution of the JEPA framework, which was first applied to images with I-JEPA and now advances to video, demonstrating a consistent approach to building world models. Unlike generative AI models that try to predict the exact color of every pixel in a future frame -- a computationally intensive task -- V-JEPA 2 operates in an abstract space. It focuses on predicting the high-level features of a scene, such as an object's position and trajectory, rather than its texture or background details, making it far more efficient than other, larger models at just 1.2 billion parameters. That translates to lower compute costs and makes it more suitable for deployment in real-world settings.
Learning from observation and action
V-JEPA 2 is trained in two stages. First, it builds its foundational understanding of physics through self-supervised learning, watching over one million hours of unlabeled internet videos. By simply observing how objects move and interact, it develops a general-purpose world model without any human guidance. In the second stage, this pre-trained model is fine-tuned on a small, specialized dataset. By processing just 62 hours of video showing a robot performing tasks, along with the corresponding control commands, V-JEPA 2 learns to connect specific actions to their physical outcomes. This results in a model that can plan and control actions in the real world.
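To make the encoder/predictor split concrete, here is a minimal, illustrative sketch of a JEPA-style training step in PyTorch. The module sizes, the action-conditioning interface, and the simple MSE objective are hypothetical stand-ins for exposition, not Meta's actual V-JEPA 2 implementation, which uses vision transformers and masked latent prediction at far larger scale.

```python
# Illustrative JEPA-style training step (hypothetical modules and sizes,
# not Meta's actual V-JEPA 2 code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Condenses clip features into a compact embedding (the 'summary')."""
    def __init__(self, in_dim=1024, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.GELU(), nn.Linear(512, emb_dim))

    def forward(self, x):
        return self.net(x)

class Predictor(nn.Module):
    """Predicts the embedding of a future clip from the current embedding,
    optionally conditioned on a robot action (relevant in the fine-tuning stage)."""
    def __init__(self, emb_dim=256, action_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim + action_dim, 512), nn.GELU(), nn.Linear(512, emb_dim))

    def forward(self, emb, action):
        return self.net(torch.cat([emb, action], dim=-1))

encoder, predictor = Encoder(), Predictor()

# Placeholder features standing in for encoded video frames.
current_clip = torch.randn(4, 1024)   # what the model sees now
future_clip = torch.randn(4, 1024)    # what actually happens next
action = torch.zeros(4, 8)            # zero action during pure video pretraining

# The loss lives in embedding space, not pixel space: the predictor only has to
# match high-level features of the future, which is what keeps this approach
# cheap relative to pixel-generative world models.
pred = predictor(encoder(current_clip), action)
with torch.no_grad():
    target = encoder(future_clip)     # in practice an EMA / stop-gradient target encoder
loss = F.mse_loss(pred, target)
loss.backward()
```

The key design choice the sketch captures is the objective: prediction error is measured between embeddings rather than pixels, which is part of why a comparatively small 1.2-billion-parameter model can serve as a usable world model.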
This two-stage training enables a critical capability for real-world automation: zero-shot robot planning. A robot powered by V-JEPA 2 can be deployed in a new environment and successfully manipulate objects it has never encountered before, without needing to be retrained for that specific setting. This is a significant advance over previous models that required training data from the exact robot and environment where they would operate. The model was trained on an open-source dataset and then successfully deployed on different robots in Meta's labs. For example, to complete a task like picking up an object, the robot is given a goal image of the desired outcome. It then uses the V-JEPA 2 predictor to internally simulate a range of possible next moves. It scores each imagined action based on how close it gets to the goal, executes the top-rated action, and repeats the process until the task is complete. Using this method, the model achieved success rates between 65% and 80% on pick-and-place tasks with unfamiliar objects in new settings.
Real-world impact of physical reasoning
This ability to plan and act in novel situations has direct implications for business operations. In logistics and manufacturing, it allows for more adaptable robots that can handle variations in products and warehouse layouts without extensive reprogramming. This can be especially useful as companies are exploring the deployment of humanoid robots in factories and assembly lines. The same world model can power highly realistic digital twins, allowing companies to simulate new processes or train other AIs in a physically accurate virtual environment. In industrial settings, a model could monitor video feeds of machinery and, based on its learned understanding of physics, predict safety issues and failures before they happen. This research is a key step toward what Meta calls "advanced machine intelligence (AMI)," where AI systems can "learn about the world as humans do, plan how to execute unfamiliar tasks, and efficiently adapt to the ever-changing world around us." Meta has released the model and its training code and hopes to "build a broad community around this research, driving progress toward our ultimate goal of developing world models that can transform the way AI interacts with the physical world."
What it means for enterprise technical decision-makers
V-JEPA 2 moves robotics closer to the software-defined model that cloud teams already recognize: pre-train once, deploy anywhere. Because the model learns general physics from public video and only needs a few dozen hours of task-specific footage, enterprises can slash the data-collection cycle that typically drags down pilot projects. In practical terms, you can prototype a pick-and-place robot on an affordable desktop arm, then roll the same policy onto an industrial rig on the factory floor without gathering thousands of fresh samples or writing custom motion scripts. Lower training overhead also reshapes the cost equation. At 1.2 billion parameters, V-JEPA 2 fits comfortably on a single high-end GPU, and its abstract prediction targets reduce inference load further. That lets teams run closed-loop control on-prem or at the edge, avoiding cloud latency and the compliance headaches that come with streaming video outside the plant. Budget that once went to massive compute clusters can fund extra sensors, redundancy, or faster iteration cycles instead.
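The goal-image planning loop described in the excerpt above can be sketched in a few lines. This is a simplified, hypothetical version under stated assumptions: it samples random candidate actions and scores them by embedding distance to the goal, whereas a production controller would refine candidates over several iterations (for example with the cross-entropy method) and replan after every executed action; the function and tensor shapes are illustrative, not Meta's actual interface.

```python
# Hypothetical sketch of goal-image planning with a JEPA-style world model.
import torch

def plan_next_action(encoder, predictor, current_obs, goal_obs,
                     num_candidates=256, action_dim=8):
    """Pick the candidate action whose predicted future embedding lands
    closest to the embedding of the goal image.

    current_obs / goal_obs: feature tensors of shape (1, feature_dim),
    standing in for the robot's current camera view and the goal image.
    """
    with torch.no_grad():
        current_emb = encoder(current_obs)   # (1, emb_dim): what the robot sees now
        goal_emb = encoder(goal_obs)         # (1, emb_dim): the desired outcome

        # Sample candidate actions; a real planner would refine these iteratively.
        candidates = torch.randn(num_candidates, action_dim)

        # "Imagine" the outcome of each candidate purely in embedding space.
        predicted = predictor(current_emb.expand(num_candidates, -1), candidates)

        # Score each imagined outcome by distance to the goal embedding
        # and return the best-scoring action.
        scores = (predicted - goal_emb).norm(dim=-1)
        return candidates[scores.argmin()]

# Closed loop: execute the chosen action on the robot, capture a new observation,
# and call plan_next_action again until the scene matches the goal image.
```

Because both the imagination step and the scoring happen in the compact embedding space rather than in pixels, evaluating hundreds of candidate actions per control step remains relatively cheap, which is the property that makes this kind of closed-loop planning practical.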
[7]
Meta releases V-JEPA 2 AI model that understands the world through video - SiliconANGLE
Meta Platforms Inc.'s AI research division today released a new artificial intelligence model that can improve training and AI understanding of the physical world for robots and AI agents by interpreting video information in a way similar to how humans understand the world. The model, named V-JEPA 2, or Video Joint Embedding Predictive Architecture 2, builds on the company's previous work on V-JEPA, which allows AI agents and robots to "think before they act." "As humans we think that language is very important for intelligence, but in fact that's not the case," said Yann LeCun, vice president and chief AI scientist at Meta. "Humans and animals navigate the world by building mental models of reality. What if AI could develop this kind of common sense, an ability to make predictions of what is going to happen in some kind of abstract representation of space?" It is a state-of-the-art AI world model, trained on video, that enables robots and other AI models to understand the physical world and predict how it will respond to their actions. World models allow AI agents and robots to build a concept of the physical world and understand the consequences of actions in order to plan a course of action for a given task. With a world model, a company or organization does not need to run a million trials with an AI in the real world, because a world model can simulate the world for an AI model -- often within minutes -- for training with an understanding of how the world works. A world model can also be used to understand and predict what will happen after a certain action is taken, allowing a robot or AI attached to a sensor to anticipate the next event that might happen. Humans do this all the time when planning next steps, such as when avoiding other people while walking through an unfamiliar place or when playing hockey. An AI model could use this kind of planning to help prevent accidents in the workplace by guiding robots along safe paths around the other robots and humans working alongside them, reducing potential hazards. V-JEPA 2 helps AI agents understand the physical world and its interactions by learning patterns of how people interact with objects, how objects move in the physical world and how objects interact with other objects. The company said that when the model was deployed on robots in its labs, the robots could use V-JEPA 2 to perform tasks such as reaching, picking up an object and placing an object in a new location with ease. "Of course, world models are essential for autonomous cars and robots," said LeCun. "In fact, we believe world models will usher in a new era for robotics enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data." In addition to the release of V-JEPA 2, Meta released three new benchmarks for the research community to evaluate existing reasoning models that use video to understand the world.
[8]
Meta releases V-JEPA 2 to train AI on real-world physics
Meta introduced V-JEPA 2 on Wednesday, a new AI "world model" designed to enhance an AI agent's comprehension of its environment. V-JEPA 2 expands upon the original V-JEPA model released last year. The V-JEPA model was trained using over 1 million hours of video footage. This training aims to assist AI agents, particularly robots, in navigating the physical world by predicting outcomes based on concepts such as gravity. Meta gives the example of a robot holding a plate and spatula while walking toward a stove with cooked eggs. The AI should predict that the next likely action would be transferring the eggs to the plate using the spatula. Meta reports that V-JEPA 2 operates 30 times faster than Nvidia's Cosmos model, which also aims to improve intelligence related to the physical world, though these measurements may have been evaluated using benchmarks different from Nvidia's. The company's chief AI scientist, Yann LeCun, stated in a video, "We believe world models will usher a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data."
[9]
Meta Debuts AI to Help Robots 'Understand the Physical World' | PYMNTS.com
The tech giant says these capabilities are key to developing AI agents that think before acting, with V-JEPA 2 marking progress toward the company's goal of creating advanced machine intelligence (AMI). "As humans, we have the ability to predict how the physical world will evolve in response to our actions or the actions of others. For example, you know that if you toss a tennis ball into the air, gravity will pull it back down," the company said. "V-JEPA 2 helps AI agents mimic this intelligence, making them smarter about the physical world. The models we use to develop this kind of intelligence in machines are called world models, and they enable three essential capabilities: understanding, predicting and planning." Meta said it trained V-JEPA 2 using video, which helped it discover important patterns in the physical world, such as how people interact with objects, how objects move in the physical world or interact with other objects. The launch of V-JEPA 2 comes one day after reports that Meta CEO Mark Zuckerberg was personally recruiting experts to assist in his goal of turning Meta into a leader in the field of artificial general intelligence (AGI), a term for machines that can carry out tasks at the same level as humans.
[10]
Meta introduces new AI model for physical reasoning By Investing.com
Investing.com -- Meta Platforms (NASDAQ:META) has unveiled V-JEPA 2, a new world model that improves AI's ability to understand and predict physical interactions. The company announced Thursday that this state-of-the-art model enables robots and other AI agents to better comprehend the physical world and anticipate how it will respond to actions. These capabilities are crucial for developing AI systems that can "think before they act." V-JEPA 2 builds upon Meta's first video-trained model released last year. The new version enhances understanding and prediction capabilities, allowing robots to interact with unfamiliar objects and environments to complete tasks. The model was trained using video to learn important patterns in the physical world, including human-object interactions, object movement, and object-to-object interactions. When tested on robots in Meta's labs, the model demonstrated abilities to perform tasks such as reaching, picking up objects, and placing objects in new locations. Meta has also released three new benchmarks to help researchers evaluate how well existing models learn and reason about the physical world using video. By sharing these resources, Meta aims to accelerate research progress toward developing more capable AI systems. The company emphasized that physical reasoning is essential for building AI agents that can operate in the physical world and for achieving advanced machine intelligence (AMI).
Meta unveils V-JEPA 2, an advanced AI 'world model' designed to understand physical rules and predict real-world interactions, potentially revolutionizing robotics and autonomous systems.
Meta, the tech giant behind Facebook and Instagram, has announced the release of V-JEPA 2, a groundbreaking AI "world model" designed to revolutionize how machines understand and interact with the physical world [1][2]. This open-source model, an extension of last year's V-JEPA, represents a significant leap forward in artificial intelligence's ability to comprehend and predict real-world phenomena.
V-JEPA 2, which stands for Video Joint Embedding Predictive Architecture 2, is trained on over one million hours of video data [1]. Unlike traditional AI models that rely heavily on labeled data, V-JEPA 2 can extract patterns from raw video, allowing it to generalize across different contexts and handle new situations with greater ease [5].
The model is designed to help AI agents understand fundamental concepts such as gravity and object permanence [2]. For instance, V-JEPA 2 can predict that a ball rolling off a table will fall, or that an object hidden from view hasn't simply disappeared [4]. This level of understanding is akin to the common-sense connections made by small children and animals as they develop.
Meta has already begun testing V-JEPA 2 on lab-based robots, demonstrating its ability to assist machines in picking up unfamiliar objects, reaching for targets, and placing items in new locations [5]. This advancement opens up exciting possibilities for various fields:
Robotics: V-JEPA 2 could enable robots to adapt to unpredictable environments and perform complex tasks without extensive pre-programming [1][5].
Autonomous Vehicles: The model's ability to quickly interpret physical surroundings could significantly enhance the decision-making capabilities of self-driving cars [4][5].
Everyday AI Assistants: By understanding the logic of the physical world, AI could become more adept at helping with household chores and physical tasks [1].
According to Meta, V-JEPA 2 also delivers strong performance: the company says the model runs 30 times faster than Nvidia's Cosmos model, although the two companies may be measuring against different benchmarks [1][8].
Meta's Chief AI Scientist, Yann LeCun, believes that world models like V-JEPA 2 will usher in a new era for robotics [1]. The company's next steps include developing models capable of learning, reasoning, and planning across different time and space scales, as well as incorporating multimodal inputs such as audio and touch [2].
As the AI landscape continues to evolve rapidly, Meta's V-JEPA 2 represents a significant step towards creating more adaptable and intuitive artificial intelligence systems. By open-sourcing this technology, Meta aims to accelerate research and progress in the field, potentially leading to AI systems that can enhance people's lives in meaningful ways [2][4].