4 Sources
[1]
Google's SIMA 2 agent uses Gemini to reason and act in virtual worlds | TechCrunch
Google DeepMind shared on Thursday a research preview of SIMA 2, the next generation of its generalist AI agent, which integrates the language and reasoning powers of Gemini, Google's large language model, to move beyond simply following instructions to understanding and interacting with its environment. Like many of DeepMind's projects, including AlphaFold, the first version of SIMA was trained on hundreds of hours of video game data to learn how to play multiple 3D games like a human, even some games it wasn't trained on. SIMA 1, unveiled in March 2024, could follow basic instructions across a wide range of virtual environments, but it had only a 31% success rate for completing complex tasks, compared to 71% for humans.

"SIMA 2 is a step change and improvement in capabilities over SIMA 1," Joe Marino, senior research scientist at DeepMind, said in a press briefing. "It's a more general agent. It can complete complex tasks in previously unseen environments. And it's a self-improving agent. So it can actually self-improve based on its own experience, which is a step towards more general-purpose robots and AGI systems more generally."

SIMA 2 is powered by the Gemini 2.5 Flash-Lite model. AGI refers to artificial general intelligence, which DeepMind defines as a system capable of a wide range of intellectual tasks with the ability to learn new skills and generalize knowledge across different areas.

Working with so-called "embodied agents" is crucial to generalized intelligence, DeepMind's researchers say. Marino explained that an embodied agent interacts with a physical or virtual world via a body - observing inputs and taking actions much like a robot or human would - whereas a non-embodied agent might interact with your calendar, take notes, or execute code. Jane Wang, a research scientist at DeepMind with a background in neuroscience, told TechCrunch that SIMA 2 goes far beyond gameplay.
"We're asking it to actually understand what's happening, understand what the user is asking it to do, and then be able to respond in a common-sense way that's actually quite difficult," Wang said. By integrating Gemini, SIMA 2 doubled its predecessor's performance, uniting Gemini's advanced language and reasoning abilities with the embodied skills developed through training. Marino demoed SIMA 2 in No Man's Sky, where the agent described its surroundings - a rocky planet surface - and determined its next steps by recognizing and interacting with a distress beacon. SIMA 2 also uses Gemini to reason internally. In another game, when asked to walk to the house that's the color of a ripe tomato, the agent showed its thinking - ripe tomatoes are red, therefore I should go to the red house - then found and approached it. Being Gemini-powered also means SIMA 2 follows instructions based on emojis: "You instruct it 🪓🌲, and it'll go chop down a tree," Marino said. Marino also demonstrated how SIMA 2 can navigate newly generated photorealistic worlds produced by Genie, DeepMind's world model, correctly identifying and interacting with objects like benches, trees, and butterflies. Gemini also enables self-improvement without much human data, Marino added. Where SIMA 1 was trained entirely on human gameplay, SIMA 2 uses it as a baseline to provide a strong initial model. When the team puts the agent into a new environment, it asks another Gemini model to create new tasks and a separate reward model to score the agent's attempts. Using these self-generated experiences as training data, the agent learns from its own mistakes and gradually performs better, essentially teaching itself new behaviors through trial and error as a human would, guided by AI-based feedback instead of humans. DeepMind sees SIMA 2 as a step toward unlocking more general-purpose robots. 
"If we think of what a system needs to do to perform tasks in the real world, like a robot, I think there are two components of it," Frederic Besse, senior staff research engineer at DeepMind, said during a press briefing. "First, there is a high-level understanding of the real world and what needs to be done, as well as some reasoning." If you ask a humanoid robot in your house to go check how many cans of beans you have in the cupboard, the system needs to understand all of the different concepts - what beans are, what a cupboard is - and navigate to that location. Besse says SIMA 2 touches more on that high-level behavior than it does on lower-level actions, which he refers to as controlling things like physical joints and wheels. The team declined to share a specific timeline for implementing SIMA 2 in physical robotics systems. Besse told TechCrunch that DeepMind's recently unveiled robotics foundation models - which can also reason about the physical world and create multi-step plans to complete a mission - were trained differently and separately from SIMA. While there's also no timeline for releasing more than a preview of SIMA 2, Wang told TechCrunch the goal is to show the world what DeepMind has been working on and see what kinds of collaborations and potential uses are possible.
[2]
Watch Google DeepMind's new AI agent learn to play video games
Google DeepMind's new AI agent learned how to play a bunch of video games -- including No Man's Sky, Valheim, and Goat Simulator 3 -- to become a viable "interactive gaming companion." The new agent tool, SIMA 2, builds on its earlier iteration, SIMA (Scalable Instructable Multiworld Agent), which DeepMind released in March 2024. It also incorporates Google's Gemini AI for the first time, meaning the agent can go beyond simply following instructions to "understand a user's high-level goal, perform complex reasoning in pursuit, and skillfully execute goal-oriented actions within games," even ones it hasn't seen before, according to a DeepMind blog post. It's currently being released to some academics and developers as a limited research preview. Despite SIMA 2's gaming prowess, creating a consumer-facing gaming helper isn't the broader goal here, members of the DeepMind team told The Verge during a Wednesday briefing. Jane Wang, a senior staff research scientist at DeepMind, called it "a really great training ground" for potentially transferring the skills to real-world environments one day. And, as usual, it all comes back to the ever-intensifying AGI race between Google, Meta, OpenAI, Anthropic, and others. "This is a significant step in the direction of Artificial General Intelligence (AGI), with important implications for the future of robotics and AI-embodiment in general," DeepMind's blog post states. Joe Marino, a research scientist at DeepMind, doubled down on that, saying that SIMA 2's ability to take actions in a virtual world and handle environments it has never seen before is a "fundamental" step toward AGI -- and potentially toward building a general-purpose robot down the line.
[3]
Google DeepMind's New AI Agent Learns, Adapts and Plays Games Like a Human - Decrypt
DeepMind planned a limited research preview for developers and academics. Google DeepMind introduced SIMA 2 on Thursday -- a new AI agent that the company claims behaves like a "companion" inside virtual worlds. With the launch of SIMA 2, DeepMind aims to advance beyond simple on-screen actions and move toward AI that can plan, explain itself, and learn through experience. "This is a significant step in the direction of Artificial General Intelligence (AGI), with important implications for the future of robotics and AI-embodiment in general," the company said on its website.

The first version of SIMA (Scalable Instructable Multiworld Agent), released in March 2024, learned hundreds of basic skills by watching the screen and using virtual keyboard and mouse controls. The new version of SIMA, Google said, takes things a step further by letting the AI think for itself. "SIMA 2 is our most capable AI agent for virtual 3D worlds," Google DeepMind wrote on X. "Powered by Gemini, it goes beyond following basic instructions to think, understand, and take actions in interactive environments -- meaning you can talk to it through text, voice, or even images."

By using the Gemini AI model, Google said SIMA can interpret high-level goals, talk through the steps it intends to take, and collaborate inside games with a level of reasoning the original system could not reach. DeepMind reported stronger generalization across virtual environments, and that SIMA 2 completed longer, more complex tasks, which included logic prompts, sketches drawn on the screen, and emojis. "As a result of this ability, SIMA 2's performance is significantly closer to that of a human player on a wide range of tasks," Google wrote, noting that SIMA 2 had a 65% task completion rate, compared to 31% by SIMA 1.
The system also interpreted instructions and acted inside entirely new 3D worlds generated by Genie 3, another DeepMind project released last year that creates interactive environments from a single image or text prompt. SIMA 2 oriented itself, understood goals, and took meaningful actions in worlds it had never encountered until moments before testing. "SIMA 2 is now far better at carrying out detailed instructions, even in worlds it's never seen before," Google wrote. "It can transfer learned concepts like 'mining' in one game and apply it to 'harvesting' in another -- connecting the dots between similar tasks." After learning from human demonstrations, researchers said the agent switched into self-directed play, using trial and error and Gemini-generated feedback to create new experience data, including a training loop where SIMA 2 generated tasks, attempted them, and then fed its own trajectory data back into the next version of the model. While Google hailed SIMA 2 as a step forward for artificial intelligence, the research also identified gaps that still need to be addressed, including struggling with very long, multi-step tasks, working within a limited memory window, and facing visual-interpretation challenges common to 3D AI systems. Even so, DeepMind said the platform served as a testbed for skills that could eventually migrate into robotics and navigation. "Our SIMA 2 research offers a strong path towards applications in robotics and another step towards AGI in the real world," it said.
[4]
Google DeepMind's SIMA 2 agent learns to think and act inside virtual worlds - SiliconANGLE
Google DeepMind's SIMA 2 agent learns to think and act inside virtual worlds Google LLC's artificial intelligence research lab DeepMind has introduced a new, video game-playing agent called SIMA 2 that's able to navigate through 3D virtual worlds it has never encountered before and solve all kinds of problems. It's a key step towards the creation of general-purpose agents that will ultimately power real-world robots, the research outfit said. Announced today, SIMA 2 builds on the release of DeepMind's original video game-playing agent SIMA, which stands for "scalable instructable multiworld agent". SIMA debuted around 18 months ago and displayed an impressive level of autonomy, but it was far from complete, failing to perform many kinds of tasks. However, DeepMind's researchers said SIMA 2 is built on top of Gemini, which is Google's most powerful large language model, and that foundation gives it a massive performance boost. In a blog post, the SIMA Team said SIMA 2 can complete a much wider variety of more complex tasks in virtual worlds, and in many cases it can figure out how to solve challenges without ever having come across them before. It can also chat with users, and it improves its knowledge over time when it tackles more difficult tasks multiple times, learning by trial and error. "This is a significant step in the direction of Artificial General Intelligence (AGI), with important implications for the future of robotics and AI-embodiment in general," the SIMA Team said. The original SIMA learned to perform tasks inside of virtual worlds by watching the screen and using a virtual keyboard and mouse to control video game characters. But SIMA 2 goes further, because Gemini gives it the ability to think for itself, DeepMind said. 
According to the researchers, Gemini enables SIMA 2 to interpret high-level goals, talk viewers through the steps it intends to take, and collaborate with other agents or humans in games with reasoning skills far beyond the original SIMA. They claim it shows stronger generalization across virtual environments and the ability to complete longer and more complicated tasks, including logic prompts, sketches drawn on the screen and emojis. "SIMA 2's performance is significantly closer to that of a human player on a wide range of tasks," the SIMA Team wrote, highlighting that it achieved a task completion rate of 65%, way ahead of SIMA 1's 31% and just shy of the average human rate of 71%.

The model was also able to interpret instructions and act inside virtual worlds that had been freshly generated by another DeepMind model known as Genie 3, which is designed to create interactive environments from images and natural language prompts. When exposed to a new environment, SIMA 2 would immediately orient itself, try to understand its surroundings and its goals, and then take meaningful actions. It does this by applying skills learned in earlier worlds to the new surroundings it finds itself in, the researchers explained. "It can transfer learned concepts like 'mining' from one game, and apply it to 'harvesting' in another game," they said. "[It's] like connecting the dots between similar tasks."

SIMA 2 can also learn from human demonstrations before switching to self-directed play, where it employs trial and error and feedback from Gemini to create "experience data". This is then fed back into itself in a kind of training loop, so the model can attempt new tasks, learn what it did wrong and right, and then apply what it has learned when it tries a second time. In other words, it becomes less likely to repeat the same mistake.
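The "mining"-to-"harvesting" transfer described above can be illustrated with a toy skill library that stores routines under abstract concepts rather than game-specific verbs. The concept table and class below are invented for illustration; SIMA 2's actual skill representation is learned, not hand-coded like this.

```python
# Map game-specific verbs onto shared abstract concepts (illustrative only).
CONCEPT_OF = {
    "mining": "resource_gathering",
    "harvesting": "resource_gathering",
    "chopping": "resource_gathering",
    "walking": "navigation",
    "flying": "navigation",
}

class SkillLibrary:
    def __init__(self):
        self._skills = {}  # abstract concept -> learned routine

    def learn(self, verb, routine):
        """Store a routine under the verb's abstract concept."""
        self._skills[CONCEPT_OF[verb]] = routine

    def act(self, verb):
        """Look up by concept, so synonymous verbs across games share a skill."""
        routine = self._skills.get(CONCEPT_OF.get(verb, verb))
        if routine is None:
            raise KeyError(f"no skill learned for '{verb}'")
        return routine()

lib = SkillLibrary()
lib.learn("mining", lambda: "swing tool at resource node")
# A different game asks for "harvesting"; the mining skill transfers.
```

Because lookup goes through the concept, a routine learned once under "mining" answers requests for "harvesting" or "chopping" in games the agent has never seen, which is the dot-connecting behavior the researchers describe.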
DeepMind Senior Staff Research Engineer Frederic Besse told media during a press briefing that the end game for SIMA 2 is to develop a new generation of AI agents that can be deployed inside robots, so they can operate autonomously in real-world environments. The skills it learns in virtual environments, such as navigation, using tools and collaborating with humans, can easily be applied to a setting such as a factory or a warehouse. "If we think of what a system needs to do to perform tasks in the real world, like a robot, I think there are two components of it," Besse said. "First, there is a high-level understanding of the real world and what needs to be done, as well as some reasoning. Then there are lower-level actions, such as controlling things like physical joints and wheels." DeepMind's researchers said SIMA 2 is a massive step forward for AI agents, but they admitted there are still weaknesses in the system that remain to be addressed. For instance, the model still struggles with very long, multistep tasks and working with a limited memory window. It also struggles with some visual interpretation scenarios, they said.
Google DeepMind introduces SIMA 2, an advanced AI agent that integrates Gemini's reasoning capabilities to navigate and interact with virtual worlds. The system doubles its predecessor's performance and represents a significant step toward artificial general intelligence and real-world robotics applications.
Google DeepMind has unveiled SIMA 2, a groundbreaking AI agent that represents a significant leap forward in artificial intelligence capabilities [1]. The system integrates Google's Gemini 2.5 Flash-Lite model to move beyond simple instruction-following toward genuine understanding and reasoning within virtual environments. This marks a substantial improvement over its predecessor, SIMA 1, which was released in March 2024 with limited capabilities.
The original SIMA achieved only a 31% success rate for completing complex tasks, compared to 71% for humans. SIMA 2 dramatically closes this gap, achieving a 65% task completion rate while demonstrating far more sophisticated reasoning abilities [3]. Joe Marino, senior research scientist at DeepMind, described SIMA 2 as "a step change and improvement in capabilities," emphasizing its role as a more general agent capable of completing complex tasks in previously unseen environments.

SIMA 2's integration with Gemini enables reasoning capabilities that distinguish it from traditional gaming AI. The system can interpret high-level goals, explain its thought processes, and execute complex multi-step plans [2]. In demonstrations, the agent showcased its ability to understand contextual instructions, such as walking to "the house that's the color of a ripe tomato" by reasoning that ripe tomatoes are red and therefore targeting the red house.
The system's versatility extends to multiple forms of communication, accepting instructions through text, voice, images, and even emojis. Researchers demonstrated how SIMA 2 responds to emoji commands like 🪓🌲 by chopping down trees, showcasing its ability to interpret symbolic communication [1]. This multimodal interaction capability represents a significant advancement in human-AI collaboration within virtual environments.

One of SIMA 2's most remarkable features is its capacity for autonomous learning and self-improvement. Unlike its predecessor, which relied entirely on human gameplay data, SIMA 2 uses human demonstrations as a baseline before transitioning to self-directed learning [4]. The system employs a training loop where it generates new tasks, attempts to complete them, and uses AI-based feedback to refine its performance.

This self-improvement mechanism allows SIMA 2 to learn from trial and error, much like humans do, but guided by AI-generated feedback rather than human supervision. The agent can transfer learned concepts across different gaming environments, applying skills like "mining" from one game to "harvesting" in another, demonstrating genuine conceptual understanding rather than mere pattern matching [3].
DeepMind positions SIMA 2 as a crucial stepping stone toward Artificial General Intelligence (AGI), with significant implications for robotics and real-world applications [2]. The research team emphasizes that working with "embodied agents" is essential for developing generalized intelligence, as these systems must observe inputs and take actions similar to robots or humans in physical environments.

Frederic Besse, senior staff research engineer at DeepMind, explained that real-world robotic systems require both high-level understanding and reasoning capabilities, as well as lower-level motor control functions. SIMA 2 primarily addresses the high-level cognitive components, such as understanding concepts, navigation, and task planning, which are fundamental for autonomous robotic systems [4].