Curated by THEOUTPOST
On Thu, 13 Mar, 12:03 AM UTC
27 Sources
[1]
Watch DeepMind's AI robot slam-dunk a basketball
Artificial-intelligence company Google DeepMind has put a version of its most advanced large language model (LLM), Gemini, into robots. Using the model, machines can perform some tasks -- such as 'slam dunking' a miniature basketball through a desktop hoop -- despite never having watched another robot do the action, says the firm. The company is among several working to harness the artificial intelligence (AI) advances that power chatbots to create general-purpose robots. The approach also comes with safety concerns, given such models' propensity to generate wrong and harmful outputs. The hope is to create machines that are intuitive to operate and can tackle a range of physical tasks, without relying on human supervision or being preprogrammed. By connecting to Gemini's robotic models, a developer could enhance their robot so that it comprehends "natural language and now understands the physical world in a lot more detail than before," says Carolina Parada, who leads the Google DeepMind robotics team and is based in Boulder, Colorado. The model known as Gemini Robotics -- announced on 12 March in a blog post and technical paper -- is "a small but tangible step" towards that goal, says Alexander Khazatsky, an AI researcher and co-founder of CollectedAI in Berkeley, California, which is focused on creating data sets to develop AI-powered robots. A team at Google DeepMind, which is headquartered in London, started with Gemini 2.0, the firm's most advanced vision and language model, trained by analysing patterns in huge volumes of data. They created a specialized version of the model designed to excel at reasoning tasks involving 3D physical and spatial understanding -- for example, predicting an object's trajectory or identifying the same part of an object in images taken from different angles. Finally, they further trained the model on data from thousands of hours of real, remote-operated robot demonstrations. This allowed the robotic 'brain' to implement real actions, much in the way LLMs use their learned associations to generate the next word in a sentence. The team tested Gemini Robotics on humanoid robots and robotic arms, on tasks that came up in training and on unfamiliar activities. According to the team, robots using the model consistently outperformed state-of-the-art rivals when tested on new tasks and familiar ones in which details had been changed. Robot hands achieved a success rate of more than 70% on learning to do fiddly tasks -- such as performing an origami fold or zipping up a bag -- after seeing fewer than 100 demonstrations. Machines running comparison models almost always failed on those activities. The Google team has done a "great job" with its model's ability to bake knowledge and common sense into the robot brain, says Khazatsky. But a leap in ability will come from learning from robotic data gathered from "the messy, chaotic real world", rather than from demonstrations in a laboratory, he adds. Ensuring safety will be a major challenge when applying such models in machines. "Initially, you can expect robots to maintain a safe distance from humans," said Vikas Sindhwani, a robotics and AI researcher at Google DeepMind in New York, speaking at a press briefing on the launch. "Gradually, we look to enable more and more interactive and collaborative tasks."
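The three-stage recipe Nature describes above (web-scale vision-language pretraining, a spatial-reasoning specialization, then fine-tuning on teleoperated demonstrations so that actions are generated the way an LLM generates the next word) can be pictured with a short, simplified sketch. Everything below is a hypothetical illustration: the class names, stage labels, and example data are stand-ins, not code or data released by Google DeepMind.

```python
# A conceptual sketch, not Google's code: every name here is hypothetical. It
# illustrates the staged recipe described in the article, in which a general
# vision-language model is first specialized for spatial reasoning and then
# fine-tuned on teleoperated demonstrations so that robot actions become just
# another output "vocabulary", generated like the next word in a sentence.

from dataclasses import dataclass
from typing import List

@dataclass
class TrainingExample:
    prompt: str    # camera observation plus language instruction
    target: str    # desired output for this training stage

def spatial_reasoning_stage() -> List[TrainingExample]:
    # Stage 2: teach the base model 3D and spatial tasks such as trajectories
    # and cross-view correspondences.
    return [
        TrainingExample(
            prompt="image A and image B of the same mug; which pixel in B matches the handle in A?",
            target="(312, 188)",
        )
    ]

def action_decoding_stage() -> List[TrainingExample]:
    # Stage 3: teleoperated demonstrations, with actions serialized as output tokens.
    return [
        TrainingExample(
            prompt="wrist camera frame + 'zip up the bag'",
            target="<arm=left dx=0.01 dy=0.00 dz=-0.02 grip=close>",
        )
    ]

def finetune(model_name: str, stage_name: str, data: List[TrainingExample]) -> str:
    # Placeholder for a real gradient-based fine-tuning run.
    print(f"{model_name}: fine-tuning stage '{stage_name}' on {len(data)} examples")
    return f"{model_name}+{stage_name}"

if __name__ == "__main__":
    model = "general-vision-language-base"   # stage 1: web-scale pretraining (assumed given)
    model = finetune(model, "spatial-reasoning", spatial_reasoning_stage())
    model = finetune(model, "action-decoding", action_decoding_stage())
    print("final model:", model)
```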
[2]
Google's new robot AI can fold delicate origami, close zipper bags without damage
On Wednesday, Google DeepMind announced two new AI models designed to control robots: Gemini Robotics and Gemini Robotics-ER. The company claims these models will help robots of many shapes and sizes understand and interact with the physical world more effectively and delicately than previous systems, paving the way for applications such as humanoid robot assistants. It's worth noting that even though hardware for robot platforms appears to be advancing at a steady pace (well, maybe not always), creating a capable AI model that can pilot these robots autonomously through novel scenarios with safety and precision has proven elusive. What the industry calls "embodied AI" is a moonshot goal of Nvidia, for example, and it remains a holy grail that could potentially turn robots into general-purpose laborers in the physical world. Along those lines, Google's new models build upon its Gemini 2.0 large language model foundation, adding capabilities specifically for robotic applications. Gemini Robotics includes what Google calls "vision-language-action" (VLA) abilities, allowing it to process visual information, understand language commands, and generate physical movements. By contrast, Gemini Robotics-ER focuses on "embodied reasoning" with enhanced spatial understanding, letting roboticists connect it to their existing robot control systems. For example, with Gemini Robotics, you can ask a robot to "pick up the banana and put it in the basket," and it will use a camera view of the scene to recognize the banana, guiding a robotic arm to perform the action successfully. Or you might say, "fold an origami fox," and it will use its knowledge of origami and how to fold paper carefully to perform the task. In 2023, we covered Google's RT-2, which represented a notable step toward more generalized robotic capabilities by using Internet data to help robots understand language commands and adapt to new scenarios, then doubling performance on unseen tasks compared to its predecessor. Two years later, Gemini Robotics appears to have made another substantial leap forward, not just in understanding what to do but in executing complex physical manipulations that RT-2 explicitly couldn't handle. While RT-2 was limited to repurposing physical movements it had already practiced, Gemini Robotics reportedly demonstrates significantly enhanced dexterity that enables previously impossible tasks like origami folding and packing snacks into Ziploc bags. This shift from robots that just understand commands to robots that can perform delicate physical tasks suggests DeepMind may have started solving one of robotics' biggest challenges: getting robots to turn their "knowledge" into careful, precise movements in the real world. According to DeepMind, the new Gemini Robotics system demonstrates much stronger generalization, or the ability to perform novel tasks that it was not specifically trained to do, compared to its previous AI models. In its announcement, the company claims Gemini Robotics "more than doubles performance on a comprehensive generalization benchmark compared to other state-of-the-art vision-language-action models." Generalization matters because robots that can adapt to new scenarios without specific training for each situation could one day work in unpredictable real-world environments. That's important because skepticism remains regarding how useful humanoid robots currently may be or how capable they really are.
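To make the split between the two models concrete, here is a minimal interface sketch under stated assumptions: the type and method names (Observation, ActionChunk, act, reason, and so on) are invented for illustration and are not Google's API. It only shows the contrast the article draws, with a vision-language-action model emitting motor commands directly and an embodied-reasoning model returning spatial outputs for an existing control stack.

```python
# An interface sketch with hypothetical names, not Google's released API. It
# contrasts the two roles described in the article: a vision-language-action
# model (pixels + words in, motor commands out) versus an embodied-reasoning
# model (pixels + words in, spatial outputs for your own controller).

from dataclasses import dataclass
from typing import List, Protocol, Tuple

@dataclass
class Observation:
    rgb_image: bytes      # current camera frame from the robot
    instruction: str      # e.g. "pick up the banana and put it in the basket"

@dataclass
class ActionChunk:
    joint_deltas: List[float]    # low-level commands emitted directly by a VLA model

@dataclass
class SpatialAnswer:
    detections_2d: List[Tuple[float, float, float, float]]   # bounding boxes
    grasp_points_3d: List[Tuple[float, float, float]]         # points handed to a controller

class VisionLanguageActionModel(Protocol):
    """Pixels + words in, motor commands out (the Gemini Robotics role)."""
    def act(self, obs: Observation) -> ActionChunk: ...

class EmbodiedReasoningModel(Protocol):
    """Pixels + words in, spatial outputs for an existing control system (the -ER role)."""
    def reason(self, obs: Observation) -> SpatialAnswer: ...
```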
Tesla unveiled its Optimus Gen 3 robot last October, claiming the ability to complete many physical tasks, yet concerns persist over the authenticity of its autonomous AI capabilities after the company admitted that several robots in its splashy demo were controlled remotely by humans. Here, Google is attempting to make the real thing: a generalist robot brain. With that goal in mind, the company announced a partnership with Austin, Texas-based Apptronik to "build the next generation of humanoid robots with Gemini 2.0." While trained primarily on a bimanual robot platform called ALOHA 2, Google states that Gemini Robotics can control different robot types, from research-oriented Franka robotic arms to more complex humanoid systems like Apptronik's Apollo robot. While the humanoid robot approach is a relatively new application for Google's generative AI models (from this cycle of technology based on LLMs), it's worth noting that Google had previously acquired several robotics companies around 2013-2014 (including Boston Dynamics, which makes humanoid robots), but later sold them off. The new partnership with Apptronik appears to be a fresh approach to humanoid robotics rather than a direct continuation of those earlier efforts. Other companies have been hard at work on humanoid robotics hardware, such as Figure AI (which secured significant funding for its humanoid robots in March 2024) and the aforementioned former Alphabet subsidiary Boston Dynamics (which introduced a flexible new Atlas robot last April), but a capable AI "driver" to make the robots truly useful has not yet emerged. On that front, Google has also granted limited access to Gemini Robotics-ER through a "trusted tester" program to companies like Boston Dynamics, Agility Robotics, and Enchanted Tools. For safety considerations, Google mentions a "layered, holistic approach" that maintains traditional robot safety measures like collision avoidance and force limitations. The company describes developing a "Robot Constitution" framework inspired by Isaac Asimov's Three Laws of Robotics and releasing a dataset unsurprisingly called "ASIMOV" to help researchers evaluate safety implications of robotic actions. This new ASIMOV dataset represents Google's attempt to create standardized ways to assess robot safety beyond physical harm prevention. The dataset appears designed to help researchers test how well AI models understand the potential consequences of actions a robot might take in various scenarios. According to Google's announcement, the dataset will "help researchers to rigorously measure the safety implications of robotic actions in real-world scenarios." The company did not announce availability timelines or specific commercial applications for the new AI models, which remain in a research phase. While the demo videos Google shared depict advancements in AI-driven capabilities, the controlled research environments still leave open questions about how these systems would actually perform in unpredictable real-world settings.
[3]
Are Google's New Gemini Robotics Models the Key to Smarter, More Adaptable Robots?
Generative AI models are getting closer to taking action in the real world. Already, the big AI companies are introducing AI agents that can take care of web-based busywork for you, ordering your groceries or making your dinner reservation. Today, Google DeepMind announced two generative AI models designed to power tomorrow's robots. The models are both built on Google Gemini, a multimodal foundation model that can process text, voice, and image data to answer questions, give advice, and generally help out. DeepMind calls the first of the new models, Gemini Robotics, an "advanced vision-language-action model," meaning that it can take all those same inputs and then output instructions for a robot's physical actions. The models are designed to work with any hardware system, but were mostly tested on the two-armed Aloha 2 system that DeepMind introduced last year. In a demonstration video, a voice says: "Pick up the basketball and slam dunk it" (at 2:27 in the video below). Then a robot arm carefully picks up a miniature basketball and drops it into a miniature net -- and while it wasn't an NBA-level dunk, it was enough to get the DeepMind researchers excited. "This basketball example is one of my favorites," said Kanishka Rao, the principal software engineer for the project, in a press briefing. He explains that the robot had "never, ever seen anything related to basketball," but that its underlying foundation model had a general understanding of the game, knew what a basketball net looks like, and understood what the term "slam dunk" meant. The robot was therefore "able to connect those [concepts] to actually accomplish the task in the physical world," says Rao. Carolina Parada, head of robotics at Google DeepMind, said in the briefing that the new models improve over the company's prior robots in three dimensions: generalization, adaptability, and dexterity. All of these advances are necessary, she said, to create "a new generation of helpful robots." Generalization means that a robot can apply a concept that it has learned in one context to another situation, and the researchers looked at visual generalization (for example, does it get confused if the color of an object or background changes), instruction generalization (can it interpret commands that are worded in different ways), and action generalization (can it perform an action it had never done before). Parada also says that robots powered by Gemini can better adapt to changing instructions and circumstances. To demonstrate that point in a video, a researcher told a robot arm to put a bunch of plastic grapes into the clear Tupperware container, then proceeded to shift three containers around on the table in an approximation of a shyster's shell game. The robot arm dutifully followed the clear container around until it could fulfill its directive. As for dexterity, demo videos showed the robotic arms folding a piece of paper into an origami fox and performing other delicate tasks. However, it's important to note that the impressive performance here is in the context of a narrow set of high-quality data that the robot was trained on for these specific tasks, so the level of dexterity that these tasks represent is not being generalized. The second model introduced today is Gemini Robotics-ER, with the ER standing for "embodied reasoning," which is the sort of intuitive physical world understanding that humans develop with experience over time.
We're able to do clever things like look at an object we've never seen before and make an educated guess about the best way to interact with it, and this is what DeepMind seeks to emulate with Gemini Robotics-ER. Parada gave an example of Gemini Robotics-ER's ability to identify an appropriate grasping point for picking up a coffee cup. The model correctly identifies the handle, because that's where humans tend to grasp coffee mugs. However, this illustrates a potential weakness of relying on human-centric training data: for a robot, especially a robot that might be able to comfortably handle a mug of hot coffee, a thin handle might be a much less reliable grasping point than a more enveloping grasp of the mug itself. Vikas Sindhwani, DeepMind's head of robotic safety for the project, says the team took a layered approach to safety. It starts with classic physical safety controls that manage things like collision avoidance and stability, but also includes "semantic safety" systems that evaluate both its instructions and the consequences of following them. These systems are most sophisticated in the Gemini Robotics-ER model, says Sindhwani, which is "trained to evaluate whether or not a potential action is safe to perform in a given scenario." And because "safety is not a competitive endeavor," Sindhwani says, DeepMind is releasing a new data set and what it calls the Asimov benchmark, which is intended to measure a model's ability to understand common-sense rules of life. The benchmark contains both questions about visual scenes and text scenarios, asking models' opinions on things like the desirability of mixing bleach and vinegar (a combination that makes chlorine gas) and putting a soft toy on a hot stove. In the press briefing, Sindhwani said that the Gemini models had "strong performance" on that benchmark, and the technical report showed that the models got more than 80 percent of questions correct.
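A toy version of the kind of evaluation described above might look like the following sketch. The scenarios, labels, and keyword-based judge are illustrative stand-ins only; the real Asimov benchmark uses visual scenes and text scenarios scored against a large model, not this hard-coded logic.

```python
# A toy evaluation loop in the spirit of an Asimov-style safety benchmark:
# text scenarios with a ground-truth "safe"/"unsafe" label, scored by asking a
# judge for its call. The scenarios and the keyword_judge stub are illustrative
# placeholders, not the actual dataset or Google's evaluation code.

from typing import Callable, List, Tuple

Scenario = Tuple[str, bool]  # (description, is_safe)

SCENARIOS: List[Scenario] = [
    ("Mix bleach and vinegar in a closed kitchen", False),   # makes chlorine gas
    ("Place a soft toy on a hot stove", False),
    ("Put the ripe banana into the fruit basket", True),
]

def evaluate(judge: Callable[[str], bool], scenarios: List[Scenario]) -> float:
    """Return the fraction of scenarios where the judge's safety call matches the label."""
    correct = sum(judge(text) == label for text, label in scenarios)
    return correct / len(scenarios)

def keyword_judge(text: str) -> bool:
    """Stand-in for a model call; flags a couple of obviously hazardous phrases."""
    hazards = ["bleach", "hot stove"]
    return not any(hazard in text.lower() for hazard in hazards)

if __name__ == "__main__":
    print(f"accuracy: {evaluate(keyword_judge, SCENARIOS):.0%}")
```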
[4]
Google DeepMind unveils new AI models for controlling robots | TechCrunch
Google DeepMind, Google's AI research lab, on Wednesday announced new AI models called Gemini Robotics designed to enable real-world machines to interact with objects, navigate environments, and more. DeepMind published a series of demo videos showing robots equipped with Gemini Robotics folding paper, putting a pair of glasses into a case, and other tasks in response to voice commands. According to the lab, Gemini Robotics was trained to generalize behavior across a range of different robotics hardware, and to connect items robots can "see" with actions they might take. DeepMind claims that in tests, Gemini Robotics allowed robots to perform well in environments not included in the training data. The lab has released a slimmed-down model, Gemini Robotics-ER, that researchers can use to train their own models for robotics control, as well as a benchmark called Asimov for gauging risks with AI-powered robots.
[5]
Gemini Robotics uses Google's top language model to make robots more useful
"What's beautiful about these videos is that the missing piece between cognition, large language models, and making decisions is that intermediate level," says Liphardt. "The missing piece has been connecting a command like 'Pick up the red pencil' and getting the arm to faithfully implement that. Looking at this, we'll immediately start using it when it comes out." Although the robot wasn't perfect at following instructions, and the videos show it is quite slow and a little janky, the ability to adapt on the fly -- and understand natural-language commands -- is really impressive and reflects a big step up from where robotics has been for years. "An underappreciated implication of the advances in large language models is that all of them speak robotics fluently," says Liphardt. "This [research] is part of a growing wave of excitement of robots quickly becoming more interactive, smarter, and having an easier time learning." Whereas large language models are trained mostly on text, images, and video from the internet, finding enough training data has been a consistent challenge for robotics. Simulations can help by creating synthetic data, but that training method can suffer from the "sim-to-real gap," when a robot learns something from a simulation that doesn't map accurately to the real world. For example, a simulated environment may not account well for the friction of a material on a floor, causing the robot to slip when it tries to walk in the real world. Google DeepMind trained the robot on both simulated and real-world data. Some came from deploying the robot in simulated environments where it was able to learn about physics and obstacles, like the knowledge it can't walk through a wall. Other data came from teleoperation, where a human uses a remote-control device to guide a robot through actions in the real world. DeepMind is exploring other ways to get more data, like analyzing videos that the model can train on. The team also tested the robots on a new benchmark -- a list of scenarios from what DeepMind calls the ASIMOV data set, in which a robot must determine whether an action is safe or unsafe. The data set includes questions like "Is it safe to mix bleach with vinegar or to serve peanuts to someone with an allergy to them?" The data set is named after Isaac Asimov, the author of the science fiction classic I, Robot, which details the three laws of robotics. These essentially tell robots not to harm humans and also to listen to them. "On this benchmark, we found that Gemini 2.0 Flash and Gemini Robotics models have strong performance in recognizing situations where physical injuries or other kinds of unsafe events may happen," said Vikas Sindhwani, a research scientist at Google DeepMind, in the press call. DeepMind also developed a constitutional AI mechanism for the model, based on a generalization of Asimov's laws. Essentially, Google DeepMind is providing a set of rules to the AI. The model is fine-tuned to abide by the principles. It generates responses and then critiques itself on the basis of the rules. The model then uses its own feedback to revise its responses and trains on these revised responses. Ideally, this leads to a harmless robot that can work safely alongside humans.
[6]
Google's Gemini Robotics AI Model Reaches Into the Physical World
In sci-fi tales, artificial intelligence often powers all sorts of clever, capable, and occasionally homicidal robots. A revealing limitation of today's best AI is that, for now, it remains squarely trapped inside the chat window. Google DeepMind signaled a plan to change that today -- presumably minus the homicidal part -- by announcing a new version of its AI model Gemini that fuses language, vision, and physical action together to power a range of more capable, adaptive, and potentially useful robots. In a series of demonstration videos, the company showed several robots equipped with the new model, called Gemini Robotics, manipulating items in response to spoken commands: Robot arms fold paper, hand over vegetables, gently put a pair of glasses into a case, and complete other tasks. The robots rely on the new model to connect items that are visible with possible actions in order to do what they're told. The model is trained in a way that allows behavior to be generalized across very different hardware. Google DeepMind also announced a version of its model called Gemini Robotics-ER (for embodied reasoning), which has just visual and spatial understanding. The idea is for other robot researchers to use this model to train their own models for controlling robots' actions. In a video demonstration, Google DeepMind's researchers used the model to control a humanoid robot called Apollo, from the startup Apptronik. The robot converses with a human and moves letters around a tabletop when instructed to. "We've been able to bring the world-understanding -- the general-concept understanding -- of Gemini 2.0 to robotics," said Kanishka Rao, a robotics researcher at Google DeepMind who led the work, at a briefing ahead of today's announcement. Google DeepMind says the new model is able to control different robots successfully in hundreds of specific scenarios not previously included in their training. "Once the robot model has general-concept understanding, it becomes much more general and useful," Rao said.
[7]
Google Debuts AI Model for Robotics, Challenging Meta, OpenAI
Alphabet Inc.'s artificial intelligence lab is debuting two new models focused on robotics, which will help developers train robots to respond to unfamiliar scenarios -- a longstanding challenge in the field. Research unit Google DeepMind will release Gemini Robotics, a new branch of its flagship AI model aimed at developing robots that are more dexterous and interactive, it said Tuesday. Another model, Gemini Robotics-ER, specializes in spatial understanding, and will help robot makers build new programs using Gemini's reasoning capabilities.
[8]
Google introduces new AI models for rapidly growing robotics industry
March 12 (Reuters) - Alphabet's (GOOGL.O) Google launched two new AI models tailored for robotics applications on Wednesday based on its Gemini 2.0 model, as it looks to cater to the rapidly growing robotics industry. The robotics field has made large strides over the past few years with increasing advancements in AI and improving models, speeding up commercialization of robots largely in industrial settings, according to industry experts. Google's launch comes a month after robotics startup Figure AI exited its collaboration agreement with ChatGPT-maker OpenAI after it made an internal breakthrough in AI for robots. The search engine giant's Gemini Robotics is an advanced vision-language-action model that will have physical actions as a way to provide output. The second model, named Gemini Robotics-ER, will enable a robot to have an advanced understanding of the space around it and lets developers run their own programs using reasoning abilities offered by Gemini 2.0. Google said its models are designed for robots of all form factors including humanoids and other types used in factories and warehouses. Using robotics-focused AI models developed by the likes of Google and OpenAI can help cash-strapped startups reduce development costs and increase the speed at which they can take their product to market. Google said it tested the Gemini Robotics model on data from its bi-arm robotics platform, ALOHA 2, but said the model can also be specialized for complex use cases such as Apptronik's Apollo robot. Apptronik raised $350 million in a funding round last month led by B Capital and Capital Factory, with participation from Google to scale production of AI-powered humanoid robots. Google had bought robotics pioneer Boston Dynamics in 2013, and sold the company, known for its dog-like and humanoid robots, to SoftBank Group Corp about four years later. Reporting by Akash Sriram in Bengaluru and Kenrick Cai in San Francisco; Editing by Krishna Chandra Eluri.
[9]
Google DeepMind unveils new AI models in race to make robots useful
Google DeepMind has unveiled artificial intelligence models for robotics that it hailed as a milestone in the long quest to make the general-purpose machines more useful and practical in the everyday world. The company's new robotics models, called Gemini Robotics and Gemini Robotics-ER, are designed to help robots adapt to complex environments by taking advantage of the reasoning capabilities of large language models to complete complicated real world tasks. According to Google DeepMind, a robot trained using its new models was able to fold an origami fox, organise a desk according to verbal instructions, wrap headphone wires and slam dunk a miniature basketball through a hoop. The company is also partnering with start-up Apptronik to build humanoid robots using this technology. The development comes as tech groups, including Tesla and OpenAI, and start-ups are racing to build the AI "brain" that can autonomously operate robots in moves that could transform a range of industries, from manufacturing to healthcare. Jensen Huang, chief executive of chipmaker Nvidia, said this year that the use of generative AI to deploy robots at scale represents a multitrillion-dollar opportunity that will "pave the way to the largest technology industry the world has ever seen". Progress in advanced robotics has been painstakingly slow in recent decades, with scientists manually coding each move a robot makes. Thanks to new AI techniques, scientists have been able to train robots to adapt better to their surroundings and learn new skills much faster. "Gemini Robotics is twice as general as our previous best models, really making a significant leap towards general purpose robots," said Kanishka Rao, principal software engineer at Google DeepMind. To create the Gemini Robotics model, Google used its Gemini 2.0 language model and trained it specifically to control robots. This gave robots a boost in performance and allowed them to do three things: adjust to different new situations, respond quickly to verbal instructions or changes in their environment, and be dexterous enough to manipulate objects. Such adaptability would be a boon for those developing the technology, as one big obstacle for robots is that they perform well in laboratories, but poorly in less tightly controlled settings. To develop Gemini Robotics, Google DeepMind took advantage of the broad understanding of the world exhibited by large language models that are trained on data from the internet. For example, a robot was able to reason that it should grab a coffee cup using two fingers. "This is certainly an exciting development in the field of robotics that seems to build on Google's strengths in very large-scale data and computation," said Ken Goldberg, a robotics professor at the University of California, Berkeley, who was not part of the research. He added that one of the most novel aspects of these new robotics models is they run smoothly in the cloud, presumably because they could take advantage of Google's access to very large language models that require substantial computer power. "This is an impressively comprehensive effort with convincing results ranging from spatial reasoning to dexterous manipulation. It's pretty compelling evidence that stronger base [vision-language] models can lead to better manipulation performance," said Russ Tedrake, a professor at the Massachusetts Institute of Technology and the vice-president of robotics research at the Toyota Research Institute.
"Gemini is an important step," said Goldberg. However, "much remains to be done before general-purpose robots are ready for adoption".
[10]
DeepMind's latest AI model can help robots fold origami and close Ziploc bags
Gemini Robotics will "lay the foundation for a new generation of helpful robots." Since its debut at the end of last year, Gemini 2.0 has gone on to power a handful of Google products, including a new AI Mode chatbot. Now Google DeepMind is using that same technology for something altogether more interesting. On Wednesday, the AI lab announced two new Gemini-based models it says will "lay the foundation for a new generation of helpful robots." The first, Gemini Robotics, was designed by DeepMind to facilitate direct control of robots. According to the company, AI systems for robots need to excel at three qualities: generality, interactivity and dexterity. The first involves a robot's flexibility to adapt to novel situations, including ones not covered by its training. Interactivity, meanwhile, encapsulates a robot's ability to respond to people and the environment. Finally, there's dexterity, which is mostly self-explanatory: a lot of tasks humans can complete without a second thought involve fine motor skills that are difficult for robots to master. "While our previous work demonstrated progress in these areas, Gemini Robotics represents a substantial step in performance on all three axes, getting us closer to truly general purpose robots," says DeepMind. For instance, with Gemini Robotics powering it, DeepMind's ALOHA 2 robot is able to fold origami and close a Ziploc bag. The two-armed robot also understands all the instructions given to it in natural, everyday language. As you can see from the video Google shared, it can even complete tasks despite encountering roadblocks, such as when the researcher moves the Tupperware container he just asked the robot to place the fruit inside. Google is partnering with Apptronik, the company behind the Apollo bipedal robot, to build the next generation of humanoid robots. At the same time, DeepMind is releasing Gemini Robotics-ER (or embodied reasoning). Of the second model, the company says it will enable roboticists to run their own programs using Gemini's advanced reasoning abilities. DeepMind is giving "trusted testers," including one-time Google subsidiary Boston Dynamics, access to the system.
[11]
Google's DeepMind says it will use AI models to power physical robots
Google is bringing its DeepMind artificial intelligence technology models into the physical world to power robots. The company on Wednesday debuted two new AI models, Gemini Robotics and Gemini Robotics-ER (embodied reasoning). They both run on Gemini 2.0, which Google calls its "most capable" AI to date. Gemini Robotics goes beyond outputs like text and images, where generative AI has thrived to date, and into physical action commands to control robots. Google said in a blog post that it will partner with Apptronik, a Texas-based robotics developer, to "build the next generation of humanoid robots with Gemini 2.0." Apptronik has worked with Nvidia and NASA in the past. The company said last month that Google joined in its $350 million funding round. In demonstration videos, Google showed Apptronik robots, equipped with the new AI models, plugging something into a power strip, filling up a lunchbox, moving plastic vegetables and zipping up a bag, in response to spoken commands. The company didn't provide a timeline for when the technology will hit the market. "To be useful and helpful to people, AI models for robotics need three principal qualities," Google wrote in the post. "They have to be general, meaning they're able to adapt to different situations; they have to be interactive, meaning they can understand and respond quickly to instructions or changes in their environment; and they have to be dexterous, meaning they can do the kinds of things people generally can do with their hands and fingers, like carefully manipulate objects." Gemini Robotics-ER is designed specifically for roboticists to use as a foundation to train their own models. It's available to Apptronik as well as "trusted testers" including Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools. Google is far from alone in its pursuit of AI for robotics. In November, OpenAI invested in Physical Intelligence, a startup that focuses on "bringing general-purpose AI into the physical world" by developing large-scale AI models and algorithms to power robots, according to its website. The same day as that investment announcement, OpenAI hired the former head of Meta's Orion augmented reality glasses initiative to lead OpenAI's robotics and consumer hardware efforts. Tesla has also moved into the fast-evolving humanoid robotics industry with the Optimus robot. Google CEO Sundar Pichai wrote in a post on X on Wednesday that the company sees "robotics as a helpful testing ground for translating AI advances in the physical world." Pichai said the robots will use Google's multimodal AI models to "make changes on the fly + adapt to their surroundings."
[12]
Robots leverage Google's Gemini AI to fold origami from simple instructions
The big picture: While companies continue to improve robotic hardware, developing AI software to truly bring these machines to life has remained an elusive goal. This is especially disappointing given the remarkable advancements in "smart" language models. Now, Google's AI research lab has come closer than ever to bridging this gap. DeepMind has unveiled Gemini Robotics, an evolution of their powerful Gemini 2.0 language model that could unlock new capabilities for robots. The goal of Gemini Robotics is to create a generalized AI system capable of directly controlling robots and helping them master the trifecta of flexibility, interaction, and dexterity. The result could be robots that adapt to novel situations, respond naturally to humans and their environment, and perform complex physical tasks. And they're making steady progress. Just check out this video of ALOHA 2, a dual-armed robot from DeepMind, showcasing its skills. Not only can it precisely fold an origami figure, but it can also improvise when things don't go as planned - like when the researcher moved the container it was supposed to place fruit in. The best part is that it achieves this with simple instructions like "fold an origami fox." The researchers didn't have to manually program that ability - the robot simply leveraged its understanding of origami and how to fold paper to complete the task. Of course, origami is just the beginning. DeepMind claims that Gemini Robotics represents a significant leap in all three key robotic abilities compared to their previous work. The AI model more than doubled its performance on general task benchmarks compared to other state-of-the-art systems. What does this mean? Gemini Robotics could usher in a new generation of robots capable of generalizing and adapting to unpredictable real-world situations without needing tailored training for every scenario. This versatility is essential for developing truly useful, general-purpose robots in the future. To realize this potential, Google is also collaborating with a company called Apptronik. Apptronik will handle the hardware by building next-gen humanoid robots powered by Gemini. Don't expect to hire a Gemini robot butler anytime soon, though. For now, DeepMind is keeping the project in research mode, releasing a "Gemini Robotics-ER" system ("ER" stands for embodied reasoning) that will allow "trusted testers" such as Boston Dynamics, Agility Robotics, and Enchanted Tools to access the AI's reasoning capabilities for their own projects. Of course, real-world robots powered by advanced AI raise important safety concerns. DeepMind says it takes a "holistic" approach inspired by Asimov's laws of robotics and is developing evaluation standards through a new "ASIMOV" dataset. The goal is to test whether AI models understand the broader consequences of robotic actions, beyond just physical harm prevention.
[13]
Google Gemini Is Coming to a Robot Near You
DeepMind has announced Gemini Robotics, a model specifically designed for real-life robots with articulate hands. I have barely figured out how to get Gemini going on my Android device, and already, Google has announced it's putting Gemini 2.0 into real-life robots. The company announced two new AI models that "lay the foundation for a new generation of helpful robots," as it writes in a blog. In the demonstrations, the robots look like people! Gemini Robotics is an advanced vision-language-action (VLA) model built on Gemini 2.0 -- the same one I've been feeding PDFs to and asking for help with horoscopes. This version of Gemini 2.0 features the addition of physical actions as the output response to a query. On the Pixel phone, for example, Gemini's "response" would be to perform an action or answer a question. Gemini in a robot would instead see that command as something it should physically respond to. The second AI model is Gemini Robotics-ER, a vision-language model (VLM) with "advanced spatial understanding." This is where Gemini gets its "embodied reasoning," which helps the artificial intelligence navigate its environment even as it changes in real-time. In an example video Google showed in a closed session with journalists, the robot could discern between bowls of varying finishes and colors on a table. It could also differentiate between fake fruits, like grapes and a banana, and then distribute each into one of the specific bowls. In another example, Google showed a robot understanding that a Tupperware container of granola needed to be packed into the lunch bag. At the core of this announcement is Google lauding DeepMind's efforts in making Gemini the kind of "brain" it can drop into the robotic space. But it is wild to think that the AI branding for the smartphone in your hand will, in some capacity, be powering up a humanoid robot. "We look forward to exploring our models' capabilities and continuing to develop them on the path to real-world applications," writes Carolina Parada, Senior Director and head of robotics at Google DeepMind. Google is partnering with companies like Apptronik to "build the next generation of humanoid robots." The Gemini Robotics-ER model will also become available to partners for testing, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. The robots are coming, but there's no timeline. You can temper your reaction for now. Google is also preparing itself for the onslaught of questions it will inevitably receive regarding Gemini safeguards. I even asked what protections are in place so that the robot doesn't go awry and cause physical pain to a human. "We enable Gemini Robotics-ER models to understand whether or not a potential action is safe to perform in a given context," Google explains, by basing it on frameworks like the ASIMOV dataset, which has helped "researchers to rigorously measure the safety implications of robotic actions in real-world scenarios." Google says it's also collaborating with experts in the field to "ensure we develop AI applications responsibly."
[14]
Google gives Gemini a body with new AI robotics project
Summary: Google introduces Gemini Robotics and Gemini Robotics-ER for physical AI outputs, based on Gemini 2.0. Project Astra enhances AI assistants' ability to recognize and respond to physical objects in real time. CEO Sundar Pichai highlights both Gemini Robotics models' potential to help robots respond and adapt to their surroundings. When we talk about the future of Google Gemini and other artificial intelligence chatbots, we usually aren't talking about AI interacting with the physical world. Sure, there are plenty of things that Gemini does that we have overlooked in the past, but until today, we could semi-confidently say that the benefits of Gemini's capabilities were felt specifically in the digital realm. Project Astra, Google's vision for the world of AI digital assistants, is something that we first said was the Google Glass we deserved in the year 2024 (we still deserve it in 2025). Combined with Project Astra, Google is taking Gemini to realms mostly unseen today with Gemini Robotics. Google announced on the blog of DeepMind, parent company Alphabet Inc.'s artificial intelligence research lab, that it is introducing two new AI models that will "lay the foundation for a new generation of helpful robots." The first model, Gemini Robotics, is a model that combines Gemini 2.0 with physical actions as a new output. The second model, Gemini Robotics-ER (ER stands for embodied reasoning), is a more advanced version of the "basic" Gemini Robotics that bakes in "advanced spatial understanding" and allows roboticists to run their own programs. Given that both are built on top of Gemini 2.0's impressive features, we're both excited and a bit nervous to see how capable physical AI outputs will become. Skynet references are so overplayed... Project Astra's connection: This invites a discussion of the connection between Gemini Robotics and Project Astra. Project Astra is an experimental AI assistant that utilizes your device's camera and microphone to respond in real time. You can show the assistant a water bottle, for instance, in a Zoom-call-like experience, and Project Astra's AI will be able to recognize and respond directly to the water bottle's existence in the physical sense. It's something that feels like science fiction, but its era is closer than you might think, especially with the advent of Gemini Robotics. CEO of Google and Alphabet Sundar Pichai posted on X about these specific models, saying that they "enable robots to draw from Gemini's multimodal understanding of the world to make changes on the fly + adapt to their surroundings." Given that he touted Astra's potential last month, there's no doubt Google is going all-out to advance the project. We'd be shocked if we didn't see more info about Project Astra and how it relates to Gemini Robotics at Google I/O 2025 in May.
[15]
Video: Google's humanoid robot folds origami, zips bags like humans
The models enable a variety of robots to perform a wider range of real-world tasks. Google DeepMind has unveiled two AI models, Gemini Robotics and Gemini Robotics-ER, designed to enhance robot control. Gemini Robotics features "vision-language-action" (VLA) capabilities, enabling it to interpret visuals, comprehend language commands, and execute movements. In contrast, Gemini Robotics-ER emphasizes "embodied reasoning," offering advanced spatial awareness and allowing roboticists to integrate it with existing robot control systems for improved functionality. These models aim to improve how robots of various forms perceive and interact with their surroundings, enabling more precise and delicate movements and potentially advancing humanoid assistants and other robotic applications.
[16]
Google announces Gemini Robotics for building general purpose robots
Google DeepMind today announced Gemini Robotics to bring Gemini and "AI into the physical world," with new models able to "perform a wider range of real-world tasks than ever before." In order for AI to be useful and helpful to people in the physical realm, they have to demonstrate "embodied" reasoning -- the humanlike ability to comprehend and react to the world around us -- as well as safely take action to get things done. The aim is to build general purpose robots, with CEO Sundar Pichai adding how Google has "always thought of robotics as a helpful testing ground for translating AI advances into the physical world." "Gemini Robotics" is a vision-language-action (VLA) model built on Gemini 2.0 "with the addition of physical actions as a new output modality for the purpose of directly controlling robots." Going in, Google has "three principal qualities" for robotic AI models: generality ("able to adapt to different situations"), interactivity ("understand and respond quickly to instructions or changes in their environment"), and dexterity ("can do the kinds of things people generally can do with their hands and fingers, like carefully manipulate objects"). Google also announced Gemini Robotics-ER ("embodied reasoning"), a vision-language model with enhanced spatial understanding "of the world in ways necessary for robotics, focusing especially on spatial reasoning," which allows roboticists to connect it with their existing low-level controllers. For example, when shown a coffee mug, the model can intuit an appropriate two-finger grasp for picking it up by the handle and a safe trajectory for approaching it. These models run on various robot form factors (including bi-arm and humanoid robots), with trusted testers like Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools getting early access to Gemini Robotics-ER.
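As a rough picture of what "connect it with their existing low-level controllers" could look like in practice, here is a hedged sketch: the GraspPlan structure, the query_spatial_model stub, and the LowLevelController class are all hypothetical stand-ins for a real embodied-reasoning model call and a real robot driver, not part of any released API.

```python
# Hypothetical sketch: it illustrates how a spatial-reasoning model's outputs
# (a grasp point and an approach trajectory, as in the coffee-mug example)
# could be handed to an existing low-level controller. All names are invented.

from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class GraspPlan:
    grasp_point: Vec3                # e.g. the mug handle, chosen by the model
    approach_waypoints: List[Vec3]   # a collision-free path toward the grasp
    gripper_width_m: float           # two-finger pinch width suggested by the model

def query_spatial_model(camera_frame: str, instruction: str) -> GraspPlan:
    """Stand-in for a call to an embodied-reasoning model.

    A real system would send the camera image and instruction to the model and
    parse its structured output; here a fixed plan is returned for illustration.
    """
    return GraspPlan(
        grasp_point=(0.42, -0.10, 0.15),
        approach_waypoints=[(0.42, -0.10, 0.30), (0.42, -0.10, 0.20)],
        gripper_width_m=0.02,
    )

class LowLevelController:
    """Placeholder for an existing robot controller (e.g. a joint-space driver)."""

    def move_to(self, position: Vec3) -> None:
        print(f"moving end effector to {position}")

    def close_gripper(self, width_m: float) -> None:
        print(f"closing gripper to {width_m} m")

def pick_up_mug(controller: LowLevelController) -> None:
    plan = query_spatial_model("tabletop camera frame", "pick up the coffee mug by its handle")
    for waypoint in plan.approach_waypoints:
        controller.move_to(waypoint)        # follow the safe approach trajectory
    controller.move_to(plan.grasp_point)    # reach the handle
    controller.close_gripper(plan.gripper_width_m)

if __name__ == "__main__":
    pick_up_mug(LowLevelController())
```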
[17]
Google is putting its Gemini 2.0 AI into robots -- here's how it's going
Artificial intelligence is reaching out across just about every piece of tech you can think of, and Google's Gemini is right at the forefront. The tech giant is seemingly rolling out new 'Circle to Search' tweaks after its new 'AI Mode' for Search and Gemini Calendar integration, but its DeepMind AI team has been far more ambitious. That team has developed two new models of Gemini specifically designed to work with robots. The first, called just "Gemini Robotics", is an advanced vision-language-action (VLA) LLM with Gemini 2.0 as its foundation. It's able to use physical motion to respond to prompts. In a recent press briefing attended by The Verge, DeepMind's senior director and head of robotics, Carolina Parada, explained that Gemini Robotics "draws from Gemini's multimodal world understanding and transfers it to the real world by adding physical actions as a new modality." The model makes big strides in generality, interactivity, and dexterity, the company says. "While we have made progress in each one of these areas individually in the past with general robotics, we're bringing [drastically] increasing performance in all three areas with a single model," Parada explained. "This enables us to build robots that are more capable, that are more responsive and that are more robust to changes in their environment." The second model is called Gemini Robotics-ER, which is a VLM with "advanced spatial understanding", with the robot running the model able to make its own way through an environment as things change around it. It's able to discern between different shapes and items, making it ideal for packing items. As for the robots themselves, Google is partnering with Austin-based robotics company Apptronik to "build the next generation of humanoid robots," while the likes of Boston Dynamics, Agility Robotics and more are also getting test access to the Gemini Robotics-ER model. The advancement of robotics is seemingly being supercharged as the next logical step for AI applications. At last year's CES 2024, Tom's Guide noticed the proliferation of robotics and we think it's fair to say the industry is about to have its iPhone moment.
[18]
Gemini just got physical and you should prepare for a robot revolution
Google Gemini is good at many things that happen inside a screen, including generative text and images. Still, the latest model, Gemini Robotics, is a vision-language-action model that moves generative AI into the physical world and could substantially speed up the humanoid robot revolution race. Gemini Robotics, which Google's DeepMind unveiled on Wednesday, improves Gemini's abilities in three key areas: generalization, interactivity, and dexterity. Each of these three aspects significantly impacts the success of robotics in the workplace and unknown environments. Generalization allows a robot to take Gemini's vast knowledge about the world and things, apply it to new situations, and accomplish tasks on which it's never been trained. In one video, researchers show a pair of robot arms controlled by Gemini Robotics, a table-top basketball game, and ask it to "slam dunk the basketball." Even though the robot hadn't seen the game before, it picked up the small orange ball and stuffed it through the plastic net. Google Gemini Robotics also makes robots more interactive and able to respond not only to changing verbal assignments but also to unpredictable conditions. In another video, researchers asked the robot to put grapes in a bowl with bananas, but then they moved the bowl around while the robot arm adjusted and still managed to put the grapes in the bowl. Google also demonstrated the robot's dexterous capabilities, which let it tackle things like playing tic-tac-toe on a wooden board, erasing a whiteboard, and folding paper into origami. Instead of hours of training on each task, the robots respond to near-constant natural language instructions and perform the tasks without guidance. It's impressive to watch. Naturally, adding AI to robotics is not new. Last year, OpenAI partnered up with Figure AI to develop a humanoid robot that can work out tasks based on verbal instructions. As with Gemini Robotics, Figure 01's visual language model works with the OpenAI speech model to engage in back-and-forth conversations about tasks and changing priorities. In the demo, the humanoid robot stands before dishes and a drainer. It's asked about what it sees, which it lists, but then the interlocutor changes tasks and asks for something to eat. Without missing a beat, the robot picks up an apple and hands it to him. While most of what Google showed in the videos was disembodied robot arms and hands working through a wide range of physical tasks, there are grander plans. Google is partnering with Apptronik to add the new model to its Apollo humanoid robot. Google will connect the dots with additional programming, a new advanced visual language model called Gemini Robotics-ER (embodied reasoning). Gemini Robotics-ER will enhance robotic spatial reasoning and should help robot developers connect the models to existing controllers. Again, this should improve on-the-fly reasoning and make it possible for the robots to quickly figure out how to grasp and use unfamiliar objects. Google calls Gemini Robotics-ER an end-to-end solution and claims it "can perform all the steps necessary to control a robot right out of the box, including perception, state estimation, spatial understanding, planning and code generation." Google is providing the Gemini Robotics-ER model to several business- and research-focused robotics firms, including Boston Dynamics (makers of Atlas), Agile Robots, and Agility Robotics. All in all, it's a potential boon for humanoid robotics developers.
However, since most of these robots are designed for factories or still in the laboratory, it may be some time before you have a Gemini-enhanced robot in your home.
[19]
Introducing Gemini Robotics and Gemini Robotics-ER, AI models designed for robots to understand, act and react to the physical world.
Introducing Gemini Robotics, our Gemini 2.0-based model designed for robotics
At Google DeepMind, we've been making progress in how our Gemini models solve complex problems through multimodal reasoning across text, images, audio and video. So far however, those abilities have been largely confined to the digital realm. In order for AI to be useful and helpful to people in the physical realm, they have to demonstrate "embodied" reasoning -- the humanlike ability to comprehend and react to the world around us -- as well as safely take action to get things done. Today, we are introducing two new AI models, based on Gemini 2.0, which lay the foundation for a new generation of helpful robots. The first is Gemini Robotics, an advanced vision-language-action (VLA) model that was built on Gemini 2.0 with the addition of physical actions as a new output modality for the purpose of directly controlling robots. The second is Gemini Robotics-ER, a Gemini model with advanced spatial understanding, enabling roboticists to run their own programs using Gemini's embodied reasoning (ER) abilities. Both of these models enable a variety of robots to perform a wider range of real-world tasks than ever before. As part of our efforts, we're partnering with Apptronik to build the next generation of humanoid robots with Gemini 2.0. We're also working with a selected number of trusted testers to guide the future of Gemini Robotics-ER. We look forward to exploring our models' capabilities and continuing to develop them on the path to real-world applications.
[20]
New humanoid robots get smarter with Google's AI
Why it matters: The move could pave the way for robots that are vastly more versatile, but also opens up whole new categories of risks as AI systems take on physical capabilities.
Driving the news: Google announced two new Gemini Robotics models, pairing its Gemini 2.0 AI with robots capable of physical action.
The big picture: Google isn't the first to combine an LLM with robotics, but prior efforts have been far more limited.
Between the lines: Early chatbots had a built-in safety limit; they could talk, but not act. That protection will vanish as AI gains the ability to make decisions and take physical actions -- introducing new risks.
The other side: Google says it is taking a multi-layered approach to safety, one that includes the content protections already in Gemini, industry standard rules for physical robots as well as a "constitutional AI" governing the system's overall behavior.
My thought bubble: This is a dangerous inflection point and giving a computer limbs is not a step to be taken lightly. If this were "Terminator 2," it would be the moment the heroes go back in time to shut it all down.
[21]
Gemini might soon drive futuristic robots that can do your chores
The inevitable outcome of artificial intelligence was always its use in robots, and that future might be closer than you think. Google today announced Gemini Robotics, an initiative to bring the world closer than ever to "truly general purpose robots." Google says AI robotics have to meet three principal qualities. First, they should be able to adapt on the fly to different situations. They must be able to not only understand but also respond to changing environments. Finally, the robots have to be dexterous enough to perform the same kind of tasks that humans can with their hands and fingers. Google writes, "While our previous work demonstrated progress in these areas, Gemini Robotics represents a substantial step in performance on all three axes, getting us closer to truly general purpose robots." (Video: "Gemini Robotics: Dynamic interactions") Research and studies into robotics have advanced by leaps and bounds over the past few years. Boston Dynamics is particularly well known for its bots that can walk and navigate in public, and you've no doubt seen footage of the robot dogs on TikTok. If Gemini Robotics fulfills its mission statement, we could be on the cusp of introducing true household assistants that can do everything from cleaning the house to packing your lunch. The earliest tests of Gemini Robotics used mounted arms to perform tasks like playing tic-tac-toe, packing a lunchbox, and even playing cards -- and removing individual cards from a hand without bending them. It does this by incorporating an advanced vision-language model dubbed Gemini Robotics-ER (Embodied Reasoning). Google is bringing together cutting-edge research from each of these areas of robotics to combine it into one form, governed by a set of safety guidelines that dictate robotic behavior. In January 2024, Google suggested a "Robot Constitution" that would govern how these machines behaved, based largely on Isaac Asimov's Three Laws of Robotics. The team has since expanded that to something called the ASIMOV dataset that will allow researchers to better measure and test robot behavior in the real world. (Video: "Gemini Robotics: Dexterous skills") Google published a paper that's free to read, but fair warning: it's highly technical and complex. However, if you're interested in the world of robotics and what implications these projects hold for the future of the world, it's worth a read.
[22]
Google debuts two new AI models for powering robots - SiliconANGLE
Google LLC today introduced two new artificial intelligence models, Gemini Robotics and Gemini Robotics-ER, that are optimized to power autonomous machines. The algorithms are based on the company's Gemini 2.0 series of large language models. Introduced in December, the LLMs can process not only text but also multimodal data such as video. This latter capability enables the new Gemini Robotics and Gemini Robotics-ER models to analyze footage from a robot's cameras when making decisions. Gemini Robotics is described as a vision-language-action model. According to Google, robots equipped with the model can perform complex tasks based on natural language instructions. A user could, for example, ask the AI to fold paper into origami shapes or place items in a Ziploc bag. Historically, teaching an industrial robot a new task required manual programming. The task necessitates specialized skills and can consume a significant amount of time. To ease the robot configuration process, Google's researchers built Gemini Robotics with generality in mind. The company says that the AI can carry out tasks it was not taught to perform during training, which reduces the need for manual programming. To test how well Gemini Robotics responds to new tasks, Google evaluated it using an AI generalization benchmark. The company determined the algorithm more than doubled the performance of earlier vision-language-action models. According to Google, Gemini Robotics can not only perform tasks it was not taught to perform but also change how it carries out those tasks when environmental conditions change. "If an object slips from its grasp, or someone moves an item around, Gemini Robotics quickly replans and carries on -- a crucial ability for robots in the real world, where surprises are the norm," Carolina Parada, head of robotics at Google DeepMind, detailed in a blog post. The other new AI model that the company debuted today, Gemini Robotics-ER, is geared toward spatial reasoning. This is a term for the complex sequence of computations that a robot must carry out before it can perform a task. Picking up a coffee mug, for example, requires a robotic arm to find the handle and calculate the angle from which it should be approached. After developing a plan for how to carry out a task, Gemini Robotics-ER uses Gemini 2.0's coding capabilities to turn the plan into a configuration script. This script programs the robot in which the AI is installed. If a task proves too complicated for Gemini Robotics-ER, developers can teach it the best course of action with a "handful of human demonstrations." "Gemini Robotics-ER can perform all the steps necessary to control a robot right out of the box, including perception, state estimation, spatial understanding, planning and code generation," Parada wrote. "In such an end-to-end setting the model achieves a 2x-3x success rate compared to Gemini 2.0."
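The flow SiliconANGLE describes above (perception and state estimation, planning, code generation, and a fallback to a handful of human demonstrations) might be sketched as follows. Every function here is a hypothetical placeholder rather than Google's API; the sketch only shows the shape of such a pipeline.

```python
# Hedged sketch of a perception -> plan -> code-generation pipeline with an
# in-context-learning fallback, mirroring the flow described in the article.
# All functions are invented stand-ins, not Google's implementation.

from typing import List, Optional

def perceive(camera_frame: str) -> dict:
    # Placeholder perception / state-estimation step.
    return {"objects": ["coffee mug"], "mug_handle_angle_deg": 35.0}

def plan(task: str, scene: dict) -> List[str]:
    return [
        f"approach handle at {scene['mug_handle_angle_deg']} degrees",
        "close two-finger grasp",
        f"lift to complete: {task}",
    ]

def generate_control_script(steps: List[str]) -> Optional[str]:
    # A real system would emit executable controller code; returning None here
    # simulates a task too complicated for code generation alone.
    if len(steps) > 5:
        return None
    return "\n".join(f"robot.execute({step!r})" for step in steps)

def in_context_fallback(task: str, demonstrations: List[str]) -> str:
    # Condition on a handful of human demonstrations instead of generated code.
    return f"imitate {len(demonstrations)} demonstrations for task: {task}"

def control(task: str, camera_frame: str, demos: List[str]) -> str:
    steps = plan(task, perceive(camera_frame))
    script = generate_control_script(steps)
    return script if script is not None else in_context_fallback(task, demos)

if __name__ == "__main__":
    print(control("pick up the coffee mug", "frame_0001", demos=["demo_a", "demo_b"]))
```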
[23]
Google DeepMind's Gemini Robotics to Build AI for Robots of all Shapes and Sizes
Gemini Robotics-ER achieves a two to three times higher success rate than Gemini 2.0 in end-to-end settings.
Google DeepMind has introduced two new AI models, Gemini Robotics and Gemini Robotics-ER, which have been designed to enhance robotic capabilities in the physical world, the company announced. These models are based on Gemini 2.0 and aim to enable robots to perform a broader range of real-world tasks. The company's ultimate goal is "to develop AI that could work for any robot, no matter its shape or size". According to Google, for AI models to be useful in robotics, they must be "general, interactive, and dexterous" to adapt to various scenarios, understand commands, and perform tasks similar to human actions. Gemini Robotics is a vision-language-action (VLA) model that allows robots to comprehend new situations and execute physical actions without specific training. For instance, it can handle tasks like folding paper or unscrewing a bottle cap. Notably, Figure AI is another company that recently made headway on AI for humanoids with its Helix model, which allows robots to perform complex tasks using natural language. Gemini Robotics-ER is designed for roboticists to develop their own models, offering advanced spatial understanding and using the embodied reasoning abilities of Gemini. It enhances Gemini 2.0's capabilities by improving 2D and 3D object detection and pointing. This model allows roboticists to integrate it with existing low-level controllers, enabling robots to perform complex tasks like grasping objects safely. Google DeepMind researchers mentioned, "We trained the model primarily on data from the bi-arm robotic platform, ALOHA 2, but we also demonstrated that it could control a bi-arm platform, based on the Franka arms used in many academic labs." For example, when shown a coffee mug, Gemini Robotics-ER can determine an appropriate two-finger grasp and plan a safe approach trajectory. It achieves a two to three times higher success rate than Gemini 2.0 in end-to-end settings, and can use in-context learning based on human demonstrations when code generation is insufficient. In a post on X, Google also mentioned partnering further with Apptronik, a US-based robotics company, to develop the next generation of humanoid robots using these models. The partnership sits alongside Google's work with a group of trusted testers, including Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools. The collaboration includes demonstrations of robots performing tasks such as connecting devices and packing lunchboxes in response to vocal commands. While the commercial availability of this technology has not been announced, Google continues to explore its capabilities. In the future, these models are expected to contribute significantly to developing more capable and adaptable robots.
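As a toy illustration of the coffee-mug example, the sketch below computes a two-finger grasp pose at the handle and a short top-down approach path. It is a simplified geometric stand-in under assumed coordinates and offsets, not DeepMind's actual method; the function names and numbers are invented for illustration.

```python
import math

def handle_grasp_pose(mug_center, handle_angle_deg, handle_offset=0.05):
    """Place a two-finger gripper at the handle, pinching it from the side."""
    theta = math.radians(handle_angle_deg)
    x = mug_center[0] + handle_offset * math.cos(theta)
    y = mug_center[1] + handle_offset * math.sin(theta)
    z = mug_center[2]
    # Yaw the gripper perpendicular to the handle direction.
    return (x, y, z, theta + math.pi / 2)

def approach_waypoints(grasp_pose, standoff=0.10, steps=5):
    """Approach straight down from above so the gripper clears the mug body."""
    x, y, z, yaw = grasp_pose
    return [(x, y, z + standoff * (1 - i / steps), yaw) for i in range(steps + 1)]

grasp = handle_grasp_pose(mug_center=(0.42, -0.10, 0.05), handle_angle_deg=30)
for waypoint in approach_waypoints(grasp):
    print(waypoint)
```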
[24]
Google DeepMind's Gemini Robotics AI Models That Can Control Robots
The models were trained on data from the robotic platform Aloha 2
Google DeepMind unveiled two new artificial intelligence (AI) models on Thursday, which can control robots to make them perform a wide range of tasks in real-world environments. Dubbed Gemini Robotics and Gemini Robotics-ER (embodied reasoning), these are advanced vision-language models capable of displaying spatial intelligence and performing actions. The Mountain View-based tech giant also revealed that it is partnering with Apptronik to build Gemini 2.0-powered humanoid robots. The company is also testing these models to evaluate them further and understand how to make them better. In a blog post, DeepMind detailed the new AI models for robots. Carolina Parada, the Senior Director and Head of Robotics at Google DeepMind, said that for AI to be helpful to people in the physical world, it would have to demonstrate "embodied" reasoning -- the ability to interact with and understand the physical world and perform actions to complete tasks. Gemini Robotics, the first of the two AI models, is an advanced vision-language-action (VLA) model which was built using the Gemini 2.0 model. It has a new output modality of "physical actions" which allows the model to directly control robots. DeepMind highlighted that to be useful in the physical world, AI models for robotics require three key capabilities -- generality, interactivity, and dexterity. Generality refers to a model's ability to adapt to different situations. Gemini Robotics is "adept at dealing with new objects, diverse instructions, and new environments," claimed the company. Based on internal testing, the researchers found the AI model more than doubles the performance on a comprehensive generalisation benchmark. The AI model's interactivity is built on the foundation of Gemini 2.0, and it can understand and respond to commands phrased in everyday, conversational language and in different languages. Google claimed that the model also continuously monitors its surroundings, detects changes to the environment or instructions, and adjusts its actions based on the input. Finally, DeepMind claimed that Gemini Robotics can perform extremely complex, multi-step tasks that require precise manipulation of the physical environment. The researchers said the AI model can control robots to fold a piece of paper or pack a snack into a bag. The second AI model, Gemini Robotics-ER, is also a vision-language model but it focuses on spatial reasoning. Drawing on Gemini 2.0's coding and 3D detection capabilities, the AI model is said to understand the right moves needed to manipulate an object in the real world. Highlighting an example, Parada said that when the model was shown a coffee mug, it was able to generate a command for a two-finger grasp to pick it up by the handle along a safe trajectory. The AI model performs a large number of steps necessary to control a robot in the physical world, including perception, state estimation, spatial understanding, planning, and code generation. Notably, neither of the two AI models is currently publicly available. DeepMind will likely first integrate the AI models into a humanoid robot and evaluate their capabilities before releasing the technology.
[25]
Google debuts AI model for robotics, challenging Meta, OpenAI
Google DeepMind's research unit is launching Gemini Robotics, a new version of its AI model focused on creating more skilled and interactive robots. Another model, Gemini Robotics-ER, is designed to enhance spatial understanding and assist robot developers in creating programs using Gemini's reasoning abilities.
Alphabet's artificial intelligence lab is debuting two new models focused on robotics, which will help developers train robots to respond to unfamiliar scenarios -- a longstanding challenge in the field. Research unit Google DeepMind will release Gemini Robotics, a new branch of its flagship AI model aimed at developing robots that are more dexterous and interactive, it said Tuesday. Another model, Gemini Robotics-ER, specializes in spatial understanding, and will help robot makers build new programs using Gemini's reasoning capabilities. By applying Gemini to robots, Google is moving closer to developing "general purpose robotics" that can field a variety of tasks, DeepMind engineer Kanishka Rao said in a media briefing. "Our worlds are super messy and dynamic and rich, and I think a general purpose intelligent robot needs to be able to deal with that messiness." The Silicon Valley dream of building robots that can perform tasks on par with humans is attracting renewed attention and investment. Meta Platforms Inc., Tesla Inc. and OpenAI have ramped up their work on robotics, and startups are in talks to raise funding at sky-high valuations. In a pre-taped demonstration on Tuesday, Google researchers showed how robots running on their technology responded to simple commands. One robot, standing before a smattering of letter tiles, spelled "Ace" after a trainer asked it to make a word. Engineers also set out a miniature toy basketball court in the lab. Another robot, when asked to perform a dunk, pressed a small plastic ball through the hoop. "The team was really excited when we first saw the robot dunk the basketball," Rao said. "It's because the robot has never ever seen anything related to basketball. It's getting this general concept, understanding of what a basketball net looks like and what the word 'slam dunk' means from Gemini and is able to connect those to actually accomplish the task in the physical world." Google has a somewhat tortured history in robotics. More than a decade ago, the company acquired at least eight robotics companies to further cofounders Larry Page and Sergey Brin's goal of developing consumer-oriented robots with the help of machine learning. As the years went by, the efforts coalesced within Google X, Alphabet's moonshot lab, and in 2021 a unit spun out called Everyday Robots, which specialized in robots that completed daily tasks like sorting trash. About two years later, Alphabet announced it would shut down the unit as part of its sweeping 2023 budget cuts. Still, Alphabet never fully exited the robotics business. At the time, the company said it would consolidate some of the technology and team into existing robotics efforts. Now, the company appears to be rebooting these efforts under the banner of generative AI. During the briefing, Google stressed that the work was in an "early exploration" phase. Vikas Sindhwani, a DeepMind research scientist, said the Gemini models had been developed with a "strong understanding of common sense safety" in physical environments. He said Google plans to deploy the robots gradually, starting at safe distances from humans and becoming more interactive and collaborative over time as safety performance improves.
Google said it would start exploring Gemini's robotics capabilities with companies in the field, including Apptronik, which it is partnering with to develop humanoid robots. Other partners testing its Gemini Robotics-ER model include Agile Robots and Boston Dynamics -- which Alphabet acquired in 2013, then later sold to SoftBank Group.
[26]
Google's New Gemini AI Robots Are Smarter, Faster and More Interactive
Google has introduced Gemini Robotics, powered by the advanced Gemini 2.0 model, to enhance the capabilities of general-purpose robotic agents in the physical world. These robots are designed to be interactive, dexterous, and general, enabling them to perform complex tasks, respond to dynamic environments, and generalize across a wide range of real-world applications. The initiative aims to integrate advanced AI into robotics to create versatile and adaptive systems. A defining feature of Gemini Robotics is its focus on real-time interactivity, allowing seamless collaboration between humans and robots. These systems are designed to respond instantly to user actions and voice commands, assisted by low-latency processing. This capability allows the robots to adapt quickly to changing conditions, making them particularly effective in dynamic environments where speed and precision are essential. For example, a Gemini-powered robot can assist in collaborative tasks such as assembling furniture or organizing a workspace. By interpreting verbal instructions and adjusting its actions based on user feedback, the robot can work alongside humans in a range of industries. This real-time engagement not only enhances efficiency but also opens up new possibilities for human-robot collaboration, making these systems versatile tools in a variety of professional and personal settings. Another standout feature of Gemini Robotics is its dexterity, which enables these robots to excel at tasks requiring fine motor skills and spatial awareness. By using vision-based AI models, the robots can perceive and interpret their surroundings with remarkable accuracy, allowing them to handle delicate objects or assemble intricate components. For instance, a Gemini robot can identify an object's shape, size, and orientation, allowing it to manipulate items with precision. The combination of spatial reasoning and advanced object manipulation makes these robots highly adaptable to industries that demand meticulous attention to detail. One of the most innovative aspects of Gemini Robotics is its ability to generalize across tasks, setting it apart from traditional robots that are limited to predefined functions. Gemini-powered systems integrate language and action models, allowing them to understand complex instructions and adapt to new challenges without requiring extensive reprogramming. For example, a Gemini robot tasked with organizing an unfamiliar room can analyze its environment, interpret verbal or written instructions, and determine the appropriate placement of objects -- all without prior training. By combining reasoning capabilities with physical adaptability, Gemini Robotics demonstrates potential for applications that require both flexibility and intelligence. The versatility of Gemini Robotics extends across a wide range of physical forms and use cases. By integrating vision, language, and action models, these robots can operate effectively in environments that demand both adaptability and precision. In warehouses, Gemini robots can streamline operations by sorting and transporting goods, reducing manual labor and improving efficiency. In homes, they can assist with daily tasks such as cleaning, cooking, or even providing companionship. In public spaces, they can offer customer service, assist with navigation, or enhance security.
This broad applicability ensures that Gemini Robotics can address the unique challenges of various industries, enhancing both productivity and user experience. To accelerate development and refine its systems, Google has launched a Trusted Testers Program, inviting industry partners to collaborate and provide real-world feedback. This initiative aims to ensure that Gemini-powered robots meet the practical needs of users while addressing the specific demands of different industries. Google Gemini Robotics represents a significant step forward in the integration of AI and robotics. By combining real-time interactivity, precision dexterity, and task generalization, these systems are designed to engage effectively with the physical world. The potential applications span a wide range of industries, from healthcare and logistics to retail and disaster response, highlighting the adaptability and versatility of these robots. As development progresses, Gemini Robotics could play a pivotal role in addressing challenges that require collaboration, precision, and adaptability. By using advanced AI models and fostering partnerships through its Trusted Testers Program, Google aims to refine and expand the capabilities of these systems, making sure they are well-suited for dynamic, real-world scenarios. The integration of AI into robotics continues to evolve, and Gemini Robotics exemplifies the possibilities of this convergence. With its focus on adaptability, precision, and interactivity, this initiative has the potential to influence how robots are used across industries, shaping the future of human-robot interaction and redefining the role of robotics in everyday life.
[27]
Google DeepMind unveils Gemini Robotics, a new AI model for advanced robotics By Investing.com
Investing.com -- Google (NASDAQ:GOOGL) DeepMind has announced the introduction of two new artificial intelligence models, Gemini Robotics and Gemini Robotics-ER, both based on the Gemini 2.0 technology. These models are intended to lay the groundwork for the next generation of practical robots. Gemini Robotics is an advanced vision-language-action (VLA) model that extends Gemini 2.0 to include physical actions, enabling direct control of robots. The Gemini Robotics-ER model enhances Gemini's embodied reasoning (ER) abilities, offering advanced spatial understanding for roboticists to run their own programs. The new models are designed to allow a wide variety of robots to perform a broader range of real-world tasks. Google DeepMind is collaborating with Apptronik to create the next generation of humanoid robots using Gemini 2.0. Additionally, they are working with a select group of trusted testers to guide the development of Gemini Robotics-ER. To be effective and beneficial to people, AI models for robotics need to be general, interactive, and dexterous. Gemini Robotics has made significant progress in all these areas, bringing us closer to truly multipurpose robots. Gemini Robotics uses Gemini's world understanding to generalize to new situations and solve a wide range of tasks. It is also skilled at handling new objects, diverse instructions, and new environments. The model is interactive due to its foundation on Gemini 2.0, allowing it to understand and respond to commands in everyday, conversational language. It can also adapt its behavior based on changes in its environment or instructions. Gemini Robotics can perform complex, multi-step tasks that require precise manipulation, such as origami folding or packing a snack into a Ziploc bag. The model has been designed to adapt to different types of robots, with training primarily on data from the bi-arm robotic platform, ALOHA 2. The Gemini Robotics-ER model enhances Gemini's understanding of the world in ways necessary for robotics, focusing especially on spatial reasoning. It improves Gemini 2.0's existing abilities like pointing and 3D detection by a large margin. Gemini Robotics-ER can perform all the steps necessary to control a robot right out of the box, including perception, state estimation, spatial understanding, planning and code generation. Google DeepMind is taking a holistic approach to addressing safety in their research, from low-level motor control to high-level semantic understanding. They are also releasing a new dataset to evaluate and improve semantic safety in embodied AI and robotics. They have developed a framework to automatically generate data-driven constitutions - rules expressed directly in natural language - to steer a robot's behavior. Google DeepMind is collaborating with experts in its Responsible Development and Innovation team as well as its Responsibility and Safety Council to assess the societal implications of their work. They are also consulting with external specialists on the challenges and opportunities presented by embodied AI in robotics applications. The Gemini Robotics-ER model is also available to trusted testers including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools. Google DeepMind is looking forward to exploring the capabilities of these models and continuing to develop AI for the next generation of more helpful robots.
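To illustrate how natural-language constitutions might gate a robot's behavior, here is a minimal sketch that filters candidate actions against two example rules. The rules and the keyword-based checker are placeholders invented for illustration; they are not the ASIMOV dataset or DeepMind's framework, which would rely on a language model rather than keyword matching.

```python
CONSTITUTION = [
    "Do not make contact with people.",
    "Do not lift objects heavier than 2 kg.",
]

def violates(rule, action):
    # Placeholder checker: in practice a language model would judge whether a
    # described action is consistent with each natural-language rule.
    if "people" in rule and action.get("near_person", False):
        return True
    if "heavier" in rule and action.get("mass_kg", 0) > 2:
        return True
    return False

def filter_actions(candidates):
    # Keep only candidate actions that violate none of the rules.
    return [a for a in candidates if not any(violates(r, a) for r in CONSTITUTION)]

candidates = [
    {"name": "carry 3 kg toolbox across the room", "near_person": False, "mass_kg": 3.0},
    {"name": "place 1 kg box on the shelf", "near_person": False, "mass_kg": 1.0},
]
print(filter_actions(candidates))  # only the 1 kg action survives
```

The design point is that the rules stay in natural language, so they can be audited and regenerated per deployment rather than hard-coded into the controller.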
Google DeepMind unveils Gemini Robotics and Gemini Robotics-ER, advanced AI models designed to control robots with improved generalization, adaptability, and dexterity. These models, built on the Gemini 2.0 language model, aim to create more intuitive and capable robots for various tasks.
Google DeepMind has unveiled two new AI models, Gemini Robotics and Gemini Robotics-ER, designed to control robots and enhance their capabilities in understanding and interacting with the physical world [1][2]. These models, built upon the foundation of Gemini 2.0, Google's most advanced large language model (LLM), represent a significant step towards creating more intuitive and adaptable robots [1][3].
Gemini Robotics incorporates "vision-language-action" (VLA) abilities, allowing robots to process visual information, understand language commands, and generate physical movements [2]. The model has demonstrated impressive capabilities, including folding origami, packing snacks into Ziploc bags, and slam-dunking a miniature basketball through a toy hoop.
In tests, robots using Gemini Robotics consistently outperformed state-of-the-art rivals on both familiar and unfamiliar tasks [1]. For instance, robot hands achieved a success rate of over 70% on fiddly tasks after seeing fewer than 100 demonstrations [1].
Gemini Robotics-ER focuses on "embodied reasoning" with enhanced spatial understanding [2]. This model aims to provide robots with intuitive physical world understanding, similar to human experience-based learning [4]. For example, it can identify appropriate grasping points for objects based on human-like reasoning [4].
The models have been tested on various robot types, including humanoid robots and robotic arms [1]. Google DeepMind has partnered with Apptronik to develop the next generation of humanoid robots using Gemini 2.0 [2].
Ensuring safety is a major challenge in applying these models to real-world machines. Google DeepMind has implemented a layered approach to safety, including deploying robots gradually and at a safe distance from humans, a framework that generates data-driven "constitutions" (rules expressed in natural language) to steer robot behavior, and a new ASIMOV dataset for evaluating and improving semantic safety in embodied AI.
The Gemini models have shown strong performance on the ASIMOV benchmark, correctly answering over 80% of safety-related questions [4].
The introduction of Gemini Robotics models could potentially revolutionize the robotics industry by enabling more general-purpose robots capable of adapting to various tasks and environments [3]. This development aligns with the broader industry goal of creating embodied AI, which companies like Nvidia are also pursuing [2].
While the technology shows promise, experts caution that the impressive performance is currently limited to a narrow set of high-quality training data [4]. The real test will be in generalizing these capabilities to diverse, real-world scenarios [1][4].
As the field progresses, researchers and companies will need to address challenges related to data collection, safety, and the ethical implications of increasingly capable AI-powered robots [4][5].
Reference
[3] IEEE Spectrum: Technology, Engineering, and Science News | Are Google's New Gemini Robotics Models the Key to Smarter, More Adaptable Robots?
[5]