6 Sources
[1]
Google DeepMind unveils its first "thinking" robotics AI
Generative AI systems that create text, images, audio, and even video are becoming commonplace. In the same way AI models output those data types, they can also be used to output robot actions. That's the foundation of Google DeepMind's Gemini Robotics project, which has announced a pair of new models that work together to create the first robots that "think" before acting. Traditional LLMs have their own set of problems, but the introduction of simulated reasoning did significantly upgrade their capabilities, and now the same could be happening with AI robotics.
The team at DeepMind contends that generative AI is a uniquely important technology for robotics because it unlocks general functionality. Current robots have to be trained intensively on specific tasks, and they are typically bad at doing anything else. "Robots today are highly bespoke and difficult to deploy, often taking many months in order to install a single cell that can do a single task," said Carolina Parada, head of robotics at Google DeepMind. The fundamentals of generative systems make AI-powered robots more general. They can be presented with entirely new situations and workspaces without needing to be reprogrammed.
DeepMind's current approach to robotics relies on two models: one that thinks and one that does. The two new models are known as Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. The former is a vision-language-action (VLA) model, meaning it uses visual and text data to generate robot actions. The "ER" in the other model stands for embodied reasoning. This is a vision-language model (VLM) that takes visual and text input to generate the steps needed to complete a complex task.
The thinking machines
Gemini Robotics-ER 1.5 is the first robotics AI capable of simulated reasoning like modern text-based chatbots -- Google likes to call this "thinking," but that's a bit of a misnomer in the realm of generative AI. DeepMind says the ER model achieves top marks in both academic and internal benchmarks, which shows that it can make accurate decisions about how to interact with a physical space. It doesn't undertake any actions, though. That's where Gemini Robotics 1.5 comes in.
Imagine that you want a robot to sort a pile of laundry into whites and colors. Gemini Robotics-ER 1.5 would process the request along with images of the physical environment (a pile of clothing). This AI can also call tools like Google search to gather more data. The ER model then generates natural language instructions, specific steps that the robot should follow to complete the given task. Gemini Robotics 1.5 (the action model) takes these instructions from the ER model and generates robot actions while using visual input to guide its movements. But it also goes through its own thinking process to consider how to approach each step. "There are all these kinds of intuitive thoughts that help [a person] guide this task, but robots don't have this intuition," said DeepMind's Kanishka Rao. "One of the major advancements that we've made with 1.5 in the VLA is its ability to think before it acts."
Both of DeepMind's new robotic AIs are built on the Gemini foundation models but have been fine-tuned with data that adapts them to operating in a physical space. This approach, the team says, gives robots the ability to undertake more complex multi-stage tasks, bringing agentic capabilities to robotics. The DeepMind team tests Gemini robotics with a few different machines, like the two-armed Aloha 2 and the humanoid Apollo.
In the past, AI researchers had to create customized models for each robot, but that's no longer necessary. DeepMind says that Gemini Robotics 1.5 can learn across different embodiments, transferring skills learned from Aloha 2's grippers to the more intricate hands on Apollo with no specialized tuning. All this talk of physical agents powered by AI is fun, but we're still a long way from a robot you can order to do your laundry. Gemini Robotics 1.5, the model that actually controls robots, is still only available to trusted testers. However, the thinking ER model is now rolling out in Google AI Studio, allowing developers to generate robotic instructions for their own physically embodied robotic experiments.
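The division of labor described above (an embodied reasoning model that plans and a vision-language-action model that acts) can be pictured as a simple orchestration loop. The sketch below is purely illustrative: the classes and method names are hypothetical stand-ins for the two models, not part of any published DeepMind API.

```python
# Illustrative sketch of the "think, then act" loop described above.
# All classes here are hypothetical stand-ins, not real DeepMind interfaces.

from dataclasses import dataclass


@dataclass
class Observation:
    """Camera frames plus any free-text status the robot can report."""
    images: list
    progress_note: str


class EmbodiedReasoner:
    """Stand-in for an ER-style model: turns a goal + observation into step instructions."""

    def plan_steps(self, goal: str, obs: Observation) -> list[str]:
        # A real model would reason over the images (and optionally web results)
        # and return natural-language steps; these are placeholder examples.
        return [
            "Pick up the white shirt and place it in the left basket.",
            "Pick up the red sock and place it in the right basket.",
        ]


class VisionLanguageActionModel:
    """Stand-in for a VLA-style model: turns one instruction + vision into robot actions."""

    def execute_step(self, instruction: str, obs: Observation) -> bool:
        # A real model would emit low-level motor commands; we just report success.
        print(f"executing: {instruction}")
        return True


def run_task(goal, reasoner, actor, get_observation):
    obs = get_observation()
    for step in reasoner.plan_steps(goal, obs):
        done = actor.execute_step(step, obs)
        obs = get_observation()  # re-observe so planning and acting stay grounded
        if not done:
            break                # a real system would re-plan instead of stopping


if __name__ == "__main__":
    run_task(
        "Sort this pile of laundry into whites and colors.",
        EmbodiedReasoner(),
        VisionLanguageActionModel(),
        get_observation=lambda: Observation(images=[], progress_note=""),
    )
```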
[2]
Google DeepMind's new AI models can search the web to help robots complete tasks
Google DeepMind says its upgraded AI models enable robots to complete more complex tasks -- and even tap into the web for help. During a press briefing, Google DeepMind's head of robotics, Carolina Parada, told reporters that the company's new AI models work in tandem to allow robots to "think multiple steps ahead" before taking action in the physical world. The system is powered by the newly launched Gemini Robotics 1.5 alongside the embodied reasoning model, Gemini Robotics-ER 1.5, which are updates to AI models that Google DeepMind introduced in March. Now robots can perform more than just singular tasks, such as folding a piece of paper or unzipping a bag. They can now do things like separate laundry by dark and light colors, pack a suitcase based on the current weather in London, as well as help someone sort trash, compost, and recyclables based on a web search tailored to a location's specific requirements. "The models up to now were able to do really well at doing one instruction at a time in a way that is very general," Parada said. "With this update, we're now moving from one instruction to actually genuine understanding and problem-solving for physical tasks." To do this, robots can use the upgraded Gemini Robotics-ER 1.5 model to form an understanding of their surroundings, and use digital tools like Google Search to find more information. Gemini Robotics-ER 1.5 then translates those findings into natural language instructions for Gemini Robotics 1.5, allowing the robot to use the model's vision and language understanding to carry out each step. Additionally, Google DeepMind announced that Gemini Robotics 1.5 can help robots "learn" from each other, even if they have different configurations. Google DeepMind found that tasks presented to the ALOHA2 robot, which consists of two mechanical arms, "just work" on the bi-arm Franka robot, as well as Apptronik's humanoid robot Apollo. "This enables two things for us: one is to control very different robots -- including a humanoid -- with a single model," Google DeepMind software engineer Kanishka Rao said during the briefing. "And secondly, skills that are learned on one robot can now be transferred to another robot." As part of the update, Google DeepMind is rolling out Gemini Robotics-ER 1.5 to developers through the Gemini API in Google AI Studio, while only select partners can access Gemini Robotics 1.5.
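The web lookups described here (checking London's weather before packing, or a city's recycling rules before sorting trash) line up with the Gemini API's built-in Google Search grounding tool. The sketch below is illustrative only, assuming the google-genai Python SDK and a placeholder model ID; whether the robotics models expose search grounding in exactly this way through the public API is an assumption, not something the article confirms.

```python
# Hedged sketch: asking an embodied-reasoning model to ground a plan in a web search
# via the Gemini API's Google Search tool (google-genai Python SDK).
# The model ID below is a placeholder; use whatever ID Google AI Studio lists.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # or rely on the GOOGLE_API_KEY env var

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed/placeholder model ID
    contents=(
        "I am a robot in San Francisco looking at a banana peel, a soda can, "
        "and a plastic bag. Look up the local disposal guidelines and list, "
        "step by step, which bin each item should go into."
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # enable web lookups
    ),
)

print(response.text)  # numbered, natural-language steps an action model could follow
```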
[3]
Google DeepMind unveils new robotics AI model that can sort laundry
Google DeepMind has unveiled artificial intelligence models that further advance reasoning capabilities in robotics, enabling robots to solve harder problems and complete more complicated real-world tasks like sorting laundry and recycling rubbish. The company's new robotics models, called Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, are designed to help robots complete multi-step tasks by "thinking" before they act, as part of the tech industry's push to make the general-purpose machines more useful in the everyday world. According to Google DeepMind, a robot trained using its new model was able to plan how to complete tasks that might take several minutes, such as folding laundry into different baskets based on colour. The development comes as tech groups, including OpenAI and Tesla, are racing to integrate AI models into robots in the hope that they could transform a range of industries, from healthcare to manufacturing.
"Models up to now were able to do really well at doing one instruction at a time," said Carolina Parada, senior director and head of robotics at Google DeepMind. "We're now moving from one instruction to actually genuine understanding and problem solving for physical tasks." In March, Google DeepMind unveiled the first iteration of these models, which took advantage of the company's Gemini 2.0 system to help robots adjust to different new situations, respond quickly to verbal instructions or changes in their environment, and be dexterous enough to manipulate objects. While that version was able to reason about how to complete tasks, such as folding paper or unzipping a bag, the latest model can follow a series of instructions and also use tools such as Google search to help it solve problems.
In one demonstration, a Google DeepMind researcher asked the robot to pack a beanie into her bag for a trip to London. The robot was able to tell the researcher that it was going to rain for several days during the trip, and so it packed an umbrella into the bag as well. The robot was also able to sort rubbish into appropriate recycling bins, by first using online tools to figure out that it was based in San Francisco, and then searching the web for the city's recycling guidelines.
Gemini Robotics 1.5 is a vision-language-action model, which combines several different inputs and then translates them into action. These systems are able to learn about the world through data downloaded from the internet. Ingmar Posner, professor of applied artificial intelligence at the University of Oxford, said learning from this kind of internet-scale data could help robotics reach a "ChatGPT moment". But Angelo Cangelosi, co-director of the Manchester Centre for Robotics and AI, cautioned against describing what these robots are doing as real thinking. "It's just discovering regularities between pixels, between images, between words, tokens, and so on," he said.
Another development with Google DeepMind's new system is a technique called "motion transfer", which allows one AI model to take skills that were designed for a specific type of robot body, such as robotic arms, and transfer them to another, such as a humanoid robot. Traditionally, getting robots to move around in a space and take action requires plenty of meticulous planning and coding, and this training was often specific to a particular type of robot, such as robotic arms. This "motion transfer" breakthrough could help solve a major bottleneck in AI robotics development, which is the lack of enough training data.
"Unlike large language models that can be trained on the entire vast internet of data, robotics has been limited by the painstaking process of collecting real [data for robots]," said Kanishka Rao, principal software engineer of robotics at Google DeepMind. The company said it still needed to overcome a number of hurdles in the technology. This included creating the ability for robots to learn skills by watching videos of humans doing tasks. It also said robots needed to become more dexterous as well as reliable and safe before they could be rolled out into environments where they interact with humans. "One of the major challenges of building general robots is that things that are intuitive for humans are actually quite difficult for robots," said Rao.
[4]
Gemini Robotics 1.5 brings AI agents into the physical world
We're powering an era of physical agents -- enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks. Earlier this year, we made incredible progress bringing Gemini's multimodal understanding into the physical world, starting with the Gemini Robotics family of models. Today, we're taking another step towards advancing intelligent, truly general-purpose robots. We're introducing two models that unlock agentic experiences with advanced thinking: Gemini Robotics 1.5, a vision-language-action (VLA) model that turns visual information and instructions into robot actions, and Gemini Robotics-ER 1.5, an embodied reasoning model that reasons about the physical world, natively calls digital tools and creates detailed multi-step plans. These advances will help developers build more capable and versatile robots that can actively understand their environment to complete complex, multi-step tasks in a general way. Starting today, we're making Gemini Robotics-ER 1.5 available to developers via the Gemini API in Google AI Studio. Gemini Robotics 1.5 is currently available to select partners. Read more about building with the next generation of physical agents on the Developer blog.
Most daily tasks require contextual information and multiple steps to complete, making them notoriously challenging for robots today. For example, if a robot was asked, "Based on my location, can you sort these objects into the correct compost, recycling and trash bins?" it would need to search for relevant local recycling guidelines on the internet, look at the objects in front of it and figure out how to sort them based on those rules -- and then do all the steps needed to completely put them away. So, to help robots complete these types of complex, multi-step tasks, we designed two models that work together in an agentic framework.
Our embodied reasoning model, Gemini Robotics-ER 1.5, orchestrates a robot's activities, like a high-level brain. This model excels at planning and making logical decisions within physical environments. It has state-of-the-art spatial understanding, interacts in natural language, estimates its success and progress, and can natively call tools like Google Search to look for information or use any third-party user-defined functions. Gemini Robotics-ER 1.5 then gives Gemini Robotics 1.5 natural language instructions for each step, and Gemini Robotics 1.5 uses its vision and language understanding to directly perform the specific actions. Gemini Robotics 1.5 also helps the robot think about its actions to better solve semantically complex tasks, and can even explain its thinking processes in natural language -- making its decisions more transparent.
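Since the post notes that Gemini Robotics-ER 1.5 is reachable via the Gemini API in Google AI Studio, a minimal developer experiment along these lines might look like the sketch below. It assumes the google-genai Python SDK; the model ID string and the one-instruction-per-line output format are assumptions to check against the developer documentation, not details confirmed by the post.

```python
# Minimal sketch: sending a scene image plus a multi-step request to an
# embodied-reasoning model through the Gemini API (google-genai SDK).
# The model ID and prompt/response format are assumptions; check the developer docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("workbench.jpg", "rb") as f:       # any photo of the robot's workspace
    scene = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed/placeholder model ID
    contents=[
        types.Part.from_bytes(data=scene, mime_type="image/jpeg"),
        "Plan the steps needed to sort the objects on this bench into the "
        "compost, recycling, and trash bins. Return one instruction per line.",
    ],
)

# Each line is a natural-language instruction that a VLA model (or a person)
# could carry out; the ER model itself does not move the robot.
for step in response.text.splitlines():
    print(step)
```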
[5]
Google DeepMind Unveils Gemini Robotics 1.5 And ER 1.5 To Help Robots Reason, Plan, And Learn Across Different Tasks
Robots have traditionally been preprogrammed machines that carry out only the instructions given to them. But that seems about to change with the launch of Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. Google DeepMind is taking the leap and pushing for a new era by working on more adaptive robots that can reason, learn, and even solve real-world problems. Tech giants are constantly looking for ways to explore the potential of the technology by bringing in more advanced AI models and solutions. While robots have had limited application in terms of performing repetitive tasks under controlled conditions, such as moving boxes or assembling car parts, Google DeepMind is determined to equip its latest models to handle complex tasks or even look up information online if there is a need.
Google introduced the two new AI models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, during a recent update on its robotics direction. The ER model focuses on reasoning and breaking down tasks, finding more information on the web when needed, while the robotics model carries out actions. Carolina Parada, head of robotics at Google DeepMind, detailed this approach and explained how the pairing lets robots think multiple steps ahead rather than focusing on single steps alone. So now, with the updated Gemini Robotics models, you could not only get help loading suitcases for vacations, but also get assistance with packing choices, checking the weather, and generally planning the trip better. Because two models work together, the system operates in a more human way: it plans first and then acts.
Another major upgrade is knowledge transfer: skills developed on one robot can be transferred to another, even if it is built or designed differently. The implications of this new Gemini-powered robotics system could be huge, especially in the healthcare sector, where assistive robots can adapt to different patient needs. Even for personal use, it could end up being a useful assistant. However, challenges remain; AI models are progressing rapidly, and this effort will not be free from complications either. Data privacy, reliability, and safety questions arise, so Google will need to conduct rigorous tests before enabling large-scale deployment. One thing is certain: Google DeepMind is determined to transform robots from tools into assistants that can work alongside humans by teaching them how to think and act.
[6]
Google DeepMind unveils new AI models that use the web to aid robots (GOOG:NASDAQ)
Google DeepMind (NASDAQ:GOOG) (NASDAQ:GOOGL) unveiled two new artificial intelligence models on Thursday that the tech company says will allow robots to utilize the internet to perform certain tasks. The models, known as Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, are designed to help robots think, plan, and complete complex physical tasks more transparently and effectively, enhancing their general problem-solving capabilities. Gemini Robotics-ER 1.5 allows developers to build robots that reason, access digital tools for information, and execute detailed multi-step tasks using internet resources. Alphabet aims to lead in the development of versatile, general-purpose robots that can transfer learning between platforms, thereby expanding market opportunities and technological leadership.
Google DeepMind unveils advanced AI models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, enabling robots to reason, plan, and execute complex multi-step tasks. This breakthrough brings AI agents into the physical world, potentially transforming industries from healthcare to manufacturing.
Google DeepMind has unveiled a groundbreaking advancement in artificial intelligence for robotics, introducing two new models: Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. These models represent a significant leap forward in creating robots that can 'think' before acting, potentially revolutionizing the field of robotics and its applications across various industries [1][2].
The new system employs a two-model approach, combining the strengths of both to create more capable and versatile robots:
Gemini Robotics-ER 1.5: This 'embodied reasoning' model acts as the robot's high-level brain, excelling in planning and decision-making within physical environments. It can interact using natural language, estimate its progress, and even use tools like Google Search to gather information [4].
Gemini Robotics 1.5: This model translates the instructions from the ER model into specific actions. It uses vision and language understanding to perform tasks and can explain its thinking processes in natural language [4].
The combination of these models enables robots to undertake complex, multi-step tasks that were previously challenging for traditional robots. Some notable capabilities include:
Web-assisted problem-solving: Robots can now search the internet for information to complete tasks, such as looking up local recycling guidelines to sort waste correctly [2][3].
Multi-step task planning: The system can break down complex tasks into manageable steps, allowing robots to complete activities like sorting laundry by color or packing a suitcase based on weather conditions [2][3].
Skill transfer: Knowledge gained by one robot can be transferred to others with different configurations, potentially accelerating the development and deployment of robotic systems [1][2].
The advancements brought by Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 have far-reaching implications for various industries:
Healthcare: Assistive robots could potentially adapt to different patient needs, providing more personalized care [5].
Manufacturing: The ability to quickly reprogram and adapt robots could lead to more flexible and efficient production lines [1].
Household assistance: Robots could become more useful in everyday tasks, from organizing belongings to helping with chores [2][3].
While the potential of these new models is significant, several challenges remain:
Safety and reliability: Ensuring that AI-powered robots can operate safely alongside humans is crucial [3].
Data privacy: As robots become more integrated with web services and personal information, protecting user data will be essential [5].
Ethical considerations: The development of more autonomous robots raises questions about decision-making and accountability [3].
Google DeepMind is making Gemini Robotics-ER 1.5 available to developers through the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is currently limited to select partners [2][4]. As development continues, the company aims to overcome hurdles such as enabling robots to learn from human demonstration videos and improving their dexterity [3].
The introduction of Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 marks a significant milestone in the field of AI robotics. By enabling robots to reason, plan, and execute complex tasks, Google DeepMind is paving the way for a new generation of intelligent machines that could transform various aspects of our lives and industries.