5 Sources
[1]
Google's new robotics AI can run without the cloud and still tie your shoes
We sometimes call chatbots like Gemini and ChatGPT "robots," but generative AI is also playing a growing role in real, physical robots. After announcing Gemini Robotics earlier this year, Google DeepMind has now revealed a new on-device VLA (vision language action) model to control robots. Unlike the previous release, there's no cloud component, allowing robots to operate with full autonomy. Carolina Parada, head of robotics at Google DeepMind, says this approach to AI robotics could make robots more reliable in challenging situations. This is also the first version of Google's robotics model that developers can tune for their specific uses.

Robotics is a unique problem for AI because not only does the robot exist in the physical world, but it also changes its environment. Whether you're having it move blocks around or tie your shoes, it's hard to predict every eventuality a robot might encounter. The traditional approach of training a robot on actions with reinforcement learning was very slow, but generative AI allows for much greater generalization.

"It's drawing from Gemini's multimodal world understanding in order to do a completely new task," explains Carolina Parada. "What that enables is in that same way Gemini can produce text, write poetry, just summarize an article, you can also write code, and you can also generate images. It also can generate robot actions."

General robots, no cloud needed

In the previous Gemini Robotics release (which is still the "best" version of Google's robotics tech), the platform ran a hybrid system with a small model on the robot and a larger one running in the cloud. You've probably watched chatbots "think" for measurable seconds as they generate an output, but robots need to react quickly. If you tell the robot to pick up and move an object, you don't want it to pause while each step is generated. The local model allows quick adaptation, while the server-based model can help with complex reasoning tasks.

Google DeepMind is now unleashing the local model as a standalone VLA, and it's surprisingly robust. The new Gemini Robotics On-Device model is only a little less accurate than the hybrid version. According to Parada, many tasks will work out of the box. "When we play with the robots, we see that they're surprisingly capable of understanding a new situation," Parada tells Ars. By putting this model out with a full SDK, the team hopes developers will give Gemini-powered robots new tasks and show them new environments, which could reveal actions that don't work with the model's stock tuning.

With the SDK, robotics researchers will be able to adapt the VLA to new tasks with as few as 50 to 100 demonstrations. A "demonstration" in AI robotics is a bit different than in other areas of AI research. Parada explains that demonstrations typically involve tele-operating the robot -- controlling the machinery manually to complete a task actually tunes the model to handle that task autonomously. While synthetic data is an element of Google's training, it's not a substitute for the real thing. "We still find that in the most complex, dexterous behaviors, we need real data," says Parada. "But there is quite a lot that you can do with simulation."

But those highly complex behaviors may be beyond the capabilities of the on-device VLA. It should have no problem with straightforward actions like tying a shoe (a traditionally difficult task for AI robots) or folding a shirt. If, however, you wanted a robot to make you a sandwich, it would probably need a more powerful model to go through the multi-step reasoning required to get the bread in the right place. The team sees Gemini Robotics On-Device as ideal for environments where connectivity to the cloud is spotty or non-existent. Processing the robot's visual data locally is also better for privacy, for example, in a health care environment.

Building safe robots

Safety is always a concern with AI systems, be that a chatbot that provides dangerous information or a robot that goes Terminator. We've all seen generative AI chatbots and image generators hallucinate falsehoods in their outputs, and the generative systems powering Gemini Robotics are no different -- the model doesn't get it right every time, but giving the model a physical embodiment with cold, unfeeling metal graspers makes the issue a little more thorny.

To ensure robots behave safely, Gemini Robotics uses a multi-layered approach. "With the full Gemini Robotics, you are connecting to a model that is reasoning about what is safe to do, period," says Parada. "And then you have it talk to a VLA that actually produces options, and then that VLA calls a low-level controller, which typically has safety critical components, like how much force you can move or how fast you can move this arm."

Importantly, the new on-device model is just a VLA, so developers will be on their own to build in safety. Google suggests they replicate what the Gemini team has done, though. It's recommended that developers in the early tester program connect the system to the standard Gemini Live API, which includes a safety layer. They should also implement a low-level controller for critical safety checks. Anyone interested in testing Gemini Robotics On-Device should apply for access to Google's trusted tester program.

Google's Carolina Parada says there have been a lot of robotics breakthroughs in the past three years, and this is just the beginning -- the current release of Gemini Robotics is still based on Gemini 2.0. Parada notes that the Gemini Robotics team typically trails behind Gemini development by one version, and Gemini 2.5 has been cited as a massive improvement in chatbot functionality. Maybe the same will be true of robots.
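The layered control stack Parada describes -- a reasoning model deciding what is safe, a VLA proposing actions, and a low-level controller enforcing hard limits -- can be pictured roughly as follows. This is a minimal, purely illustrative sketch: the class and function names are hypothetical and are not part of any published Gemini Robotics API; it only shows the shape of the idea, clamping whatever the policy outputs to force and velocity limits before anything reaches the actuators.

```python
# Illustrative sketch only: the three-layer control stack described above.
# All names (SafetyLimits, SafetyController, control_step, reasoner, vla_policy)
# are hypothetical and not part of any Gemini Robotics API.
from dataclasses import dataclass

import numpy as np


@dataclass
class SafetyLimits:
    max_joint_velocity: float = 0.5  # rad/s, example value only
    max_gripper_force: float = 20.0  # newtons, example value only


class SafetyController:
    """Lowest layer: clamp every command to hard physical limits."""

    def __init__(self, limits: SafetyLimits):
        self.limits = limits

    def apply(self, joint_velocities: np.ndarray, gripper_force: float):
        safe_velocities = np.clip(joint_velocities,
                                  -self.limits.max_joint_velocity,
                                  self.limits.max_joint_velocity)
        safe_force = min(gripper_force, self.limits.max_gripper_force)
        return safe_velocities, safe_force


def control_step(reasoner, vla_policy, controller, observation, instruction):
    # Layer 1: a reasoning model decides whether the request is safe at all.
    if not reasoner.is_safe(instruction, observation):
        return None  # refuse rather than act
    # Layer 2: the VLA turns vision + language into a candidate action.
    joint_velocities, gripper_force = vla_policy.act(observation, instruction)
    # Layer 3: the low-level controller enforces force and speed limits.
    return controller.apply(joint_velocities, gripper_force)
```

In the full hybrid system the upper layers include a cloud-hosted Gemini model; with the standalone on-device VLA, the article notes, developers are expected to supply the reasoning layer and the safety-critical controller themselves.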
[2]
Google rolls out new Gemini model that can run on robots locally
Google DeepMind on Tuesday released a new language model called Gemini Robotics On-Device that can run tasks locally on robots without requiring an internet connection. Building on the company's previous Gemini Robotics model that was released in March, Gemini Robotics On-Device can control a robot's movements. Developers can control and fine-tune the model to suit various needs using natural language prompts.

In benchmarks, Google claims the model performs at a level close to the cloud-based Gemini Robotics model. The company says it outperforms other on-device models in general benchmarks, though it didn't name those models. In a demo, the company showed robots running this local model doing things like unzipping bags and folding clothes.

Google says that while the model was trained for ALOHA robots, it later adapted it to work on a bi-arm Franka FR3 robot and the Apollo humanoid robot by Apptronik. Google claims the bi-arm Franka FR3 was successful in tackling scenarios and objects it hadn't "seen" before, like doing assembly on an industrial belt.

Google DeepMind is also releasing a Gemini Robotics SDK. The company said developers can show robots 50 to 100 demonstrations of tasks to train them on new tasks using these models on the MuJoCo physics simulator.

Other AI model developers are also dipping their toes in robotics. Nvidia is building a platform to create foundation models for humanoids; Hugging Face is not only developing open models and datasets for robotics, it is actually working on robots too; and Mirae Asset-backed Korean startup RLWRLD is working on creating foundational models for robots.
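The SDK reportedly lets developers evaluate these models in the MuJoCo physics simulator. As a rough picture of what a simulator evaluation loop looks like, here is a minimal rollout using MuJoCo's standard Python bindings; the scene file and the stand-in policy are placeholders, and none of this is drawn from the Gemini Robotics SDK itself, which is limited to trusted testers.

```python
# Bare-bones MuJoCo rollout loop, for illustration only. The scene XML and the
# stand-in policy are hypothetical; this is plain MuJoCo, not the (unpublished)
# Gemini Robotics SDK.
import mujoco
import numpy as np

model = mujoco.MjModel.from_xml_path("bi_arm_scene.xml")  # placeholder robot scene
data = mujoco.MjData(model)


def stand_in_policy(observation: np.ndarray) -> np.ndarray:
    # Placeholder for a learned policy: tiny random actuator commands.
    return 0.01 * np.random.randn(model.nu)


demonstration = []  # (observation, action) pairs, as a fine-tuning set would store
for _ in range(1000):
    observation = np.concatenate([data.qpos, data.qvel])  # joint positions + velocities
    action = stand_in_policy(observation)
    demonstration.append((observation.copy(), action.copy()))
    data.ctrl[:] = action          # apply the command
    mujoco.mj_step(model, data)    # advance the physics one timestep
```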
[3]
Google DeepMind's optimized AI model runs directly on robots
Google DeepMind is rolling out an on-device version of its Gemini Robotics AI model that allows it to operate without an internet connection. The vision-language-action (VLA) model comes with dexterous capabilities similar to those of the model released in March, but Google says "it's small and efficient enough to run directly on a robot."

The flagship Gemini Robotics model is designed to help robots complete a wide range of physical tasks, even if it hasn't been specifically trained on them. It allows robots to generalize to new situations, understand and respond to commands, and perform tasks that require fine motor skills.

Carolina Parada, head of robotics at Google DeepMind, tells The Verge that the original Gemini Robotics model uses a hybrid approach, allowing it to operate on-device and in the cloud. But with this device-only model, users can access offline features that are almost as good as those of the flagship.

The on-device model can perform several different tasks out of the box, and it can adapt to new situations "with as few as 50 to 100 demonstrations," according to Parada. Google only trained the model on its ALOHA robot, but the company was able to adapt it to different robot types, such as the humanoid Apollo robot from Apptronik and the bi-arm Franka FR3 robot.

"The Gemini Robotics hybrid model is still more powerful, but we're actually quite surprised at how strong this on-device model is," Parada says. "I would think about it as a starter model or as a model for applications that just have poor connectivity." It could also be useful for companies with strict security requirements.

Alongside this launch, Google is releasing a software development kit (SDK) for the on-device model that developers can use to evaluate and fine-tune it -- a first for one of Google DeepMind's VLAs. The on-device Gemini Robotics model and its SDK will be available to a group of trusted testers while Google continues to work toward minimizing safety risks.
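As a way to picture the difference between the hybrid setup and the device-only model, the sketch below shows a controller that always generates actions locally and only consults a cloud reasoner for multi-step planning when a connection happens to be available. This is an assumption made for illustration, not a description of how Google's system is actually wired; the local_vla and cloud_reasoner objects are hypothetical stand-ins.

```python
# Illustrative only: prefer cloud reasoning when reachable, otherwise rely
# entirely on the on-device model. The local_vla and cloud_reasoner objects
# are hypothetical stand-ins, not real Gemini Robotics components.
import socket


def cloud_reachable(host: str = "example.com", port: int = 443,
                    timeout: float = 0.5) -> bool:
    """Crude connectivity probe; a real system would use something sturdier."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_and_act(local_vla, cloud_reasoner, observation, instruction):
    # Complex multi-step goals ("make me a sandwich") can be decomposed in the
    # cloud when a connection exists; otherwise the raw command is used as-is.
    if cloud_reasoner is not None and cloud_reachable():
        instruction = cloud_reasoner.decompose(instruction, observation)
    # Action generation always runs on the robot itself for low latency.
    return local_vla.act(observation, instruction)
```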
[4]
New Gemini AI lets humanoid robots think and act without internet
Humanoid robot prepares to interact with objects using Google's offline Gemini AI.
Google DeepMind has launched a powerful on-device version of its Gemini Robotics AI model. The new system can control physical robots without relying on cloud connectivity. It marks a major step in deploying fast, adaptive, and general-purpose robotics in real-world environments. The model, known as 'Gemini Robotics On-Device,' brings Gemini 2.0's multimodal reasoning into robots with no internet required. It's designed for latency-sensitive use cases and environments with poor or no connectivity.
[5]
Google's new Gemini AI model can run robots locally without internet, here's how
Google has formally unveiled a new iteration of Gemini AI that can function solely on robotic hardware and doesn't require an internet connection. The model, called Gemini Robotics On-Device, provides task generalisation and fine-tuning with minimal data, and it gives bi-arm robots local, low-latency control.

Gemini Robotics On-Device handles language, action, and vision inputs on the device itself, in contrast to cloud-dependent models. This might be helpful in places like manufacturing floors or remote settings where latency needs to be kept to a minimum or connectivity is restricted. Google claims that the model can follow natural language instructions and learn from just 50 to 100 demonstrations to accomplish tasks like zipping bags, folding clothes, and pouring liquids.

Specifically designed as a lightweight extension of the Gemini 2.0 architecture, the on-device version preserves multi-step reasoning and dexterous control while optimising for smaller compute footprints. Having been successfully tested on robots other than its initial training setup, such as the Apptronik Apollo humanoid and the bi-arm Franka FR3, it also facilitates rapid adaptation to new tasks or robotic forms.

Through a trusted tester program, Google is also providing a Gemini Robotics SDK to a select group of developers. Users can test the model in MuJoCo physics simulations and adjust it for particular tasks using this SDK. Access is still restricted, though, while the company assesses its effectiveness and safety in real-world environments.

Google claims that with extra safety precautions in place, the development is consistent with its internal AI Principles. These consist of benchmarking for semantic integrity and low-level controllers for physical safety. Under the direction of the company's Responsibility & Safety Council, the system is being assessed using a new semantic safety benchmark. Google is indicating a move towards more autonomous and locally adaptable robotics systems with the on-device model, which could have implications for logistics, industrial automation, and other areas.
Google DeepMind has released a new on-device AI model for robotics that can operate without cloud connectivity, marking a significant advancement in autonomous robot control and adaptability.
Google DeepMind has unveiled a groundbreaking advancement in artificial intelligence for robotics with the release of Gemini Robotics On-Device, a new AI model capable of running directly on robotic hardware without requiring an internet connection [1][2]. This development marks a significant step towards creating more autonomous and adaptable robots for various applications.
The Gemini Robotics On-Device model is a vision-language-action (VLA) system that builds upon the previously released Gemini Robotics model. It offers several notable features:
Local Processing: Unlike its predecessor, which used a hybrid approach combining on-device and cloud-based processing, the new model operates entirely on the robot itself [3].
Offline Functionality: The model enables robots to function in environments with poor or no internet connectivity, making it suitable for use in remote locations or areas with strict security requirements [4].
Rapid Adaptation: According to Carolina Parada, head of robotics at Google DeepMind, the model can adapt to new tasks with as few as 50 to 100 demonstrations [2][3]; a minimal fine-tuning sketch follows this list.
Versatility: Initially trained on Google's ALOHA robot, the model has been successfully adapted to other robot types, including the humanoid Apollo robot from Apptronik and the bi-arm Franka FR3 robot [3].
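The adaptation process described above amounts to supervised fine-tuning on a small set of tele-operated demonstrations. As a generic illustration of what learning from 50 to 100 demonstrations looks like, here is a minimal behavioral-cloning loop in PyTorch; the data shapes, network, and hyperparameters are assumptions for the sketch, and the actual Gemini Robotics SDK workflow has not been published.

```python
# Minimal behavioral-cloning sketch: fit a small policy to (observation, action)
# pairs collected from tele-operated demonstrations. Everything here is an
# assumption for illustration; it is not the Gemini Robotics SDK.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

OBS_DIM, ACT_DIM = 64, 14            # e.g. proprioception features, bi-arm joint targets
NUM_DEMOS, STEPS_PER_DEMO = 80, 200  # roughly "50 to 100 demonstrations"

# Placeholder tensors standing in for recorded tele-operation trajectories.
observations = torch.randn(NUM_DEMOS * STEPS_PER_DEMO, OBS_DIM)
actions = torch.randn(NUM_DEMOS * STEPS_PER_DEMO, ACT_DIM)
loader = DataLoader(TensorDataset(observations, actions), batch_size=256, shuffle=True)

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(10):
    for obs, act in loader:
        loss = nn.functional.mse_loss(policy(obs), act)  # imitate the operator
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

A real VLA conditions on camera images and a language instruction rather than a flat feature vector, but the imitation objective is the same basic idea.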
Google claims that the on-device model performs at a level close to the cloud-based Gemini Robotics model, outperforming other on-device models in general benchmarks [2]. Demonstrations have shown robots running this local model performing tasks such as unzipping bags, folding clothes, and pouring liquids [2][5].
The model's ability to generalize and handle new situations makes it particularly promising for applications in manufacturing, logistics, and industrial automation [5].
To facilitate further development and customization, Google is releasing a Gemini Robotics SDK. This toolkit allows developers to evaluate and fine-tune the model for specific use cases [3]. The company is also prioritizing safety in the deployment of this technology:
Multi-layered Approach: The full Gemini Robotics system incorporates reasoning about safe actions, option generation, and low-level controllers for critical safety components [1].
Safety Recommendations: For the on-device model, Google suggests that developers implement safety measures similar to those in the full system, including connecting to the Gemini Live API for an additional safety layer [1].
Semantic Safety Benchmark: The system is being evaluated using a new semantic safety benchmark under the guidance of Google's Responsibility & Safety Council [5].
The release of Gemini Robotics On-Device represents a significant advancement in the field of AI-powered robotics. As the technology continues to evolve, it could have far-reaching implications for various industries:
Manufacturing and Logistics: The model's ability to adapt quickly to new tasks and environments could revolutionize production lines and warehouse operations [5].
Healthcare: Local processing of visual data enhances privacy, making the technology more suitable for sensitive environments like hospitals [1].
Remote Operations: The offline functionality opens up possibilities for robotic applications in areas with limited connectivity, such as disaster response or space exploration [4].
As AI continues to advance in the robotics field, other companies are also making strides. Nvidia is building a platform for creating foundation models for humanoids, Hugging Face is developing open models, datasets, and robots of its own, and Korean startup RLWRLD is working on foundational models for robots [2].
With the Gemini Robotics On-Device model and SDK currently available to a group of trusted testers, the broader impact of this technology on the robotics industry and various sectors remains to be seen as development and safety assessments continue [3].