# Google's robots can now think, search the web, and learn new skills autonomously
Google DeepMind has unveiled two new AI models this week, aimed at making robots smarter than ever before. The updated Gemini Robotics 1.5 and its companion Gemini Robotics-ER 1.5 shift focus from merely executing commands to enabling robots to think through problems, search for information online, and share skills across different robotic agents.
According to Google, these models represent “a foundational step toward mastering the complexities of the physical world with intelligence and adaptability.”
“Gemini Robotics 1.5 marks a key milestone in achieving AGI within the physical domain,” Google stated in its announcement. “By introducing agent capabilities, we’ve moved beyond reactive models to develop systems capable of true reasoning, planning, proactive tool use, and generalization.”
The term “generalization” is particularly significant because this is where previous models faced challenges.
Robots powered by these new models can now perform tasks such as sorting laundry by color, packing a suitcase based on weather forecasts retrieved online, or checking local recycling rules to dispose of waste properly. As a human, you might ask, “What’s the big deal?” But for machines, this requires a skill known as generalization — the ability to apply learned knowledge to new situations.
Robots — and algorithms in general — have historically struggled with this. For instance, a model taught to fold a pair of pants cannot fold a T-shirt unless engineers pre-program every step.
These new models change the game. They can interpret cues, perceive their surroundings, make reasonable assumptions, and carry out complex, multi-step tasks that were previously unachievable — or at least extremely difficult for machines.
But better doesn’t mean perfect. In one experiment, the team showed the robot a set of items and asked it to place them in the correct trash bin. The robot used its camera to visually identify each item, looked up the latest recycling guidelines for San Francisco online, and then placed the items correctly — much like a local would.
This process combined online search, visual perception, and step-by-step planning to make context-aware decisions beyond the capabilities of older robots. The success rate ranged from 20% to 40%: not ideal, but still surprisingly good given that earlier models couldn't grasp such nuances at all.
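As a rough illustration of the pattern, the sketch below pairs a stubbed web lookup of local guidelines with a per-item visual classification. Every function here is hypothetical and only mirrors the search-then-perceive-then-decide loop described above; it is not Google's robotics API.

```python
# Hypothetical sketch of the search + perceive + decide loop described above.
# All functions are stubs for illustration; no real Google robotics API is shown.

def fetch_recycling_rules(city: str) -> dict[str, str]:
    """Stand-in for a web search: maps detected material -> bin for the given city."""
    return {"paper": "blue", "food_scraps": "green", "plastic_film": "landfill"}

def classify_item(camera_image: bytes) -> str:
    """Stand-in for visual perception: returns the material the camera sees."""
    return "paper"

def choose_bin(camera_image: bytes, city: str = "San Francisco") -> str:
    rules = fetch_recycling_rules(city)      # step 1: look up the local rules online
    material = classify_item(camera_image)   # step 2: identify the item visually
    return rules.get(material, "landfill")   # step 3: decide, with a safe fallback

print(choose_bin(camera_image=b""))  # -> "blue"
```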
### How Google Turns Robots Into Super Robots
The two models work in tandem. Gemini Robotics-ER 1.5 functions like a brain, figuring out what needs to happen and creating a step-by-step plan. When it needs information, it can perform a Google search. Once the plan is ready, it sends natural language instructions to Gemini Robotics 1.5, which handles the actual physical actions.
More technically, the new Gemini Robotics 1.5 is a vision-language-action (VLA) model that converts visual data and instructions into movement commands. Meanwhile, Gemini Robotics-ER 1.5 (the “ER” stands for embodied reasoning) is a vision-language model (VLM) that creates multi-step plans to complete tasks.
For example, when a robot sorts laundry, it internally reasons through a series of thoughts: understanding that “sort by color” means placing whites in one bin and colors in another, then breaking down the specific motions needed to pick up each item. The robot can explain its reasoning in simple English, making its decision-making process less of a black box.
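To make the division of labor concrete, here is a minimal sketch of how such a planner/executor handoff could be wired up. The class names and methods are hypothetical illustrations of the pattern described above, not Google's actual robotics interfaces.

```python
# Hypothetical sketch of the two-model handoff: a reasoning model plans in
# natural language, and an action model turns each step into motion.
from dataclasses import dataclass

@dataclass
class Step:
    instruction: str  # natural-language command, e.g. "place white items in the left bin"

class EmbodiedReasoner:
    """Stands in for Gemini Robotics-ER 1.5: inspects the scene, optionally
    searches the web, and returns a step-by-step plan."""

    def plan(self, goal: str, camera_image: bytes) -> list[Step]:
        # A real planner would call the VLM with the goal and image here.
        return [
            Step("locate all laundry items on the table"),
            Step("place white items in the left bin"),
            Step("place colored items in the right bin"),
        ]

class ActionModel:
    """Stands in for Gemini Robotics 1.5: converts one natural-language step
    plus current camera input into low-level motor commands."""

    def execute_step(self, step: Step, camera_image: bytes) -> None:
        print(f"executing: {step.instruction}")

def run_task(goal: str, camera_image: bytes) -> None:
    planner, executor = EmbodiedReasoner(), ActionModel()
    for step in planner.plan(goal, camera_image):
        # Replanning after a dropped item or a moved object would slot in here.
        executor.execute_step(step, camera_image)

run_task("sort the laundry by color", camera_image=b"")
```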
Google CEO Sundar Pichai shared his thoughts on X, noting that the new models will enable robots to reason better, plan ahead, use digital tools like search, and transfer learning across different types of robots. He called it “an important step toward truly useful general-purpose robots.”
> The new Gemini Robotics 1.5 models will allow robots to reason more effectively, plan ahead, utilize digital tools like search, and share learning across different robot types. We’re taking a significant step forward toward truly useful general-purpose robots — you can now see how robots reason…
>
> — Sundar Pichai (@sundarpichai) September 25, 2025
This release puts Google in the spotlight alongside companies like Tesla, Figure AI, and Boston Dynamics — though each takes a different approach. Tesla is focused on mass production for factories, with Elon Musk promising thousands of units by 2026. Boston Dynamics continues to push the boundaries of robotic mobility, with its Atlas robot performing backflips. Meanwhile, Google is betting on AI that allows robots to adapt to any situation without needing specific programming.
Timing is crucial. U.S. robotics firms are pushing for a national robotics strategy at a time when China has made AI and intelligent robotics a national priority. According to the International Federation of Robotics, based in Germany, China is the world’s largest market for robots used in factories and other industrial environments, with around 1.8 million units in operation as of 2023.
DeepMind’s approach diverges from traditional robotic programming, which requires engineers to carefully code every movement. Instead, these models learn from demonstrations and can adapt on the fly. If an object slips from the robot’s grasp or someone moves something mid-task, the robot adjusts seamlessly.
These models build on DeepMind’s earlier work from this year, when its robots were limited to single tasks such as unzipping a bag or folding paper. Now they’re tackling complex sequences that even humans find challenging, such as packing a suitcase appropriately after checking a weather forecast.
For developers looking to experiment, there’s a split availability strategy. Gemini Robotics-ER 1.5 became available Thursday via the Gemini API on Google AI Studio, meaning any developer can start building with this reasoning model. However, the action model, Gemini Robotics 1.5, remains limited to “select” (likely well-funded) partners.
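For a first experiment with the reasoning model, a call through the Gemini API might look like the sketch below. It assumes the `google-genai` Python SDK and a GEMINI_API_KEY from Google AI Studio; the model identifier is an assumption and should be checked against the current model list in AI Studio.

```python
# Minimal sketch of querying the reasoning model through the Gemini API.
# Assumes the google-genai SDK is installed and GEMINI_API_KEY is set.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed model ID; verify in AI Studio
    contents="Plan the steps a robot arm should take to sort white and "
             "colored laundry into two separate bins.",
)
print(response.text)
```

In practice a robotics stack would also pass camera frames as image parts and feed the returned plan to the action model, but that half of the pipeline is the part still restricted to select partners.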