Google Unveils Gemini 2.5 Computer Use: AI That Navigates the Web Like a Human

Reviewed by Nidhi Govil

4 Sources


Google has introduced a new AI model that can operate web interfaces directly and, by Google's own figures, outperforms competing systems on several benchmarks. The release marks a significant step toward more versatile and autonomous AI agents.


Google Introduces Gemini 2.5 Computer Use Model

Google has unveiled the Gemini 2.5 Computer Use model, a specialized AI system that interacts with graphical user interfaces (GUIs) much as a human user does [1, 3]. The model is designed to let AI agents navigate web browsers and mobile applications on their own.

Capabilities and Functionality

The Gemini 2.5 Computer Use model builds on the visual understanding and reasoning capabilities of Gemini 2.5 Pro [3]. It can perform a wide range of actions within web interfaces, including:

  • Clicking buttons and links
  • Typing text
  • Scrolling web pages
  • Manipulating dropdown menus
  • Filling out and submitting forms
  • Navigating between pages
  • Performing web searches

These capabilities allow the model to complete complex tasks autonomously, such as booking appointments, organizing information, and interacting with various web applications [1].
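The model does not touch the page itself: it proposes actions, and client code carries them out. The minimal sketch below illustrates that division of labor for a form-filling task against a toy in-memory page. The action names (navigate, type_text, click, scroll), the FakePage class and its fields are hypothetical stand-ins invented for illustration, not the names used by Google's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser tab; not a real browser API."""
    url: str = "about:blank"
    scroll_y: int = 0
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: List[Dict[str, str]] = field(default_factory=list)


def navigate(page: FakePage, url: str) -> None:
    page.url = url
    page.scroll_y = 0  # a freshly loaded page starts at the top
    page.fields.clear()


def type_text(page: FakePage, target: str, text: str) -> None:
    page.fields[target] = text


def click(page: FakePage, target: str) -> None:
    # In this toy model only the "submit" button has an effect.
    if target == "submit":
        page.submitted.append(dict(page.fields))


def scroll(page: FakePage, dy: int) -> None:
    page.scroll_y = max(0, page.scroll_y + dy)


# Maps each hypothetical action name to the function that performs it.
HANDLERS: Dict[str, Callable[..., None]] = {
    "navigate": navigate,
    "type_text": type_text,
    "click": click,
    "scroll": scroll,
}


def execute(page: FakePage, action: Dict[str, Any]) -> None:
    """Apply one proposed action, e.g. {"name": "click", "args": {"target": "submit"}}."""
    HANDLERS[action["name"]](page, **action.get("args", {}))


# A booking form filled in as the kind of action sequence an agent might propose.
plan = [
    {"name": "navigate", "args": {"url": "https://example.com/booking"}},
    {"name": "type_text", "args": {"target": "name", "text": "Ada"}},
    {"name": "type_text", "args": {"target": "date", "text": "2025-11-03"}},
    {"name": "scroll", "args": {"dy": 400}},
    {"name": "click", "args": {"target": "submit"}},
]

page = FakePage()
for step in plan:
    execute(page, step)
print(page.submitted)  # [{'name': 'Ada', 'date': '2025-11-03'}]
```

Keeping the handlers in a single lookup table means each new action type is one function plus one dictionary entry.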

Performance and Benchmarks

Google claims that Gemini 2.5 Computer Use outperforms leading alternatives on multiple web and mobile control benchmarks [2]. Notable results include:

  • Online-Mind2Web: 65.7% for Gemini 2.5 vs. 61.0% (Claude Sonnet 4) and 44.3% (OpenAI Agent)
  • WebVoyager: 79.9% for Gemini 2.5 vs. 69.4% (Claude Sonnet 4) and 61.0% (OpenAI Agent)
  • AndroidWorld: 69.7% for Gemini 2.5 vs. 62.1% (Claude Sonnet 4)

The model also performs strongly on mobile UI control tasks, even though it is primarily optimized for web browsers [1].
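The size of the reported leads is easiest to see as percentage-point differences. This short calculation uses only the figures listed above, which are Google's reported numbers rather than independent measurements; the dictionary layout and the margins helper are our own.

```python
# Benchmark scores (%) as reported by Google.
scores = {
    "Online-Mind2Web": {"Gemini 2.5 Computer Use": 65.7, "Claude Sonnet 4": 61.0, "OpenAI Agent": 44.3},
    "WebVoyager": {"Gemini 2.5 Computer Use": 79.9, "Claude Sonnet 4": 69.4, "OpenAI Agent": 61.0},
    "AndroidWorld": {"Gemini 2.5 Computer Use": 69.7, "Claude Sonnet 4": 62.1},
}


def margins(results, ours="Gemini 2.5 Computer Use"):
    """Percentage-point lead of `ours` over each rival on each benchmark."""
    return {
        bench: {rival: round(row[ours] - score, 1)
                for rival, score in row.items() if rival != ours}
        for bench, row in results.items()
    }


for bench, leads in margins(scores).items():
    print(bench, leads)
# Online-Mind2Web {'Claude Sonnet 4': 4.7, 'OpenAI Agent': 21.4}
# WebVoyager {'Claude Sonnet 4': 10.5, 'OpenAI Agent': 18.9}
# AndroidWorld {'Claude Sonnet 4': 7.6}
```

The margins over Claude Sonnet 4 range from 4.7 to 10.5 points, while the gap to OpenAI's agent is roughly 19 to 21 points on the two web benchmarks where it was compared.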

Availability and Integration

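Developers can access Gemini 2.5 Computer Use through the Gemini API in Google AI Studio and Vertex AI [3]. The model is exposed through a new 'computer_use' tool in the API, which is meant to run in a loop: each turn takes as input the user's request, a screenshot of the current environment, and a history of recent actions [3]. The model responds with a UI action, client-side code executes it, and a fresh screenshot is sent back to begin the next turn.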
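The loop described above can be sketched as follows: on each turn the client passes the request, a current screenshot and the action history to the model; the model either proposes one action or signals that it is done; the client executes the action and the next turn starts with a new screenshot. Everything here is hypothetical and does not use the Gemini SDK: ScriptedModel stands in for the real model call, screenshots are plain strings rather than images, and the max_steps cap is our own assumption, added so the loop always terminates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Action = Dict[str, str]


@dataclass
class Environment:
    """Hypothetical browser stand-in; its 'screenshot' is only a text description."""
    url: str = "https://example.com"
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        return f"screenshot of {self.url}"

    def execute(self, action: Action) -> None:
        if action["name"] == "navigate":
            self.url = action["url"]
        self.log.append(action["name"])


class ScriptedModel:
    """Hypothetical stand-in for the model: replays a fixed plan, then returns None."""

    def __init__(self, plan: List[Action]) -> None:
        self.remaining = list(plan)

    def next_action(self, request: str, screenshot: str,
                    history: List[Action]) -> Optional[Action]:
        # A real model would choose an action from the request, screenshot and history;
        # this stub ignores them and simply replays its script.
        return self.remaining.pop(0) if self.remaining else None


def run_agent(model: ScriptedModel, env: Environment, request: str,
              max_steps: int = 10) -> List[Action]:
    """Run the observe-propose-execute loop until the model stops or max_steps is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        # Inputs on every turn: the user request, a fresh screenshot and the action history.
        action = model.next_action(request, env.screenshot(), history)
        if action is None:  # the model signals that the task is finished
            break
        env.execute(action)  # the client, not the model, carries out the action
        history.append(action)
    return history


plan = [
    {"name": "navigate", "url": "https://example.com/search"},
    {"name": "type_text", "text": "dentist near me"},
    {"name": "click"},
]
env = Environment()
steps = run_agent(ScriptedModel(plan), env, "Find a dentist appointment")
print(len(steps), env.url)  # 3 https://example.com/search
```

A step cap like max_steps is a cheap safeguard in any agent loop, since nothing guarantees that a model will ever decide a task is finished.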

Limitations and Future Developments

While Gemini 2.5 Computer Use represents a significant advancement, it does have some limitations:

  • It is currently optimized for web browsers and not yet fully optimized for desktop OS-level control [4].
  • It supports 13 specific actions, a narrower set than some competitors offer [4].
  • Unlike some rival products, it cannot create or edit local files directly [2].
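A closed action vocabulary also lets client code reject any proposal outside it before anything is executed. The minimal sketch below shows such a check. The names in SUPPORTED_ACTIONS are hypothetical placeholders, not Google's list of 13 actions (the Gemini API documentation gives the real names and arguments), and write_file is an invented example of an operation the tool does not offer.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical placeholder names; the real tool defines its own set of 13 actions.
SUPPORTED_ACTIONS: FrozenSet[str] = frozenset({
    "navigate", "go_back", "search", "click", "type_text", "scroll",
})


class UnsupportedActionError(ValueError):
    """Raised when a proposed action is outside the supported set."""


def validate(action: Dict[str, Any]) -> Dict[str, Any]:
    """Return the action unchanged if its name is supported, otherwise raise."""
    name = action.get("name")
    if name not in SUPPORTED_ACTIONS:
        raise UnsupportedActionError(f"unsupported action: {name!r}")
    return action


validate({"name": "click", "x": 120, "y": 340})  # accepted
try:
    validate({"name": "write_file", "path": "notes.txt"})  # no file actions in this set
except UnsupportedActionError as err:
    print(err)  # unsupported action: 'write_file'
```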

Implications and Applications

The introduction of Gemini 2.5 Computer Use opens up new possibilities for AI-driven task automation and assistance. Potential applications include:

  • Streamlining customer service processes
  • Automating data entry and form filling
  • Enhancing accessibility for users with disabilities
  • Accelerating software testing and development [1]
As AI continues to evolve, models like Gemini 2.5 Computer Use are likely to play a growing role in automating tasks that today require a person to operate a graphical interface.

TheOutpost.ai


© 2025 Triveous Technologies Private Limited