6 Sources
[1]
Gemini 3.5 Flash can now see your screen, use your computer, take actions -- all on its own
It's currently available to developers and enterprise customers via the Gemini API and the Gemini Enterprise Agent Platform. Google has been adding new features to Gemini, integrating it with workspace apps like Drive, and basically making the AI more useful for consumers. However, the company has also been working on new enterprise and developer-oriented features, including making it easier for developers to create AI agents that can reason, navigate, and take action across environments. The company announced in a blog post that computer use is now available as a built-in tool in Gemini 3.5 Flash. Previously, developing custom AI agents required a dedicated Gemini 2.5 computer use model, but that's no longer necessary. The new model is available to developers using the Gemini API or via the Gemini Enterprise Agent Platform. To demonstrate its capabilities, Google has created a Browserbase instance where users can prompt the model to perform a task. Gemini 3.5 Flash then navigates through the browser, takes action on its own, and comes back with results. I asked it to find the cheapest flights from New Delhi to Tokyo, and it went to three separate flight booking websites, entered the departure and return dates, searched through the available tickets, and gave me the best options. You can also ask it to play 2048 and watch Gemini decide how to move and merge the tiles to get the highest possible score. Naturally, the ability to control your computer and perform tasks on its own also raises questions around safety, especially for enterprise consumers. To mitigate those risks, Google has used targeted adversarial training for the model. It is also introducing two new safeguards built into computer use with Gemini 3.5 Flash: the model can be configured to require explicit user confirmation before performing sensitive or irreversible actions, and it can also automatically stop tasks if it detects a prompt-injection attack. 
The company also recommends that developers combine these safeguards with secure sandboxes, strict access controls, and "human-in-the-loop" verification. Computer use with Gemini 3.5 Flash is available today.
[2]
Gemini 3.5 Flash can now see and control your screen, and Google wants enterprises to trust it
Computer use is now a built-in tool in Gemini 3.5 Flash, replacing the standalone Gemini 2.5 computer use model with enterprise safeguards. Google has made computer use a built-in tool inside Gemini 3.5 Flash, the model it launched at I/O 2026 as its fastest agentic AI model. The capability, which lets AI agents see screens, click, type, and scroll across browsers, mobile devices, and desktops, previously required a separate standalone model and is now available as a native tool through the Gemini API and the Gemini Enterprise Agent Platform, the renamed version of Vertex AI. The update means developers no longer need to call a dedicated computer use model to build agents that interact with graphical interfaces. Instead, they can activate computer use as one of several tools within Flash, alongside code execution, search, and function calling. Product manager Mateo Quiros described the integration as giving Flash the ability to see, reason about, and take action on screens. Google first released a standalone Gemini computer use model in October 2025, designed specifically for browser-based agent workflows. That model achieved roughly 70 percent accuracy on the Online-Mind2Web benchmark and was built around a screenshot-action loop where developers fed it a screen capture, received a structured command, executed it, and sent back the updated view. Folding the capability into Flash consolidates what was a two-model workflow into one. The enterprise pitch centres on automation that goes beyond chatbots. Google says the tool enables continuous software testing, where agents navigate applications and verify functionality without human testers stepping through each screen. Knowledge workers could use agents to complete multi-step browser tasks, fill forms, extract data from dashboards, or navigate internal tools. The safety architecture is where Google is drawing the sharpest lines. 
The company says it applied targeted adversarial training specifically for prompt injection, the attack where malicious instructions embedded in a webpage or document trick an AI agent into performing unintended actions. The threat is not theoretical, as researchers have repeatedly demonstrated that AI agents can be manipulated through content they encounter while carrying out tasks. Google is offering two optional enterprise safeguards on top of the base model. The first requires explicit user confirmation before the agent executes any action flagged as sensitive or irreversible, such as submitting a form, making a purchase, or deleting data. The second automatically halts the agent if it detects an indirect prompt injection attempt, stopping execution rather than risking a compromised action. Both safeguards are opt-in, not defaults. Google recommends a "defense-in-depth" approach where developers layer multiple protections rather than relying on any single mechanism. The company's documentation acknowledges that no individual safeguard is sufficient on its own, a candid framing that contrasts with the more confident marketing language around other AI capabilities. The competitive landscape has shifted considerably since Anthropic pioneered the category. Anthropic's Claude Computer Use works across operating systems and can interact with file systems, not just browsers, making it more versatile for desktop workflows. Google's own Chrome Enterprise already added agentic browsing features earlier this year, including Auto Browse for autonomous multi-step tasks. The new Flash integration extends that philosophy beyond Chrome to any screen an agent can see. OpenAI has also entered the space, and the three companies are now competing on different axes. The question for enterprise buyers is less about which model can click a button and more about which one can do it safely inside a regulated environment. 
Google has not published updated benchmark scores for computer use as a built-in Flash tool versus the previous standalone model. The company has not disclosed how many enterprises are using the capability or provided case studies with named customers. The claims about targeted adversarial training for prompt injection are described in the blog post but not backed by published research or red-team results. The Gemini Enterprise Agent Platform, where the tool is available, uses pay-as-you-go pricing. Flash is one of the cheaper models in Google's lineup, which could make computer use more accessible for large-scale automation than running it through a heavier model. Whether the cost advantage holds depends on how many actions a typical agent workflow requires and how often the safety guardrails interrupt execution to request confirmation. Computer use in AI is still early. The models can navigate familiar interfaces but struggle with unexpected pop-ups, CAPTCHAs, dynamically loaded content, and layouts they have not seen before. Google's decision to make it a built-in tool rather than a standalone model signals confidence that the capability is mature enough for general availability, but the opt-in safety guardrails signal equal awareness that it is not yet mature enough to run unsupervised.
[3]
Gemini 3.5 Flash gets native computer use for AI agents
Google has announced that computer use is now a built-in tool in Gemini 3.5 Flash, enabling developers to build AI agents that can interact across platforms, including browser, desktop, and mobile environments. Previously available only as a standalone Gemini 2.5 computer use model, the capability is now integrated directly into Gemini 3.5 Flash. According to Mateo Quiros, Product Manager at Google DeepMind, the integration delivers Google's best performance yet for agentic computer use tasks.
Gemini 3.5 Flash gets native computer use
Gemini already supports function calling and built-in tools such as Search and Maps grounding. With computer use now built into the main Gemini Flash model, developers can use Gemini 3.5 Flash to reliably build custom agents that can see, reason, and take action across browser, mobile, and desktop environments. Google said the capability improves performance for long-horizon and enterprise automation tasks, including continuous software testing and knowledge work across professional applications.
Safety measures for computer use
To help mitigate prompt injection risks for agents operating in live environments, Google uses targeted adversarial training for computer use in Gemini 3.5 Flash. The company is also introducing two optional enterprise safeguard systems that allow organizations to:
* Require explicit user confirmation before sensitive or irreversible actions.
* Automatically stop tasks if an indirect prompt injection attempt is detected.
Google said enterprises should take a defense-in-depth approach by combining these safeguards with secure sandboxing, human-in-the-loop verification, and strict access controls. Additional information about safety measures is available through the company's best-practices documentation.
Availability
Developers and enterprises can start using computer use in Gemini 3.5 Flash through the Gemini API and Gemini Enterprise Agent Platform. Google is also providing a Browserbase-hosted demo environment for testing the capability. Developers can additionally access reference implementations and documentation to begin building computer-use agents.
[4]
Google Gives Gemini 3.5 Flash Computer Control Skills, AI Agents Can Now Click Buttons and Fill Forms
The update positions Gemini beyond a simple chat and text-generation platform. Instead of only answering questions, AI agents can now click buttons, fill forms, and complete complex tasks. The feature arrives at a time when major AI companies are racing to build assistants that can do more than hold conversations. Google announced the integration of Computer Use into Flash on June 24. Previously, the capability was available only as a standalone model built on Gemini 2.5; users can now access it via the Gemini API and the Gemini Enterprise Agent Platform. The announcement states that, with the built-in Computer Use, 'developers can now use 3.5 Flash to reliably build custom agents that can see, reason, and take action across browser, mobile and desktop environments.' With the new Computer Use feature, the model can understand what is happening on a screen and respond to it. The AI takes time to consider before clicking on an option, like a human being, rather than relying on rigid, pre-coded prompts. This is a major shift for large language models. Until recently, most AI tools worked mainly through text. They could explain how to complete a task, but could not perform it themselves. This seems to be the beginning of a new era. AI agents built on Gemini can now help with repetitive office work, software testing, scheduling, data entry, and many other routine jobs. For businesses, this can save time and reduce manual effort.
[5]
Google adds computer use capability to Gemini 3.5 Flash model By Investing.com
Investing.com - Google announced the integration of computer use capability into Gemini 3.5 Flash, expanding functionality previously available only in a standalone Gemini 2.5 computer use model. The feature is now built into the main Gemini Flash model. The computer use tool allows developers to build agents that can see, reason and take action across browser, mobile and desktop environments. Google said the capability improves performance for long-horizon and enterprise automation tasks including continuous software testing and knowledge work across professional applications. Google implemented targeted adversarial training to address prompt injection risks for agents operating in live environments. The company released two optional enterprise safeguard systems that require explicit user confirmation for sensitive actions and automatically stop tasks if an indirect prompt injection is identified. The company recommends developers combine these features with secure sandboxing, human-in-the-loop verification and strict access controls. Additional safety measures are detailed in Google's best practices documentation. Developers and enterprises can access computer use in Gemini 3.5 Flash through the Gemini API and Gemini Enterprise Agent Platform. Google also provides a demo environment hosted by Browserbase for testing the capabilities. This article was generated with the support of AI and reviewed by an editor.
[6]
After Anthropic, Google now lets you build AI agents that control your computer with Gemini 3.5 Flash: Here is how
Google has added adversarial training and two enterprise safety systems
Google has built Computer Use directly into Gemini 3.5 Flash, following a similar capability Anthropic introduced for Claude. Computer Use in Gemini 3.5 Flash is available now through the Gemini API and the Gemini Enterprise Agent Platform. The feature lets AI agents interact with software the way a person does: by looking at a screen and deciding where to click, what to type and when to scroll, rather than relying on rigid, pre-coded integrations with each individual app. Previously, this capability existed only as a separate, standalone model built on Gemini 2.5. With this update, it is now part of the main Gemini 3.5 Flash model itself and is available to developers through the Gemini API and the Gemini Enterprise Agent Platform.
How it works
The practical use case of computer use in Gemini and Claude is automating tasks that would otherwise require a person sitting at a desktop, navigating websites, filling out forms, clicking through multi-step workflows or pulling data from systems that don't offer a clean API to plug into. Because the model works visually, agents built on it can operate across browser, mobile and desktop environments without needing custom code written for each one. Google is positioning this primarily for long-running enterprise tasks, such as continuous software testing or repetitive knowledge work across professional applications. To help developers get started quickly, Google has set up a live demo environment hosted by Browserbase where the capability can be tested directly.
Alongside the developer-facing update, Chrome 149 introduces a feature called Select from screen, found in the browser's attachment menu. Clicking it highlights the active tab and lets you drag a selection box over any image or block of text on the page, dropping it straight into a Gemini prompt. It's a small convenience, but it removes the friction of switching tabs or taking a screenshot just to ask Gemini a question about something you're looking at.
How safe is Computer Use
Letting an AI agent control a mouse and keyboard raises an immediate question: what happens if it lands on a malicious page with hidden instructions designed to hijack its behaviour, a risk known as indirect prompt injection. Google says it used targeted adversarial training specifically to harden Gemini 3.5 Flash's Computer Use capability against this kind of attack. On top of that, it has introduced two optional enterprise safeguards: one that requires explicit human confirmation before the agent takes any sensitive or irreversible action and another that automatically halts a task the moment it detects a prompt injection attempt. Google is also recommending developers pair these features with sandboxing, human-in-the-loop verification and strict access controls rather than relying on the model's built-in protections alone.
Google has integrated computer use capability directly into Gemini 3.5 Flash, allowing AI agents to see screens, navigate browsers, and perform tasks autonomously. Previously requiring a standalone Gemini 2.5 model, the feature now works through the Gemini API and Enterprise Agent Platform with new enterprise safeguards against prompt injection attacks.
Google has announced that computer use is now a built-in tool in Gemini 3.5 Flash, marking a shift from requiring a dedicated standalone model to having the functionality integrated directly into its fastest agentic model [1]. The feature enables developers to build AI agents that can see, reason, and take action across browser, mobile, and desktop environments without calling a separate computer use model [2]. Previously, this capability was only available through a standalone Gemini 2.5 computer use model that Google first released in October 2025, which achieved roughly 70 percent accuracy on the Online-Mind2Web benchmark [2].
The integration means developers can now activate native computer use as one of several tools within Gemini 3.5 Flash, alongside code execution, search, and function calling [2]. According to Mateo Quiros, Product Manager at Google DeepMind, this delivers Google's best performance yet for agentic computer use tasks [3]. The feature is currently available to developers and enterprise customers via the Gemini API and the Gemini Enterprise Agent Platform [1].
This update positions Gemini beyond a simple chat and text-generation platform. AI agents can now click buttons, fill forms, and complete complex tasks by understanding what is happening on a screen and responding to it [4]. Agents built on Gemini 3.5 Flash can autonomously navigate and perform tasks such as finding flight deals, playing games, or extracting data from dashboards [1]. To demonstrate its capabilities, Google created a Browserbase instance where users can prompt the model to perform tasks. In one example, the model visited three separate flight booking websites, entered dates, searched through tickets, and returned the best options [1].
The ability to see and control a screen opens possibilities for enterprise automation, including continuous software testing, where agents navigate applications and verify functionality without human testers stepping through each screen [2]. Knowledge workers could deploy agents to complete multi-step browser tasks, fill forms, extract data from dashboards, or navigate internal tools [2]. For businesses, this can save time and reduce manual effort on repetitive office work, scheduling, and data entry [4].
Letting AI agents control computers and perform tasks autonomously raises safety questions, especially for enterprise customers. Google has applied targeted adversarial training specifically for prompt injection, the attack where malicious instructions embedded in a webpage or document trick an AI agent into performing unintended actions [2][5]. To mitigate these risks, Google is introducing two optional enterprise safeguards built into computer use with Gemini 3.5 Flash [1].
The first safeguard requires explicit user confirmation before the agent executes any action flagged as sensitive or irreversible, such as submitting a form, making a purchase, or deleting data [2]. The second automatically halts the agent if it detects an indirect prompt injection attempt, stopping execution rather than risking a compromised action [2]. Both safeguards are opt-in rather than defaults, and Google recommends a defense-in-depth approach where developers layer multiple protections [2]. The company specifically advises combining these safeguards with secure sandboxing, human-in-the-loop verification, and strict access controls [3].
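How the two safeguards could sit inside the screenshot-action loop that source [2] describes can be pictured with a short sketch. The Python below is illustrative only and rests on stated assumptions: it does not call the Gemini API, and every name in it (`Action`, `run_agent`, `scripted`, and the `sensitive` and `injection_suspected` flags) is hypothetical, standing in for whatever the real tool returns.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Action:
    """Hypothetical structured command proposed by the model."""
    kind: str                          # e.g. "click", "type", "submit", "done"
    target: str = ""
    sensitive: bool = False            # assumed flag: sensitive or irreversible
    injection_suspected: bool = False  # assumed flag: injection attempt detected


def run_agent(
    propose: Callable[[bytes], Action],  # stand-in for the model call
    execute: Callable[[Action], None],   # performs the action in a sandbox
    capture: Callable[[], bytes],        # returns the current screenshot
    confirm: Callable[[Action], bool],   # asks a human to approve
    max_steps: int = 20,
) -> str:
    """Screenshot-action loop with a confirmation gate and an injection halt."""
    screenshot = capture()
    for _ in range(max_steps):
        action = propose(screenshot)
        # Safeguard 2: stop the task rather than risk a compromised action.
        if action.injection_suspected:
            return "halted: suspected prompt injection"
        if action.kind == "done":
            return "completed"
        # Safeguard 1: sensitive actions need explicit human approval.
        if action.sensitive and not confirm(action):
            return "stopped: user declined " + action.kind
        execute(action)
        screenshot = capture()  # send the updated view back to the model
    return "stopped: step limit reached"


def scripted(actions: Iterable[Action]) -> Callable[[bytes], Action]:
    """Fake model that replays a fixed list of actions, for demonstration."""
    iterator = iter(actions)
    return lambda screenshot: next(iterator)


if __name__ == "__main__":
    executed = []
    outcome = run_agent(
        propose=scripted([
            Action("click", "search box"),
            Action("submit", "order form", sensitive=True),
        ]),
        execute=executed.append,
        capture=lambda: b"fake-screenshot",
        confirm=lambda action: False,  # the human declines every request
    )
    print(outcome, [a.kind for a in executed])
```

In this sketch the step limit and the caller-supplied `execute` function are where sandboxing and access controls would attach, which is one way to read the defense-in-depth advice: the model's own flags are only one of several layers.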
The competitive landscape has shifted considerably since Anthropic pioneered the category with Claude Computer Use, which works across operating systems and can interact with file systems beyond just browsers [2]. OpenAI has also entered the space, and the three companies are now competing on different dimensions. The question for enterprise buyers centers less on which model can click a button and more on which one can do it safely inside regulated environments [2].
Google has not published updated benchmark scores for computer use as a built-in Flash tool versus the previous standalone model, nor has it disclosed how many enterprises are using the capability or provided case studies with named customers [2]. The Gemini Enterprise Agent Platform uses pay-as-you-go pricing, and Flash is one of the cheaper models in Google's lineup, which could make computer use more accessible for large-scale automation [2]. Developers and enterprises can start using the feature through a Browserbase-hosted demo environment for testing, along with reference implementations and documentation [3]. The models can navigate familiar interfaces but still struggle with unexpected pop-ups, CAPTCHAs, dynamically loaded content, and unfamiliar layouts [2].
Summarized by Navi