5 Sources
[1]
OpenAI upgrades the AI model powering its Operator agent
OpenAI is updating the AI model powering Operator, its AI agent that can autonomously browse the web and use certain software within a cloud-hosted virtual machine to fulfill users' requests. Soon, Operator will use a model based on o3, one of the latest in OpenAI's o series of "reasoning" models. Previously, Operator relied on a custom version of GPT-4o. By many benchmarks, o3 is a far more advanced model, particularly on tasks involving math and reasoning. "We are replacing the existing GPT‑4o-based model for Operator with a version based on OpenAI o3," OpenAI wrote in a blog post. "The API version [of Operator] will remain based on 4o." Operator is one among many agentic tools released by AI companies in recent months. Companies are racing to make highly sophisticated agents that can reliably carry out chores more or less without supervision. Google offers a "computer use" agent through its Gemini API that can similarly browse the web and take actions on behalf of users, as well as a more consumer-focused offering called Mariner. Anthropic's models are also able to perform computer tasks, including opening files and navigating webpages. According to OpenAI, the new Operator model, called o3 Operator, was "fine-tuned with additional safety data for computer use," including data sets designed to "teach the model [OpenAI's] decision boundaries on confirmations and refusals." OpenAI has released a technical report showing o3 Operator's performance on specific safety evaluations. Compared to the GPT-4o Operator model, o3 Operator is less likely to perform "illicit" activities and search for sensitive personal data, and less susceptible to a form of AI attack known as prompt injection, per the technical report. "o3 Operator uses the same multi-layered approach to safety that we used for the 4o version of Operator," OpenAI wrote in its blog post.
"Although o3 Operator inherits o3's coding capabilities, it does not have native access to a coding environment or terminal."
[2]
OpenAI confirms Operator Agent is now more accurate with o3
OpenAI says Operator Agent now uses the o3 model, which means it's now significantly better at reasoning. For those unaware, Operator, which is still in research preview, allows users to delegate web tasks to the AI agent. You can use it to automate repetitive tasks, such as filling forms and ordering products. Operator isn't exactly new and has been around since January 2025, but only those who pay for a $200 Pro and Enterprise subscription have access to it. I recently reported that OpenAI is planning to improve Operator and turn it into a "very useful tool." While we are yet to see any meaningful improvements to the UX, OpenAI confirmed that it has updated Operator to o3. Previously, the web agent used 4o, a non-reasoning model, to automate tasks. With o3, Operator is more persistent and accurate when interacting with the browser. "Now Operator is more persistent and accurate when interacting with the browser, improving overall task success rate. Responses are also more clear, thorough and better structured," the company noted. "Operator continues to be available as a research preview to ChatGPT Pro users globally. This upgraded model is only available in Operator in ChatGPT." However, the question remains: would you pay $200 to automate grocery shopping and similar tasks? Most people won't, and that's why OpenAI is internally testing new ways to make the tool more useful. While Operator remains exclusive to the $200 Pro subscription, OpenAI has plans to release the feature for $20 Plus subscribers in the future.
[3]
OpenAI Operator is getting bigger brains to control the AI agent's virtual hands
The enhanced model may encourage more people to sign up for ChatGPT Pro
OpenAI boasted at the beginning of the year that its Operator was a brilliant AI agent capable of handling everything for you online. If you're paying $200 for ChatGPT Pro, it could fill out forms, order products, and otherwise use its virtual mouse and keyboard on your behalf. Now, it's given the AI agent a brain upgrade by embedding the powerful o3 model in the system, replacing the previous customized GPT-4o model running the service. OpenAI o3 Operator continues the mission of Operator to interact with the web on your behalf. It can still book a hotel room, order groceries, and even theoretically pay off that parking ticket. But GPT-4o didn't bring the reasoning power to the service that OpenAI wanted, so the company turned to its o3 "reasoning model." The o3 model is supposed to be smarter in a more intellectual way than GPT-4o. It can be more focused and is better at step-by-step thinking. According to OpenAI, employing the o3 model makes Operator "more persistent and accurate when interacting with the browser." The persistence will help it work through the unexpected obstacles of web browsing, like login requests, pop-ups, and CAPTCHA requests. OpenAI also said that the responses and how the model follows through on long, complex tasks will be noticeably improved. For most ChatGPT users, this may not matter. Unless you're ponying up for ChatGPT Pro, Operator is not available. Until then, running a business or managing someone else's life might make it worthwhile. Otherwise, you might as well stick to the old-fashioned manual approach for menial web tasks. Delegating them to an AI assistant is an appealing idea, but hardly worth the cost if you're not going to be using it heavily most days. It would be like hiring a personal chef to butter your toast. Still, the move to o3 is undeniably a big step forward for OpenAI's vision of useful AI agents.
Of course, OpenAI's Operator isn't unique in providing an AI agent at this point. You can find a lot of examples of varying quality, including Browser Use, Proxy 1.0, Hugging Face's HuggingAgent, and Opera's Browser Operator. OpenAI has hinted that it might change the price point for access to Operator, but for now, you have to want to pay ten times as much as ChatGPT Plus costs to get to Operator. Opening it up to Plus subscribers would be huge in terms of giving access to millions more ChatGPT users, but again, you run into the necessity versus convenience question. Filling in my own forms isn't worth $200 but if it's built into the rest of ChatGPT's features, I might just start letting the AI handle it for me.
[4]
OpenAI updates Operator to o3, making its $200 monthly ChatGPT Pro subscription more enticing
It was a big week for AI announcements following events from Microsoft, Google, and Anthropic. But OpenAI is finishing things out with news of its own. And no, we're not just talking about its acquisition of Jony Ive's design team to lead a new hardware effort, "io" at OpenAI. Today, the company upgraded its Operator autonomous web browsing and cursor controlling agent within ChatGPT from using the prior GPT-4o multimodal large language model to the newer and more powerful o3 reasoning model. The update, released globally today, May 23, 2025, is available as a "research preview" to paying subscribers of OpenAI's $200 USD-monthly ChatGPT Pro plan. Basically, that is OpenAI's way of saying it's not a fully "sanded down" or perfected product yet -- it may still have kinks and issues. But with rival Google offering its own top tier AI subscription bundle for a price of nearly $250 USD regularly (currently running a discount down to $125 for the first three months) to access its latest Gemini multimodal, Imagen image generation, and Veo video generation models, suddenly OpenAI's ChatGPT Pro plan seems more affordable by comparison.
What is OpenAI's Operator and what is it for?
Operator first debuted in January 2025 as OpenAI's initial step into semi-autonomous agents, specifically Computer Using Agents (CUAs). The idea is to go beyond the chatbot interface of ChatGPT and allow OpenAI's powerful AI models to start taking more actions on behalf of the user. Thus, Operator was designed to autonomously point, click, scroll, and type to complete web-based tasks such as booking dinner reservations, compiling shopping lists, or ordering event tickets. This agentic capability allows it to complete user tasks directly through a browser interface, from booking reservations to gathering online data.
For safety, privacy and security purposes, Operator didn't use any existing web browser on a user's PC or Mac. Instead, it ran in a cloud-hosted virtual browser accessible via a standalone site -- operator.chatgpt.com -- where users could input requests and observe the agent perform tasks in real time. It combined vision, reasoning, and interaction capabilities based on GPT-4o, marking a new direction for OpenAI in agentic AI. The product was launched as a research preview for ChatGPT Pro subscribers and featured built-in safety measures like user confirmations, Watch Mode, and restrictions on high-risk web platforms. It was also being tested in enterprise contexts, including travel planning and civic services, demonstrating its potential across both consumer and business environments.
o3 offers improved accuracy, structure, and success rates
With this update, OpenAI aims to enhance performance across several key dimensions. The new o3-based Operator demonstrates improved persistence and accuracy during browser interactions. In practical terms, this means it is more likely to complete user tasks successfully and with less need for correction or repetition. Moreover, users can expect responses that are clearer, more structured, and more comprehensive. In comparative evaluations, the new model shows a distinct preference advantage over its predecessor. Human preference studies reveal that users favor the o3 model for its style, comprehensiveness, and clarity. It also performs strongly in instruction following and efficiency, though results for factual correctness are more balanced between versions. Performance on third-party evaluation benchmarks reflects these enhancements. On the OSWorld benchmark that measures completion of browser-based tasks, the o3 model scores 42.9 compared to 38.1 for the previous version. However, OpenAI notes that due to limitations in the automated grading system, the actual performance gain could be closer to 20 percentage points!
On WebArena, the new model achieved a score of 62.9, up from 48.1. The most dramatic improvement appears on the GAIA benchmark, where the o3 model scores 62.2, vastly surpassing the prior model's 12.3. Side-by-side task comparisons further illustrate these gains. In one example involving a restaurant booking request, the new model provided a clearer and more detailed list of available reservations, including locations, Michelin ratings, and seating notes, presented in a well-formatted table. The previous version, while functional, delivered less information in a less organized manner, according to an image included with the new o3 Operator release notes.
Safeguards remain, as do general cautionary notes about usage on sensitive, financial transactions and account access
The o3 model also inherits the safety measures introduced with earlier versions, with further fine-tuning for its role as an agentic system. OpenAI has integrated enhanced training against harmful task execution, prompt injection vulnerabilities, and mistakes involving user intent. Evaluations show that the model now confirms 94% of sensitive actions before executing them, with 100% confirmation in financial transactions. Prompt injection susceptibility has also decreased from 23% to 20%. Notably, the o3 Operator maintains a cautious boundary on certain high-risk web interactions, such as email or financial platforms, where it may require user supervision via Watch Mode or explicitly refuse to proceed. These measures are part of a layered approach to safety that combines model-level robustness with real-time monitoring. While the upgrade to Operator marks a technical improvement, it also reflects OpenAI's ongoing commitment to responsible AI deployment. The system's ability to take real-world actions introduces new risks, and the development team continues to refine its safety protocols accordingly.
According to OpenAI's updated o3 system card documentation, the model remains below high-risk capability thresholds in categories such as biological and chemical misuse and has no native coding environment or terminal access, further reducing potential misuse vectors. Operator remains a research preview and is accessible only to ChatGPT Pro users. The Responses API version of Operator will continue to be based on the GPT-4o model, at least for now.
Implications for enterprise technical decision-makers
The upgraded Operator stands to significantly enhance the workflows of professionals in AI engineering, orchestration, data management, and IT security. For those building or maintaining machine learning models, the model's improved accuracy and structured outputs reduce the overhead of test validation and troubleshooting. In orchestration contexts, it offers a practical, reliable tool for automating browser-based components of complex pipelines. Data engineers can delegate manual web interactions -- such as data verification and scraping -- with more confidence, freeing time for higher-level optimization work. Security professionals, meanwhile, gain a safer way to simulate user behavior in audits and incident response exercises, thanks to the model's layered safety mechanisms. Across these disciplines, the o3-based Operator introduces both a capability upgrade and a risk mitigation framework, making it a practical addition to the modern technical toolkit.
[5]
OpenAI upgrades Operator with o3 model for enhanced reasoning, safety
OpenAI is updating the artificial intelligence (AI) model powering Operator, its AI agent that can autonomously browse the web and interact with certain software inside a cloud-hosted virtual machine to carry out user requests. Operator will soon run on a model based on o3, one of the latest in OpenAI's o series of "reasoning" models. Previously, Operator relied on a customised version of GPT-4o. By several benchmarks, o3 is a more advanced model, particularly on tasks requiring mathematical ability and reasoning. "We are replacing the existing GPT‑4o-based model for Operator with a version based on OpenAI o3," OpenAI wrote in a blog post. "The API version (of Operator) will remain based on 4o." Operator is part of a growing set of agentic tools developed by AI firms as they compete to build agents capable of reliably performing digital tasks with minimal supervision. Google offers a similar agent through its Gemini API, which can browse the web and take actions on users' behalf. It also offers a consumer-facing version called Mariner. Anthropic's models can perform various computer tasks as well, including opening files and navigating webpages. According to OpenAI, the upgraded Operator model, dubbed o3 Operator, was "fine-tuned with additional safety data for computer use," using datasets designed to "teach the model (OpenAI's) decision boundaries on confirmations and refusals." The company has released a technical report detailing o3 Operator's performance in safety evaluations. Compared to the GPT-4o version, the new model is less likely to carry out illicit activities, search for sensitive personal data or fall prey to prompt injection, a common AI attack technique. "o3 Operator uses the same multi-layered approach to safety that we used for the 4o version of Operator," OpenAI wrote in its blog post. "Although o3 Operator inherits o3's coding capabilities, it does not have native access to a coding environment or terminal."
OpenAI has updated its Operator AI agent with the more advanced o3 model, improving its reasoning capabilities and safety features for autonomous web browsing and task completion.
OpenAI has announced a significant upgrade to its Operator AI agent, transitioning from a customized version of GPT-4o to the more advanced o3 model. This update aims to enhance the agent's reasoning capabilities and improve its performance in autonomous web browsing and task completion 1.
Source: Economic Times
The new o3 Operator demonstrates improved persistence and accuracy when interacting with web browsers, leading to higher task success rates. Users can expect clearer, more thorough, and better-structured responses from the agent 2.
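Performance improvements are evident in various benchmarks: on OSWorld, the o3 model scores 42.9 compared to 38.1 for the previous version; on WebArena, 62.9 compared to 48.1; and on GAIA, 62.2 compared to 12.3 4.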
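To give a sense of scale, the benchmark scores reported in source 4 (OSWorld 38.1 to 42.9, WebArena 48.1 to 62.9, GAIA 12.3 to 62.2) can be compared with a few lines of Python. This is only an illustrative sketch: the names SCORES, gains, and summarize are made up for this example and do not come from OpenAI, and the arithmetic uses the published scores as-is, ignoring the grading caveat OpenAI raises for OSWorld.

```python
# Scores as reported in source 4: (previous GPT-4o-based Operator, o3 Operator).
# The dictionary layout and names are illustrative assumptions.
SCORES = {
    "OSWorld": (38.1, 42.9),
    "WebArena": (48.1, 62.9),
    "GAIA": (12.3, 62.2),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)


def summarize(scores: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Compute both kinds of gain for every benchmark."""
    return {name: gains(old, new) for name, (old, new) in scores.items()}


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in summarize(SCORES).items():
        print(f"{name}: +{abs_gain} points ({rel_gain:+.1f}% relative)")
```

Separating absolute from relative gain matters here: the GAIA jump looks large in points but is far larger in relative terms because the earlier score was so low.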
OpenAI has prioritized safety in the o3 Operator upgrade. The model was fine-tuned with additional safety data for computer use, incorporating datasets designed to teach decision boundaries on confirmations and refusals 1.
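Key safety improvements include confirmation of 94% of sensitive actions before they are executed, with 100% confirmation in financial transactions, and a drop in prompt injection susceptibility from 23% to 20% 4.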
The o3 Operator maintains cautious boundaries on high-risk web interactions, such as email or financial platforms, often requiring user supervision or refusing to proceed 4.
Operator remains available as a research preview to ChatGPT Pro users globally, who pay a $200 monthly subscription fee 2. While the high price point may limit widespread adoption, OpenAI has hinted at potential changes to make the tool more accessible 3.
Source: Bleeping Computer
OpenAI's Operator upgrade comes amid fierce competition in the AI agent market. Google offers a similar "computer use" agent through its Gemini API and a consumer-focused version called Mariner. Anthropic's models also demonstrate capabilities in performing computer tasks 1.
Other examples of AI agents include Browser Use, Proxy 1.0, Hugging Face's HuggingAgent, and Opera's Browser Operator 3.
Source: TechRadar
The upgrade to o3 Operator represents a significant step forward in OpenAI's vision for useful AI agents. As these technologies continue to evolve, they have the potential to reshape how users interact with digital interfaces and complete online tasks 5. However, the balance between convenience, cost, and ethical considerations will likely remain a key focus as AI agents become more sophisticated and widely adopted.