© 2025 TheOutpost.AI All rights reserved
Curated by THEOUTPOST
On Wed, 5 Feb, 12:05 AM UTC
3 Sources
[1]
Everything you need to know about OpenAI's browser-based agent, Operator
OpenAI has finally entered the agentic AI race with the release of its Operator AI in January. The agentic system is designed to work autonomously on its user's behalf and is primed to compete against already established industry rivals like Claude's Computer Use API and Microsoft's Copilot agents -- at least, once it sheds its "research preview" status. Here's everything you need to know about OpenAI's new agent and when you might be able to try it for yourself.
What is Operator?
OpenAI's Operator is an AI agent, meaning that it is designed to take autonomous action based on the information available to it. Unlike conventional programs, AI agents are able to review changing conditions in real time and react accordingly, rather than simply execute predetermined commands. As such, AI agents are able to perform a variety of complex, multi-step tasks, ranging from transcribing, summarizing, and generating action items from a business meeting, to booking the flight, hotel accommodations, and rental car for an upcoming vacation based on your family's various schedules, to autonomously researching topics and assembling multi-page studies on those subjects.
Operator works slightly differently than other agents currently available. While Claude's Computer Use is an API and Microsoft's AI agents work within the Copilot chat UI itself, Operator is designed to, well, operate within a dedicated web browser window that runs on OpenAI's servers and executes its tasks remotely. Your local web browser has nothing to do with the process and can be used normally even when Operator is running. The Operator app is powered by a new "Computer-Using Agent" model (CUA) that is, in turn, built atop GPT-4o, which provides the app's multimodal abilities.
OpenAI says CUA was trained in a similar fashion to its o1 and o3 reasoning models. As such, the CUA model will break down complex tasks into their component problems before trying to solve them sequentially, backtracking if it runs into any logical roadblocks.
When was Operator released?
OpenAI released Operator on January 23, 2025. It's currently only available to $200/month Pro users in the U.S. through the operator.chatgpt.com website. "Our plan is to expand to Plus, Team, and Enterprise users and integrate these capabilities into ChatGPT in the future," the company wrote in its announcement post.
How does Operator work?
Users can activate the agent from the ChatGPT home screen, which pops up a dedicated web browser page in a side window for Operator to carry out its tasks. The AI provides a running narrative of what it is currently doing, and the user can take over the process at any time. Operator will ask for the user's help with certain tasks, such as logging in to specific secured websites, and will get the user's confirmation before executing important tasks. It can interact with websites both visually (i.e., through screenshots) and through direct input, mimicking the user's keystrokes and mouse clicks.
What can Operator do and how well can it do it?
Since it is limited to the browser, Operator can currently only perform simple internet-based tasks, such as reserving concert tickets, ordering DoorDash, or filling out Instacart orders. The company also claims that the agent will be able to automate tasks like booking hotels and flights, reserving tables at restaurants, and even doing your online shopping. OpenAI has pitted Operator against Anthropic's Computer Use, as well as Google DeepMind's Mariner agent, in a number of industry benchmarks and claims that Operator has beaten them across the board.
On the OSWorld benchmark, which measures how well an agent can complete tasks like merging PDF files, CUA beat out Computer Use 38.1% to 22.0% -- for reference, humans average around 72% success on those tasks. On the WebVoyager benchmark, CUA outscored Mariner 87% to 83.5%, while Computer Use scored a paltry 56%.
However, initial user reactions to the AI agent have been mixed. For example, New York Times columnist Kevin Roose wrote, "In all, I found that using Operator was usually more trouble than it was worth. Most of what it did for me I could have done faster myself, with fewer headaches." "Even when it worked," he continued, "it asked for so many confirmations and reassurances before acting that I felt less like I had a virtual assistant and more like I was supervising the world's most insecure intern."
How can I try Operator for myself?
To get access to OpenAI's Operator agent, you will need to sign up for OpenAI's Pro tier subscription and then access it via the operator.chatgpt.com website.
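To put the benchmark figures quoted above in perspective, the short sketch below computes the percentage-point gaps between the reported scores. The numbers are taken as stated in the article (the human OSWorld figure is approximate); the helper names `gap` and `relative_to` are illustrative only and not part of any benchmark tooling.

```python
# Success rates (percent) as reported in the article; the human figure is approximate.
osworld = {"CUA": 38.1, "Computer Use": 22.0, "Human": 72.0}
webvoyager = {"CUA": 87.0, "Mariner": 83.5, "Computer Use": 56.0}


def gap(scores, a, b):
    """Percentage-point difference between two entries, rounded to 1 decimal."""
    return round(scores[a] - scores[b], 1)


def relative_to(scores, a, baseline):
    """Score of `a` as a fraction of `baseline`, rounded to 2 decimals."""
    return round(scores[a] / scores[baseline], 2)


print(gap(osworld, "CUA", "Computer Use"))    # 16.1 points ahead of Computer Use
print(gap(osworld, "Human", "CUA"))           # 33.9 points behind humans
print(relative_to(osworld, "CUA", "Human"))   # 0.53 of the human success rate
print(gap(webvoyager, "CUA", "Mariner"))      # 3.5 points ahead of Mariner
```

On these reported numbers, CUA leads the other agents on both benchmarks but reaches only about half of the human success rate on OSWorld.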
[2]
OpenAI's Operator agent helped me move, but I had to help it, too | TechCrunch
OpenAI gave me one week to test its new AI agent, Operator, a system that can independently do tasks for you on the internet. Operator is the closest thing I've seen to the tech industry's vision of AI agents -- systems that can automate the boring parts of life, freeing us up to do the things we really love. However, judging from my experience with OpenAI's agent, truly "autonomous" AI systems are still just out of reach. OpenAI trained a new model to power Operator, which combines the visual understanding of GPT-4o with the reasoning capabilities of o1. That model seems to work well for basic tasks; I watched Operator click buttons, navigate menus on websites, and fill out forms. The AI was occasionally successful at independently taking actions, and it works much faster than web-based agents I've seen from Anthropic and Google. But during my trial, I found myself assisting OpenAI's agent more than I'd like. It felt like I was coaching Operator through each problem, whereas I wanted to push certain tasks off my plate altogether. Too often during my test, I had to answer several questions, grant permissions, fill out personal information, and help the agent when it got stuck. In car terms, Operator is like driving a car with cruise control - occasionally taking your foot off the pedals and letting the car drive itself - but it's far from full-blown autopilot. In fact, OpenAI says Operator's frequent pauses are by design. The AI powering Operator, much like the AI powering chatbots like OpenAI's ChatGPT, can't reliably work independently for long periods of time, and it's prone to the same sort of hallucinating. Because of that, OpenAI doesn't want to give the system too much decision-making power or sensitive user information. Maybe that's a safe choice by OpenAI, but it reduces Operator's practicality. That said, OpenAI's first agent is an impressive proof of concept -- and interface -- for an AI that can use the front end of any website. 
But to create truly independent AI systems, tech companies will need to build more reliable AI models that don't require this much steering. My Operator trial coincided with the week I was moving apartments, so I had OpenAI's agent help with moving logistics. I asked Operator to help me buy a new parking permit. OpenAI's agent told me, "Sure," then opened a window into its browser on my PC's screen. Operator then conducted a search for a San Francisco parking permit in the browser, took me to the correct city website, and even the right page. Operator still lets you use the rest of your computer while it's working, something that can't be said for Google's Project Mariner. This is because OpenAI's agent isn't really working on the computer, but rather, off in the cloud somewhere. For my parking permit, I had to grant Operator permission to start different processes a few too many times. It also stopped to ask me to fill out forms with personal information - such as my name, phone number, and email address. At times, Operator also got lost, forcing me to take control of the browser and get the agent back on track. In another test, I asked Operator to make me a reservation at a Greek restaurant. To its credit, Operator found me a nice place in my area with reasonable prices. But I had to answer more than half a dozen questions throughout the flow. If you have to intervene six or more times just to book a reservation through an AI agent, at what point is it easier to just do it yourself? That's a question I asked myself a lot while testing Operator. In a few of my tests, I ran into websites that blocked Operator for whatever reason. For example, I tried booking an electrician using TaskRabbit, but OpenAI's agent told me that it ran into an error, and asked if it could use an alternative service instead. Expedia, Reddit, and YouTube also blocked the AI agent from accessing their platforms. However, other services are embracing Operator with open arms. 
Instacart, Uber, and eBay collaborated with OpenAI for the launch of Operator, allowing the agent to navigate their websites on behalf of humans. These businesses are preparing for a future where a subset of user interactions are facilitated by an AI agent. "Customers are using Instacart through a variety of different entry points," said Daniel Danker, chief product officer at Instacart, in an interview with TechCrunch. "We see Operator as, potentially, another one of those entry points." Letting OpenAI's agent use Instacart's website on behalf of a person seems like it would separate Instacart from its customers. However, Danker says Instacart wants to meet customers wherever they are. "We really are bullish about our belief, similar to OpenAI, that agentic systems will have a major impact on how consumers interact with digital properties," said eBay's chief AI officer, Nitzan Mekel-Bobrov, in an interview with TechCrunch. Even if AI agents rise in popularity, Mekel-Bobrov says he expects users will always come to eBay's website, noting that "online destinations are not going anywhere." I had some issues trusting Operator after it hallucinated a few times and nearly cost me several hundred dollars. For instance, I asked the agent to find me a parking garage near my new apartment. It ended up suggesting two garages that it said would take just a few minutes to walk to. Besides being way out of my price range, the garages were actually really far from my apartment. One was a 20-minute walk away, and the other was a 30-minute walk. Turns out, Operator had put in the wrong address. This is exactly why OpenAI doesn't give its agent your credit card number, passwords, or access to email. If OpenAI didn't let me intervene here, Operator would have wasted hundreds of dollars on a parking spot I didn't need. Hallucinations like this are a key roadblock to actually useful autonomous agents - ones that can take bothersome tasks off your plate. 
No one will trust agents if they're prone to making basic mistakes, especially mistakes with real-world consequences. With Operator, OpenAI seems to have built some impressive tools to let AI systems browse the web. But these tools won't amount to much until the underpinning AI can reliably do what users ask it to do. Until then, humans will be stuck assisting agents -- not the other way around. And that kind of defeats the point.
[3]
AI Agents like OpenAI's 'Operator' have a long way to go before replacing humans
TL;DR: OpenAI's Operator was hyped as a groundbreaking AI agent capable of autonomous tasks, but early impressions suggest it's slow, error-prone, and still requires heavy supervision. OpenAI's Operator was attached to some strong claims in the lead-up to its January 23 launch: 'Ph.D. level intelligence', the ability to autonomously carry out coding tasks, and the potential to exceed human capabilities. However, early user experiences with the tool suggest the contrary. The key distinction between AI agents like Operator and a chatbot like ChatGPT is that agents are designed to act on your behalf. In other words, give them a task, and they'll take care of it with minimal oversight. Operator functions by taking control of a web browser, using a Computer-Using Agent (CUA) model that integrates visual processing and reasoning capabilities to interpret what's happening on the screen and to carry out certain actions. Bloomberg's Rachel Metz spent some time with Operator, taking it through various day-to-day tasks: purchasing groceries, booking reservations, and filling out forms. The agent was able to successfully order lipstick from Sephora and fill a cart with Ben & Jerry's ice cream, and it even suggested adding additional items to qualify for free delivery. However, it fell short on simple tasks like filling out spreadsheets, managing calendars, and navigating unfamiliar web pages. A common thread among users was that the agent required constant supervision and was not particularly efficient even when it did succeed. "For several agonizing moments, I watched as OpenAI's artificially intelligent agent slowly navigated the internet like someone who's had the web described to them in great detail but never actually used it." "It asks so many follow-up questions that it negated any time saved." An AI-enthusiast Reddit user was another one of the first people to gain access to the tool. 
They took to the platform to share their experience: "Operator is quite simply too slow, expensive, and error-prone." Naturally, the discussion around autonomous AI raises questions about job displacement. Jensen Huang, at CES 2025, famously proclaimed, "IT will become the HR of AI agents." Sam Altman and Mark Zuckerberg are also outspokenly bullish on AI agent capabilities. However, for every bold claim about AI agents replacing workforces, there are reminders of their basic struggles. The 'world's first AI software engineer', Devin, was similarly touted as a paradigm shift in the programming space. Following its release, both users and researchers debunked some of the tool's claims, highlighting problems like: "Devin didn't complete the advertised task. Instead, it generated errors in its own code and then fixed them" and "Even more concerning was Devin's tendency to press forward with tasks that weren't actually possible." While the claims surrounding AI agents may eventually prove true, practical application, rather than speculation, will be the real determining factor.
OpenAI's new AI agent, Operator, shows potential in automating web-based tasks but falls short of full autonomy, requiring significant user intervention and facing challenges in reliability and efficiency.
OpenAI has entered the agentic AI race with the release of Operator, an AI agent designed to work autonomously on behalf of users [1]. Launched on January 23, 2025, Operator is currently available to $200/month Pro users in the U.S. through the operator.chatgpt.com website [1].
Operator is powered by a new "Computer-Using Agent" model (CUA) built on GPT-4o, which provides its multimodal abilities [1]. It operates within a dedicated web browser window on OpenAI's servers, executing tasks remotely [1]. The agent can interact with websites both visually, through screenshots, and through direct input, mimicking user actions like keystrokes and mouse clicks [1].
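The interaction pattern the sources describe (the agent carries out browser actions, hands control to the user for steps such as logins, and asks for confirmation before important actions) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not OpenAI's implementation or API: the `Action` type, the action names, and `run_agent` are all hypothetical, and the plan is scripted in advance rather than derived from screenshots by a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str                # hypothetical action names: "click", "type", "ask_user"
    target: str              # human-readable description of the page element or request
    sensitive: bool = False  # True if the step should be confirmed by the user first


def run_agent(plan: List[Action], confirm: Callable[[Action], bool]) -> List[str]:
    """Walk a scripted list of steps, pausing for the user where required."""
    log = []
    for step in plan:
        if step.kind == "ask_user":
            # The agent hands control to the user (e.g. for a login).
            log.append(f"handoff: {step.target}")
            continue
        if step.sensitive and not confirm(step):
            # The user declined, so the agent stops before acting.
            log.append(f"stopped before: {step.kind} {step.target}")
            break
        log.append(f"done: {step.kind} {step.target}")
    return log


# A scripted example plan; a real agent would decide each step from the current screen.
plan = [
    Action("click", "search box"),
    Action("type", "Greek restaurant near me"),
    Action("ask_user", "log in to the booking site"),
    Action("click", "Reserve table", sensitive=True),
]

for line in run_agent(plan, confirm=lambda step: True):
    print(line)
```

Passing a `confirm` callback that returns `False` makes the run stop before the reservation click, which mirrors the kind of user checkpoint the reviewers describe encountering repeatedly.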
Operator is designed to perform internet-based tasks such as reserving concert tickets, ordering food, and booking travel accommodations [1]. OpenAI claims that Operator has outperformed competitors like Anthropic's Computer Use and Google DeepMind's Mariner in industry benchmarks [1].
However, early user experiences have been mixed:
During a week-long trial, a TechCrunch reporter found that Operator could perform basic tasks like clicking buttons, navigating menus, and filling out forms [2]. However, the need for constant supervision and intervention made it feel more like "coaching" the agent than offloading tasks entirely [2].
In tests involving a restaurant reservation and a parking permit purchase, the reporter had to intervene multiple times, raising the question of whether using the agent is more efficient than completing the tasks manually [2].
Some companies are embracing Operator's potential. Instacart, Uber, and eBay have collaborated with OpenAI, allowing the agent to navigate their websites [2]. These businesses see AI agents as a potential new entry point for customer interactions [2].
OpenAI's Operator represents a significant step in the development of AI agents, but it also highlights the challenges in creating truly autonomous systems. As the technology evolves, improvements in reliability, efficiency, and decision-making capabilities will be crucial for AI agents to fulfill their promised potential in automating complex tasks and enhancing human productivity [1][2][3].