Curated by THEOUTPOST
On Mon, 20 Jan, 4:01 PM UTC
70 Sources
[1]
OpenAI's Operator is one more step towards AGI, but should we be worried about giving too much power to AI agents?
As expected, OpenAI has released its first autonomous AI agent, called Operator, this week. Operator can act independently of you on your computer using a web browser, doing pretty much anything that can be done in a web browser. So, it can perform tasks like booking a restaurant table or buying groceries. You just tell it what you want it to do, and off it goes like a faithful Internet-enabled butler that potters away until the task is complete or it needs to come back to you with a question. Say there's no table available at 7.00pm: would Sir or Madam mind a 7.45pm table instead? Of course, Operator doesn't call you Sir or Madam, but it might as well. For all intents and purposes, this is the Internet butler that we were promised almost 30 years ago when Ask Jeeves was around. Do you remember Ask Jeeves? It was a search engine from 1997 that had an image of an actual butler who stood ready and willing to find things for you online. The character was named after Jeeves, Bertie Wooster's valet in the fictional works of P. G. Wodehouse. Instead of typing in search terms, Ask Jeeves encouraged you to search for things using natural language questions, like "Find me the perfect accompaniment to a roast dinner." Of course, we all know that Google won the search engine war, and in 2006, Ask dropped the Jeeves persona and just became Ask.com. But somehow, we've come full circle with AI, and thanks to technologies like ChatGPT search and Perplexity, searching using natural language requests is back in fashion. So are our Internet butlers, except now we call them AI agents. It's no secret that Sam Altman and OpenAI are really interested in AGI, artificial general intelligence, sometimes loosely referred to as superintelligence. This is the ultimate goal for OpenAI, and why it was founded. Chatbots like ChatGPT might have taken the world by storm, but their popularity is almost like an unintended consequence (a theme I'll return to later) of the race toward AGI. 
In a video to promote the release of Operator, one of the OpenAI employees sitting next to Sam Altman comes right out and says, [Operator is] "about removing one more bottleneck in our path to AGI." While agents are clearly exciting, they're not the destination for OpenAI; they're just one more step along the path. AGI has the potential to change our world radically. Once we have created an artificial intelligence that's smarter than we are, logically it should be able to construct even smarter versions of itself, and the level of intelligence rises rapidly. We've just got to hope that it doesn't decide to wipe us out. Not to worry you, but Geoffrey Hinton, often referred to as the 'Godfather of AI,' recently upped his odds of technology wiping out humanity to 20%. And this is where we return to the theme of unintended consequences. Many experts see AI agents as a threat. While speaking at the World Economic Forum in Davos this week, artificial intelligence pioneer Yoshua Bengio warned that AI agents could be catastrophic for humanity. Speaking to Business Insider, he said, "All of the catastrophic scenarios with AGI or superintelligence happen if we have agents." Bengio would rather we continue building towards AGI without agents, which are what allow AI systems to do things autonomously. "All of the AI for science and medicine, all the things people care about, is not agentic," Bengio said. "And we can continue building more powerful systems that are non-agentic." So, could it really be that something designed to act like an Internet butler and do menial tasks like help me buy my groceries accidentally gives AI the power to take over the world? For now, it's hard to imagine how an automated program that slowly plods through the process of booking me a table at a restaurant using a web browser is going to end in humanity's downfall, but AI agents will live or die by one thing - if people actually use them - and I'm not entirely convinced they will. 
Personally, I don't feel ready to hand over my credit card details to a computer program that will buy things for me to save me time because I'm just not sure I'm ever going to trust it not to make a mistake. Would you? Perhaps OpenAI needs to give its Operator a more human face if it wants me to trust it, and as it turns out, I believe that good old Jeeves might be looking for a job these days...
[2]
Meet OpenAI's Operator, an AI agent that uses the web to book you dinner reservations, order tickets, compile grocery lists and more
OpenAI has unveiled Operator, its first semi-autonomous AI agent, which is designed to "operate" a web browser on a user's behalf much like a person would, using the cursor to point and click, typing on its own, browsing the web and performing actions on various websites including booking restaurant reservations through OpenTable and assembling orders on Instacart and DoorDash -- instead of being confined to the ChatGPT interface or OpenAI's application programming interface (API). "This product is the beginning of our step into agents," said CEO and co-founder Sam Altman in a demo livestreamed on the company's YouTube Channel today at 1 pm ET. OpenAI president and fellow co-founder Greg Brockman wrote on X: "2025 is the year of agents." The preview, now available to paying subscribers of OpenAI's ChatGPT Pro ($200 per month) plan in the U.S., aims to demonstrate the potential of agentic AI while gathering critical feedback to refine its capabilities. Operator doesn't take over your web browser, though. Instead, you visit a separate, new website -- operator.chatgpt.com -- and are confronted with a prompt input box similar to ChatGPT. Typing requests into this box -- "find me tickets for the LA Lakers game tonight" -- will trigger Operator to open a separate, virtual browser in the cloud running on OpenAI servers. Then, the agent can execute tasks like filling out forms, managing online reservations, even booking tickets to sports games and concerts, and navigating other common workflows. The user watches the cursor move on its own on the cloud-based browser in real time. If the agent encounters a problem, it will stop and message the user via a text output, similar to ChatGPT's responses. Also, below the virtual browser, the user will see suggestions of actions and things Operator can do on their behalf. 
Yet, the user can take control at any time -- similar to the semi-autonomous driving modes on modern cars. Operator also asks the user to input their own payment credentials when it reaches a purchase screen on another website. Finally, users can also save particular workflows that they wish to use going forward and start them again. Operator is powered by what OpenAI calls Computer-Using Agent (CUA) technology, a new variant of GPT-4o trained specifically to use computers.
Bridging AI and GUIs
Operator stands apart from other automation tools by mimicking human interaction with graphical user interfaces (GUIs). Instead of relying on specialized APIs, the system leverages screenshots for visual input and uses virtual mouse and keyboard actions to complete tasks. The underlying CUA model combines GPT-4o's vision capabilities with reinforcement learning, enabling the agent to perceive, reason, and act on screen. This approach allows Operator to handle diverse tasks, including e-commerce browsing, travel planning, and even repetitive tasks like creating playlists or managing shopping lists. Notable benchmarks illustrate its effectiveness:
* 87% success rate on WebVoyager, a test of live website navigation.
* 58.1% success rate on WebArena, which simulates real-world e-commerce and content management scenarios.
But there's already tough competition: just yesterday, Chinese tech firm ByteDance (parent company of TikTok) also launched its own AI agent for controlling web browsers and performing actions on behalf of a user. Called UI-TARS, it's totally open source and boasts similarly impressive benchmark performance (though it does not appear to be compared directly on the same benchmarks), meaning OpenAI's Operator will need to be significantly better or more reliable to justify the relatively high ($200/month) cost of accessing it through ChatGPT Pro subscriptions. 
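As a rough illustration of the screenshot-driven "perceive, reason, act" loop described above, here is a minimal Python sketch. It is not OpenAI's implementation: the `Action` schema, the `run_agent` function and the scripted stand-ins for the browser and the model are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One virtual mouse or keyboard action (hypothetical schema)."""
    kind: str                      # "click", "type", "done" or "ask_user"
    target: Optional[str] = None   # what to click on, if anything
    text: Optional[str] = None     # what to type, if anything


def run_agent(task: str,
              take_screenshot: Callable[[], str],
              choose_action: Callable[[str, str, List[Action]], Action],
              perform: Callable[[Action], None],
              max_steps: int = 20) -> List[Action]:
    """Repeat: perceive (screenshot), reason (choose an action), act (perform it)."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()                     # perceive
        action = choose_action(task, screen, history)  # reason
        history.append(action)
        if action.kind in ("done", "ask_user"):        # finish, or hand back to the user
            break
        perform(action)                                # act
    return history


# Toy stand-ins so the sketch runs without a real browser or model.
def fake_screenshot() -> str:
    return "search page"


def scripted_policy(task: str, screen: str, history: List[Action]) -> Action:
    script = [Action("click", target="search box"),
              Action("type", text=task),
              Action("done")]
    return script[len(history)]


performed: List[Action] = []
trace = run_agent("table for two at 7pm", fake_screenshot, scripted_policy, performed.append)
print([a.kind for a in trace])
```

In a real system the screenshot would be an image and `choose_action` would be a call to a vision-capable model; the loop structure is the only part this sketch tries to convey.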
Already being tested in enterprise web navigation use cases
OpenAI is partnering with several businesses to ensure Operator meets real-world needs. Companies including Instacart, DoorDash, and Etsy are already testing the technology for use cases ranging from grocery delivery to personalized shopping. Brett Keller, CEO of Priceline, remarked on its utility for travel planning, calling it "a significant step in making travel more seamless and personalized." For public sector applications, the City of Stockton is exploring ways to leverage Operator to simplify civic engagement. Jamil Niazi, the city's Director of Information Technology, highlighted the potential for AI to make enrolling in services easier for residents. Yet there are limitations. Tech publication Every got an early preview and has been testing it for the past week and found that: "One of the peculiarities of Operator's design is that it doesn't use your browser. Instead, it uses a browser in one of OpenAI's data centers that you can watch and interact with remotely. The upside of this design decision is that you can use Operator wherever and whenever -- for example, on any mobile device. The downside is that many sites like Reddit already block AI agents from browsing so they can't be accessed by Operator. In this research preview mode, Operator is also blocked by OpenAI from accessing certain resource-intensive sites like Figma or competitor-owned sites like YouTube for performance or legal reasons."
Safety measures
Given its ability to act on users' behalf, Operator has been developed with robust safety features:
* User control: Operator requests confirmation for sensitive actions, such as making purchases or sending emails.
* Watch mode: Ensures user supervision for critical tasks, particularly on sensitive sites like email or financial platforms.
* Misuse prevention: The system is trained to refuse harmful requests and includes safeguards against adversarial attacks, such as malicious prompts embedded in websites.
OpenAI has also incorporated features to protect user privacy, including options to clear browsing data and opt out of data sharing for model improvements.
Enterprise edition coming later
OpenAI envisions a broader role for Operator in both individual and enterprise settings. Over time, the company plans to expand access to Plus, Team, and Enterprise users, eventually integrating Operator into ChatGPT. There are also plans to make the underlying CUA technology available via an API, enabling developers to create custom computer-using agents. Despite its potential, Operator remains a work in progress. OpenAI has been transparent about its limitations, such as difficulties with complex interfaces or unfamiliar workflows. Early user feedback will play a pivotal role in improving the system's accuracy, reliability, and safety. As OpenAI refines Operator through real-world use, it seeks to transform AI from a passive tool into an active participant in the digital ecosystem. Whether it's simplifying everyday tasks or innovating business workflows, OpenAI positions Operator as the next step in making AI accessible, practical, and secure.
[3]
OpenAI unveils Operator agent for automating web tasks
Hello Operator? Can you give me number nine? Can I see you later? Will you give me back my dime? OpenAI on Thursday launched a human-directed AI agent called Operator that can use a web browser by itself to accomplish various online tasks, or at least try to do so. As demonstrated by OpenAI CEO Sam Altman, software engineer Yash Kumar, researcher Casey Chu, and technical staff member Reiichiro Nakano, the Operator agent can perform online activities that require multiple steps and have specified parameters, such as booking a restaurant reservation through OpenTable within a certain time window or finding concert tickets for a specified performer within a given price range. Just like you feed queries into OpenAI's ChatGPT to answer or respond to, users can give Operator instructions to carry out on the web. While individuals can perform such tasks on their own time at no extra cost, Operator can do so less reliably for US-based ChatGPT Pro subscribers, who pay $200 per month. OpenAI subscribers to Plus, Team, and Enterprise tiers can expect access once the rough spots get ironed out. Operator is similar to Anthropic's computer use API in that it combines the sort of browser automation enabled by software frameworks like Playwright and Selenium with text-based machine learning models and computer vision models for evaluating online words and images presented by browsing websites. The overall aim is to automate web-based tasks to free humans from dull work ... or from employment altogether. "Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes," OpenAI explains in a write-up. "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses." Those engagement opportunities presently involve negotiation with OpenAI. 
The biz said it is working with firms "like DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, Uber, and others to ensure Operator addresses real-world needs while respecting established norms." In other words, OpenAI's Operator may not interoperate well with web services that aren't expecting frequent automated contact. But to the extent agent-based interaction becomes popular, OpenAI and like-minded agent purveyors may devalue search as a marketing and sales channel, since automated connections to services - and partner preferencing paved by APIs - have the potential to reduce the need for human-driven queries. OpenAI's agent is based on a model called Computer-Using Agent (CUA), which combines GPT-4o's computer vision capabilities with training about how to deal with graphical user interfaces (GUIs). TikTok parent ByteDance recently released a similar open source project for automating GUI interactions, UI-TARS. According to OpenAI, CUA has achieved a 38.1 percent success rate on the OSWorld benchmark test for full computer use tasks, a 58.1 percent success rate on WebArena, and an 87 percent success rate on WebVoyager for web-based tasks. So use Operator when you're open to the possibility of not having your restaurant reservation booked or your groceries ordered. CUA's computer vision modality works by capturing and storing screenshots, which it uses to perform chain-of-thought "reasoning" to perform the requested task. Those familiar with the controversy surrounding Microsoft's screen-capturing Recall feature in the latest version of Windows may have some concerns about how OpenAI handles screenshot data. The Register asked OpenAI for clarification, and we've not heard back. The biz says disabling the "Improve the model for everyone" option in ChatGPT settings - on by default - will prevent data in Operator from being used to train its models. 
We know bad actors may try to misuse this technology
As mentioned above, users of Operator enter the task as a text prompt and the AI agent is expected to attempt to accomplish that task, breaking it down into a series of steps and awaiting user intervention when the user is required to log in, provide payment details or solve CAPTCHAs - something current computer vision models can do quite effectively, if allowed. "We know bad actors may try to misuse this technology," OpenAI said. "That's why we've designed Operator to refuse harmful requests and block disallowed content. Our moderation systems can issue warnings or even revoke access for repeated violations, and we've integrated additional review processes to detect and address misuse." According to the ChatGPT maker, Operator has been designed to defend against adversarial websites that might try to lead the AI agent astray through hidden prompts, malicious code, or phishing attempts. The AI agent supposedly has been designed to detect and ignore prompt injection attacks. And it's said to operate under the supervision of a "monitor model" that watches for dubious behavior, augmented by anomaly detection processes involving human review and automated processes. Nonetheless, OpenAI acknowledges, "no system is flawless and this is still a research preview." Operator arrives amid what AI industry leaders have heralded as "the agentic era," a time when generative AI models apply multimodal text, audio, and vision capabilities to interact with other computing systems in order to tackle multi-step tasks that require some form of reasoning and progress assessment. While AI agents may sound promising in theory, they've been something of a letdown in practice - possibly because every step in a complex task adds another opportunity for failure. A recent evaluation of AI code helper Devin, for example, suggests further work will need to be done to make these systems reliable. ®
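The point that every step adds another opportunity for failure can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each step succeeds independently with the same probability; the 95 percent figure is an invented example, not a measured property of Operator or any other agent.

```python
def task_success_rate(per_step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming the steps are independent."""
    return per_step_success ** steps


# Illustrative assumption: each step works 95 percent of the time.
for steps in (1, 5, 10, 20):
    rate = task_success_rate(0.95, steps)
    print(f"{steps:2d} steps -> {rate:.1%} chance the whole task succeeds")
```

Under that assumption, a ten-step task completes only about 60 percent of the time and a twenty-step task about 36 percent, which is one way to read the gap between short demos and longer real-world chores.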
[4]
OpenAI's Operator is your new autonomous AI assistant ready to do your bidding across the web
OpenAI has launched Operator, a largely autonomous AI agent designed to take your simple text prompts and turn them into real-world tasks completed via the internet. In theory, you can ask it to do almost anything that's possible via a web browser. In practice, early users seem to be finding the results rather hit and miss. Examples of the sorts of things Operator can do are booking travel, making restaurant reservations for a certain time, or perhaps buying concert tickets for a specific band within a given price range. Currently released as a research preview only available to ChatGPT Pro subscribers rather than a fully baked product, Operator is based on OpenAI's Computer-Using Agent (CUA) model, which combines GPT-4o's computer vision capabilities with specific graphical user interface (GUI) training and advanced reasoning to create a tool capable of browsing the web, formulating multi-step tasks from a text prompt and executing the whole shebang. Arguably, Operator is not unique, what with ByteDance's UI-TARS and Anthropic's Computer Use having a somewhat similar remit. But perhaps what makes Operator a little different is that it doesn't need APIs. "Operator can 'see' (through screenshots) and 'interact' (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations," OpenAI says. That said, it does seem like it helps if web services are optimized for Operator. "We're collaborating with companies like DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, Uber, and others to ensure Operator addresses real-world needs while respecting established norms," OpenAI says. Presumably, your results -- or should that be Operator's results? -- won't be as accurate with non-optimized services. Exactly how good Operator currently is at taking a prompt and running with it isn't entirely clear. 
OpenAI itself says the CUA model returns a 38.1% success rate on the OSWorld benchmark test for full computer use tasks, 58.1% on WebArena, and 87% on WebVoyager for web-based tasks. Meanwhile, some early users are reporting that Operator may be more prone to hallucinations than recent builds of ChatGPT itself. For instance, one user claims that when tasked with generating a list of online influencers and tabulating their contact details, Operator made all the details up. Some users also report that Operator is surprisingly slow, something that seems to tally with the video demo posted by OpenAI. It's hardly a whirlwind of mouse inputs, that's for sure. So, it seems you'd be brave to ask Operator to do your grocery shopping next week and expect all the right stuff to turn up, or perhaps anything to turn up at all. The big question, then, is how quickly this early beta version of Operator will develop into something broadly reliable and useful. And then of course how safe it will be if and when it does. "We know bad actors may try to misuse this technology. That's why we've designed Operator to refuse harmful requests and block disallowed content," OpenAI says, also explaining that Operator has been designed to deal with websites that might try to hijack the AI agent with hidden prompts, malicious code, or phishing attempts. For now, these are all, ultimately, imponderables. But for better or worse it does seem like you'll soon be able to hand over quite a few of your routine online tasks to an AI agent.
[5]
OpenAI's Agent Has a Problem: Before It Does Anything Important, You Have to Double-Check It Hasn't Screwed Up
Behold Operator, OpenAI's long-awaited agentic AI model that can use your computer and browse the web for you. It's supposed to work on your behalf, following the instructions it's given like your very own little employee. Or "your own secretary" might be more apt: OpenAI's marketing materials have focused on Operator performing tasks like booking tickets, making restaurant reservations, and creating shopping lists (though the company admits it still struggles with managing calendars, a major productivity task). But if you think you can just walk away from the computer and let the AI do everything, think again: Operator will need to ask for confirmation before pulling the trigger on important tasks, which throws a wrench into the premise of the AI agent acting on your behalf, since the clear implication is you need to make sure it's not screwing up before allowing it any real power. "Before finalizing any significant action, such as submitting an order or sending an email, Operator should ask for approval," reads the safety section in OpenAI's announcement. This measure highlights the tension between keeping stringent guardrails on AI models and allowing them to freely exercise their purportedly powerful capabilities. How do you put out an AI that can do anything -- without it doing anything stupid? Right now, a limited preview of Operator is only available to subscribers of the ChatGPT Pro plan, which costs an eye-watering $200 per month. The agentic tool uses its own AI model called Computer-Using Agent to interact with its virtual environment -- that is, to use mouse and keyboard actions -- by constantly taking screenshots of your desktop. The screenshots are interpreted by GPT-4o's image-processing capabilities, theoretically allowing Operator to use any software it's looking at, and not just ones designed to integrate with AI. But in practice, it doesn't sound like the seamless experience you'd hope it to be (though to be fair, it's still in its early stages). 
When the AI gets stuck, as it still often does, it hands control back to the user to remedy the issue. It will also stop working to ask you for your usernames and passwords, entering a "takeover mode." It's "simply too slow," wrote one user on the ChatGPTPro subreddit in a lengthy writeup, who said they were "shocked" by its sluggish pace. "It also bugged me when Operator didn't ask for help when it clearly needed to," the user added. In reality, you may have to sit there and watch the AI painstakingly try to navigate your computer, like supervising a grandparent trying their hand at Facebook and email. Obviously, safety measures are good. But it's worth asking just how useful this tech is going to be if it can't be trusted to work reliably without neutering it. And if safety and privacy are important to you, then you should already be uneasy with the idea of letting an AI model run rampant on your machine, especially one that relies on constantly screenshotting your desktop. While you can opt out of having your data used to train the AI model, OpenAI says that it will store your chats and screenshots for up to 90 days on its servers, TechCrunch reported, even if you delete them. Because Operator can browse the web, it will potentially be exposed to all kinds of danger, including attacks called prompt injections that could trick the model into defying its original instructions.
[6]
OpenAI launches 'Operator' - everything about the new agent that can use a computer for you
OpenAI has unveiled "Operator," an advanced AI agent designed to autonomously perform tasks on the web, marking a major leap in AI independence. Currently available as a research preview to ChatGPT Pro subscribers in the United States, Operator empowers users to delegate complex online activities to an AI-driven system. Unlike traditional AI models that rely on predefined APIs, Operator utilizes a "Computer-Using Agent" (CUA) model. This innovative approach combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning, enabling Operator to browse the web by interpreting screenshots and performing actions such as typing, clicking, and scrolling. The new agent can even navigate and manipulate web interfaces much like a human user, allowing it to take on tasks such as ordering groceries, making reservations, and more. Although unavailable for casual users just yet, the practical applications are already being seen. The obvious purposes include handling online tasks that up until now required human intervention. For instance, it can purchase plane tickets, order a pizza, and compile expense reports without the assistance of a human. The possibilities are endless when it comes to automating routine activities; Operator aims to enhance productivity and streamline daily workflows for its users. To align Operator with real-world applications, OpenAI is collaborating with companies including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to not only utilize the agent's capabilities but also ensure it adheres to the businesses' terms of service agreements. These partnerships aim to refine Operator's functionalities and ensure it meets diverse user needs. While Operator currently excels in various tasks, it may face challenges with complex interfaces, such as creating slideshows or managing calendars, but it is still in review and undergoing safety testing to work out these setbacks. 
OpenAI has implemented several safety measures to ensure the responsible use of Operator. The agent is programmed to self-correct and will transfer control back to the user if it encounters difficulties or when sensitive information, such as login credentials, is required. Additionally, Operator seeks user approval before executing critical actions like sending emails, thereby maintaining a balance between autonomy and user oversight. The introduction of Operator positions OpenAI prominently in the competitive AI agent market alongside other AI giants like Anthropic, which introduced an autonomous agent last year. OpenAI's move underscores the industry's shift towards creating AI that can perform tasks independently, reflecting a broader trend in AI development focused on enhancing user convenience and efficiency. As Operator continues to evolve through user feedback and strategic partnerships, it is poised to become an integral tool for individuals and businesses seeking to optimize their online interactions and productivity. OpenAI plans to expand access to Operator across more user tiers and integrate its capabilities into ChatGPT, broadening its availability and utility. Until then, OpenAI just announced that its latest model, o3-mini, is available for free, giving users even more ways to use its chatbot.
[7]
OpenAI's new ChatGPT agent can perform interactive tasks on your behalf
Imagine an AI bot that can fill out online forms, book airline flights, order groceries, and more. That's the intent of OpenAI's new Operator, an AI that acts as an independent agent to carry out your commands all on its own. Released as a research preview on Thursday, Operator is able to interact directly with a web browser. That means it can navigate web pages by typing, scrolling, and clicking in all the right spots, just as you would yourself. The difference here is that Operator aims to do all that without any intervention on your part. Sounds cool, but Operator is starting off slowly. Beyond its initial research preview status, the tool is now accessible only with ChatGPT Pro subscriptions in the US, which cost $200 a month. As the AI evolves and learns from its mistakes, OpenAI plans to expand its reach to Plus, Team, and Enterprise users and eventually integrate its skills directly into ChatGPT. ChatGPT Pro users who want to take Operator for a spin should browse to its dedicated web page. Make sure you're signed in with your OpenAI account. From there, type a request at the prompt just as you normally would with ChatGPT. Only you'll want to fashion that request as one that asks Operator to carry out tasks on the web independently. For example, you could ask Operator to find and book a tour of Rome through Tripadvisor, order more bananas and apples from Instacart, or purchase Apple's AirPods Pro 2 from Amazon. You can even tell the bot to handle several tasks simultaneously. To help Operator carry out actions with specific vendors and websites, OpenAI is working with companies such as DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber. As cool as all this may sound, there are certainly potential pitfalls and problems. 
Depending on the complexity of the task, Operator could get stuck or make a mistake along the way. In that case, it will try to self-correct. If that doesn't work, the tool will hand control back to you to intervene. Operator is also unable to handle confidential information, such as passwords, payment details, and CAPTCHA challenges. If it runs into a website requiring a login or payment card, it will ask you to take over. Further, Operator will refuse to carry out certain sensitive tasks, such as depositing money online or submitting a job application. It will also ask for approval before completing other types of tasks, such as submitting an online order or sending an email. Privacy is always a concern, especially with AI. To better protect your privacy, you can opt out of training so that Operator won't use your data for learning. You're also able to delete all browsing data, log out of all sites, and erase past conversations by going to the Privacy section in Operator's settings. Security is another worrisome area. How will cybercriminals and hackers be prevented from exploiting and abusing Operator? First, the AI is designed to refuse harmful requests and block prohibited content. Second, it will detect and ignore prompt injections in which hackers try to feed malicious information disguised as legitimate requests. Third, a built-in monitor looks out for suspicious behavior and will pause the task if such behavior is detected. Fourth, OpenAI will use both automated and human reviewers to watch for possible threats. Of course, other privacy and security issues will likely pop up as Operator wends its way through the ChatGPT Pro community. That's the purpose of limiting its initial use to just Pro subscribers before looking to expand to a wider audience. 
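The tiers of caution described above (refuse some tasks, hand others back to the user, ask for approval on the rest) can be pictured as a simple gate in front of every action. The Python sketch below is only an illustration of that idea; the action names and the `gate` function are hypothetical, and nothing here reflects how OpenAI actually implements these checks.

```python
# Hypothetical action labels, grouped by how the article says Operator treats them.
REFUSE = {"deposit_money", "submit_job_application"}
HAND_TO_USER = {"enter_password", "enter_payment_details", "solve_captcha"}
ASK_APPROVAL = {"submit_order", "send_email"}


def gate(action: str) -> str:
    """Decide what the agent should do before carrying out an action."""
    if action in REFUSE:
        return "refuse"          # sensitive task the agent will not perform
    if action in HAND_TO_USER:
        return "hand_to_user"    # user takes over for confidential input
    if action in ASK_APPROVAL:
        return "ask_approval"    # user must confirm before it goes ahead
    return "proceed"             # routine step such as scrolling or clicking


for step in ("scroll_page", "enter_password", "submit_order", "deposit_money"):
    print(step, "->", gate(step))
```

The ordering of the checks matters in a design like this: the most restrictive category is tested first so that an action listed in more than one set always gets the stricter treatment.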
To carry out tasks on the web, Operator uses a new model called Computer-Using Agent (CUA), which combines GPT-4o's vision skills with more advanced reasoning. That combination lets it interact with the menus, controls, and text fields on a web page without the need for a custom API. "Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes," OpenAI said in its news release. "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses."
[8]
OpenAI Introduces Operator & Agents : The Future of Task Automation
OpenAI has unveiled "Operator," an advanced AI agent designed to autonomously perform tasks through a cloud-based web browser. This innovative tool can navigate websites, interact with interfaces, and execute user-defined instructions, such as booking reservations or shopping online. Currently available as a research preview for Pro users in the U.S., OpenAI's Operator represents a significant milestone in the advancement of ChatGPT. OpenAI has plans to expand its availability and enhance its features, aiming to make this tool accessible to a broader audience in the near future. Imagine a world where booking a dinner reservation, purchasing concert tickets, or even managing your online shopping list could be done without lifting a finger. For many of us, juggling these everyday tasks alongside work, family, and personal commitments can feel overwhelming. That's where OpenAI's latest innovation, "Operator," steps in. It is designed to take the reins on tasks that typically demand your time and attention, offering a glimpse into a future where technology doesn't just assist but actively works for you. Whether it's navigating websites or executing detailed instructions, Operator promises to simplify your life in ways that feel almost magical, says OpenAI. But let's be real -- trusting an AI to handle your to-do list might feel like a leap of faith. After all, how can you be sure it will get things right or operate safely? OpenAI has anticipated these concerns, building Operator with safeguards and user oversight at its core. This isn't just about automation; it's about creating a tool that works seamlessly with you, not just for you. In this article, we'll explore what makes Operator new, how it works, and why it could soon become your go-to assistant for tackling the mundane and the complex alike. AI agents like Operator are designed to simplify and optimize your daily activities by automating repetitive or complex tasks. 
Unlike traditional AI tools that require constant user input, Operator operates independently, completing tasks based on your instructions. By using advanced AI models, it aims to boost productivity, creativity, and efficiency, allowing you to focus on more critical priorities. Operator's functionality is rooted in its ability to mimic human interactions with websites. It uses a virtual interface that includes a screen, mouse, and keyboard, allowing it to navigate websites and perform tasks without relying on APIs. This flexibility ensures that Operator can interact with virtually any website, even those without dedicated API support. By automating these processes, Operator offers a practical solution to streamline workflows and save time. What Can Operator Do? Operator's capabilities are centered around its cloud-based browser, which allows it to perform a wide range of tasks that typically require manual effort. By mimicking human interactions, Operator can handle tasks such as filling out forms, navigating complex interfaces, and executing user-defined commands. This versatility makes it a valuable tool for addressing real-world challenges. Here are some examples of tasks Operator can perform: These examples highlight Operator's potential to simplify everyday activities, making it a practical tool for both personal and professional use. How Do You Interact with Operator? Operator is designed to give you full control over its actions through user-defined instructions. You can provide detailed prompts that guide the AI agent in completing specific tasks. Throughout the process, you can monitor its progress and intervene if necessary to make adjustments or corrections. This interactive approach ensures that you remain in charge while benefiting from the automation capabilities of the tool. To prioritize safety and accuracy, Operator incorporates confirmation steps for critical actions, such as completing purchases or submitting sensitive information. 
These safeguards are designed to minimize errors and prevent misuse, making sure that you can trust the system to operate responsibly. By combining autonomy with user oversight, Operator strikes a balance between convenience and control. How Does Operator Ensure Safety? Safety is a fundamental aspect of Operator's design. OpenAI has implemented several measures to ensure that the AI agent operates securely and responsibly. These safeguards are intended to build trust and reliability as the technology continues to evolve. These safety features are designed to protect both users and the broader ecosystem, making sure that Operator operates within ethical and secure boundaries. The Technology Behind Operator: The Computer-Using Agent (CUA) Operator is powered by a new model called the "Computer-Using Agent" (CUA), which is built on OpenAI's GPT-4o. This model enables Operator to interpret screenshots, navigate operating systems, and control a computer in a manner similar to a human user. By combining GPT-4o's advanced language and vision capabilities with CUA's operational framework, Operator can perform complex tasks with a high degree of autonomy. The integration of CUA allows Operator to adapt to a wide range of scenarios, from navigating unfamiliar websites to executing intricate workflows. This adaptability is a key factor in its ability to handle diverse tasks, making it a versatile tool for users across various domains. How Does Operator Perform? Operator demonstrates impressive capabilities in navigating websites and completing tasks. It outperforms many existing AI systems in these areas, showcasing its potential as a powerful automation tool. However, it is not without limitations. Operator's performance, while advanced, still falls short of human-level proficiency in certain scenarios. OpenAI acknowledges these challenges and is actively working to improve the system's accuracy, speed, and cost-effectiveness. 
Despite its current limitations, Operator's performance highlights its potential to transform how tasks are automated. As the technology matures, it is expected to become even more reliable and efficient, further solidifying its role as a valuable tool for users. What's Next for Operator? OpenAI has ambitious plans for the future of Operator. While the research preview is currently limited to Pro users in the U.S., the company intends to expand access to Plus users and international markets over time. OpenAI also plans to introduce API access, allowing developers to integrate Operator's capabilities into their own applications. These developments are expected to broaden the tool's reach and utility. In addition to expanding access, OpenAI is committed to continuously improving Operator's functionality, reliability, and user experience. By incorporating user feedback and addressing challenges, the company aims to refine the tool and unlock its full potential. Why Does Operator Matter? Operator represents a significant advancement in the evolution of AI agents. By allowing autonomous task execution and seamless interaction with websites, it paves the way for a new generation of AI-driven productivity tools. OpenAI envisions a collaborative approach to refining Operator, working closely with users to address challenges and enhance its capabilities. As AI agents like Operator continue to evolve, they promise to transform how you interact with technology. By automating complex tasks and prioritizing safety and reliability, Operator is poised to become an essential tool in the growing ecosystem of AI solutions. Its potential to streamline workflows and improve efficiency underscores its importance as a step forward in the integration of AI into everyday life.
[9]
OpenAI announces Operator AI agent that can browse the web for you
Operator, OpenAI's agent that can perform multi-step tasks autonomously, has arrived. The ChatGPT maker introduced a preview mode of Operator on Thursday, detailing how it works and what it's capable of. Operator can browse the web, performing tasks like calculating refunds from a canceled order and finding customers with specific criteria in an internal sales database. It can also buy groceries and send emails. On a computer, Operator can perform various tasks, like downloading files, combining PDFs, analyzing spreadsheets, and exporting images. OpenAI is delivering on its promise of making 2025 the year of agentic AI. Last week, the company launched Tasks for ChatGPT, which lets users automate future prompts like sending a daily brief of tech news or scheduling reminders. While many of these tasks are already possible through basic tools like Google Alerts and calendars, it's an early example of AI bots doing the legwork for the user. Combined with the release of Operator and its ability to autonomously take on more complex tasks, you can start to see OpenAI's vision for making ChatGPT an indispensable tool leveraging its core product. The model underpinning Operator is a Computer-Using Agent (CUA) that combines GPT-4o's vision mode to "see" what's on the user's screen through screenshots with graphical user interfaces (GUIs) that enable Operator to interact with the screen (clicking buttons, typing, scrolling, etc.). Obviously, safety is a big concern for a semi-autonomous AI agent like Operator. OpenAI says it has taken risks into account in a few different ways. Operator mitigates misuse by blocking harmful or illegal tasks, and can't access blacklisted sites like gambling and adult entertainment sites and drug or gun retailers. And OpenAI is looking over your shoulder as you use Operator. 
The announcement says that "user interactions are reviewed in real-time by automated safety checkers that are designed to ensure compliance with Usage Policies and have the ability to issue warnings or blocks for prohibited activities," and that the company has developed "automated detection and human review pipelines to identify prohibited usage in priority policy areas, including child safety and deceptive activities." Since Operator can make costly mistakes without human supervision, the model will ask for confirmation "before submitting an order, sending an email, etc., so that the user can double-check the model's work before it becomes permanent." Operator is also currently limited from "higher-risk tasks like banking transactions." This is where OpenAI's new premium subscription tier, ChatGPT Pro, comes in. Operator in preview mode is only available in the U.S. to those who pay $200 a month as Pro users. But over time, OpenAI expects to expand availability to Plus, Team, and Enterprise users.
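The confirmation and refusal behavior described above can be pictured as a small gate that runs before each action. The following is a minimal illustrative sketch, not OpenAI's implementation: the action names, the `BLOCKED` and `NEEDS_CONFIRMATION` sets, and the `gate` and `execute` functions are all hypothetical, chosen only to mirror the examples quoted in the announcement (orders, emails, banking transactions).

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    REFUSE = "refuse"      # higher-risk task: never attempted
    CONFIRM = "confirm"    # user must approve before the action is finalized
    PROCEED = "proceed"    # low-risk step: no prompt needed


# Hypothetical categories, modeled on the examples quoted in the article.
BLOCKED = {"banking_transaction"}
NEEDS_CONFIRMATION = {"submit_order", "send_email"}


def gate(action_type: str) -> Decision:
    """Classify an action before the agent is allowed to carry it out."""
    if action_type in BLOCKED:
        return Decision.REFUSE
    if action_type in NEEDS_CONFIRMATION:
        return Decision.CONFIRM
    return Decision.PROCEED


def execute(action_type: str, user_approves: Callable[[str], bool]) -> str:
    """Run one action through the gate and report what happened."""
    decision = gate(action_type)
    if decision is Decision.REFUSE:
        return "refused"
    if decision is Decision.CONFIRM and not user_approves(action_type):
        return "cancelled"
    return "done"
```

In this sketch, `execute("submit_order", ask_user)` only completes when the hypothetical `ask_user` callback returns `True`, which is the "double-check the model's work before it becomes permanent" step in miniature.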
[10]
OpenAI's first AI Agent is here, and Operator can make a dinner reservation and complete other tasks on the web for you
Operator can make a dinner reservation, fill out a form, and complete other web tasks. OpenAI is always looking for the next big thing to add to ChatGPT, and after months of rumors, including a report from earlier this week that teased a launch, the technology giant's first AI Agent is here. Operator is designed to complete web tasks for you, all with a touch of a button. Essentially, Operator is a Computer-Using Agent (CUA) that uses GPT-4o's visual skills to browse and search the web. This means that it can understand the context of what to search for, and thanks to its multi-modality, it understands what it sees as it searches. It's available now as a research preview for ChatGPT Pro subscribers in the United States. Operator is described as "an agent that can use its own browser to perform tasks for you." OpenAI released a demo showing Operator browsing the web as we (that is, we humans) do. You might ask Operator to book a dinner reservation for you, fill out an arduously long form, order groceries from a service, or even book a flight. It can use OpenTable to find and book a reservation at a restaurant, as shown in the demo. Operator will even walk you through its steps. Operator is a 'research preview,' so know that it's in its early days. OpenAI does impose some limitations. We haven't had the chance to go hands-on yet, but it certainly looks impressive. This is OpenAI's first entry into the world of AI agents, which will likely be the theme of the year in the realm of artificial intelligence. OpenAI writes in a blog post announcing Operator that it "is one of our first agents, which are AIs capable of doing work for you independently -- you give it a task and it will execute it." This hints that not only are there other agents in the pipeline - Altman confirmed this during the live demo - but that they're all based around the notion of doing things for you - a big step in the quest to make AI even more helpful, giving us some time back. 
Operator is powered by the new Computer-Using Agent (CUA) model, which pairs GPT-4o's vision skills with advanced reasoning. This all comes together to let Operator understand and use elements within a browser - the search bar, various buttons, and on-screen content. OpenAI explains that "Operator can 'see' (through screenshots) and 'interact' (using all the actions a mouse and keyboard allow) with a browser," allowing it to functionally use a browser to complete a task. That's pretty neat, especially if it works at a high rate of success, and according to the blog post, it can self-correct. However, as with most new AI tools and skills, it will likely take some time for this to become truly useful in the real world. That will also require OpenAI to open it up to more folks, though as an early research preview it's still certainly an impressive demo. For now, if you're in the United States and subscribed to ChatGPT Pro, you can try it out on OpenAI's website. OpenAI CEO Sam Altman teased that it would eventually arrive in other countries and be added to the ChatGPT Plus subscription. As we remember from some of the announcements from 12 Days of OpenAI, Europe will likely take a bit longer.
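The "see through screenshots, interact with mouse and keyboard" description amounts to a loop: observe the screen, pick an action, apply it, repeat. Below is a minimal sketch of that loop under loud assumptions. `FakeBrowser`, `Action`, `run_agent`, and `toy_policy` are hypothetical names invented for illustration, the "screenshot" is just a text string, and the policy is a hard-coded rule standing in for the model; none of this reflects OpenAI's actual code or API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str         # "click", "type", "scroll", or "done"
    target: str = ""  # hypothetical description of the on-screen element
    text: str = ""    # text to enter when kind == "type"


class FakeBrowser:
    """Stand-in for a real browser: returns a text 'screenshot' and records actions."""

    def __init__(self) -> None:
        self.log: List[Action] = []
        self.search_box = ""

    def screenshot(self) -> str:
        # A real agent would receive pixels; a string keeps the sketch runnable.
        return f"search_box={self.search_box!r}"

    def apply(self, action: Action) -> None:
        self.log.append(action)
        if action.kind == "type":
            self.search_box = action.text


def run_agent(browser: FakeBrowser,
              choose_action: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act; stop when the policy says 'done' or steps run out."""
    for step in range(1, max_steps + 1):
        observation = browser.screenshot()
        action = choose_action(observation)
        if action.kind == "done":
            return step
        browser.apply(action)
    return max_steps


def toy_policy(observation: str) -> Action:
    """Hard-coded placeholder for the model: fill the search box once, then stop."""
    if "search_box=''" in observation:
        return Action("type", target="search bar", text="table for two")
    return Action("done")
```

Calling `run_agent(FakeBrowser(), toy_policy)` types into the fake search bar on the first step and stops on the second. The `max_steps` cap is a design choice in this sketch: an agent that can get stuck needs some bound after which control returns to the user.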
[11]
New ChatGPT Operator Review : Does It Live Up to the Hype?
OpenAI has introduced "Operator," an advanced AI agent designed to autonomously handle a variety of real-world tasks. From booking hotels to making restaurant reservations, Operator seeks to simplify daily activities with minimal user involvement. Operator is designed to act as your personal digital assistant, tackling the grunt work with minimal effort on your part. Currently available to Pro Plan users in the U.S. paying $200 per month, this new agentic feature uses the latest AI technologies, including GPT-4o with vision capabilities and a specialized "Computer-Using Agent" model. If you are interested in learning more about the new Operator, the AI Advantage has put the new AI agent through its paces, showing what Operator can do, where it shines, and where it still falls short, giving you a clear picture of whether this AI agent is ready to transform your daily routine -- or if it's just another tech experiment in progress. So, does it live up to the hype? And more importantly, is it worth the investment? Operator is a new AI-powered feature integrated into ChatGPT, designed to perform tasks that traditionally require human interaction with websites and applications. By combining GPT-4o's advanced natural language processing with vision capabilities, Operator can interpret and navigate digital interfaces in a manner similar to a human user. At the heart of this system is the "Computer-Using Agent," a model extensively trained on examples of human-computer interactions to ensure precision and adaptability. Currently, Operator is exclusively available to Pro Plan users in the U.S., a subscription tier that provides access to OpenAI's most advanced features. OpenAI has indicated plans to expand access in the future, potentially bringing this technology to a broader audience. Operator's primary strength lies in its ability to automate a wide range of tasks, reducing the need for manual effort. 
Its capabilities include: These features make Operator a versatile tool for simplifying time-consuming or repetitive tasks, offering users a more efficient way to manage their daily responsibilities. In practical applications, Operator has demonstrated impressive reliability and efficiency. For example, it can simultaneously book a hotel and reserve a restaurant table, completing both tasks with minimal user input. Benchmarks suggest that Operator outperforms similar AI tools in terms of speed and accuracy, positioning it as a strong contender in the growing field of AI-driven task automation. However, Operator's performance is not without limitations. Certain tasks, particularly those involving sensitive actions or complex workflows, still require user oversight. While its capabilities are robust, the need for occasional manual intervention highlights areas where the technology is still evolving. Despite its promise, Operator has several limitations that reflect its developmental stage. These include: These challenges underscore the need for further development before Operator can achieve full autonomy. While it excels in many areas, its current limitations suggest that it is best suited for relatively straightforward tasks at this stage. OpenAI has outlined ambitious plans for the future development of Operator, aiming to enhance its functionality and broaden its appeal. Expected advancements include: These updates could significantly expand Operator's capabilities, positioning it as a leading tool for task automation. Additionally, its success may inspire the development of open source alternatives and drive further innovation in the AI space, potentially reshaping how users interact with digital tools. At present, Operator is available exclusively to Pro Plan users in the U.S., a subscription tier priced at $200 per month. 
This level of exclusivity ensures that users gain access to OpenAI's most advanced features, but it also limits the tool's reach to a broader audience. OpenAI has hinted at plans to expand availability to team subscriptions, which could make Operator more accessible to businesses and collaborative users. As the technology matures, broader access could enable more individuals and organizations to benefit from Operator's capabilities, further solidifying its role in the AI-driven task automation landscape. Operator represents a significant step forward in AI's ability to perform practical, real-world tasks. By reducing manual effort and streamlining daily activities, it highlights the fantastic potential of AI in both personal and professional contexts. While it is not yet fully autonomous, its current capabilities demonstrate how AI can bridge the gap between human intent and task execution, offering a glimpse into a future where technology handles routine responsibilities with minimal input. As Operator continues to evolve, its impact on everyday life could be profound. By automating time-consuming tasks and simplifying complex workflows, it has the potential to enhance productivity and efficiency for users across various domains. However, its current limitations, such as the need for manual intervention and its high cost, suggest that widespread adoption may take time. With ongoing development and refinement, Operator could redefine how users interact with digital tools, making everyday tasks faster, easier, and more efficient.
[12]
OpenAI Unveils Operator, a ChatGPT Moment for AI Agents
OpenAI today introduced Operator, a new AI agent that can perform tasks on the web independently. Users simply give it instructions, and it completes the task without human intervention. "AI agents are AI systems that can do work for you independently. You give them a task, and they go off and do it," said OpenAI chief Sam Altman. Simply put, Operator can navigate websites, fill out forms, and make purchases -- all by interacting with the web like a human. Unlike traditional automation tools that rely on APIs, Operator processes information visually, moving a virtual mouse and typing into a browser. "Before, if you wanted your model to buy stuff from Instacart, you'd need to figure out if Instacart had an API... Now, this is just using screenshots, no API, nothing," said OpenAI's Yash Kumar during the demo. Initially available for Pro users ($200 monthly ChatGPT Pro plan) in the US, Operator will expand to other regions, though European availability will take longer due to regulatory challenges. Altman, however, said that the company would make the tech "better, cheaper, and more widely available soon." Also, Operator will be released in OpenAI's API "in the next few weeks." "2025 is the year of agents," said OpenAI's Greg Brockman. "Operator -- research preview of an agent that can use its own browser to perform tasks for you." While Operator is impressive, it's not flawless. During the live demo, it made mistakes, such as selecting the wrong location for a restaurant booking. OpenAI admits that errors -- sometimes embarrassing ones -- are part of the early research phase. "Operator is an early research preview. It will do a lot of cool things. It also makes mistakes, sometimes embarrassing ones," Kumar noted. Safety is another concern. AI navigating the web independently could fall for scams, make incorrect purchases, or misinterpret user intent. To address this, Operator includes safeguards such as human confirmations and fraud detection. 
"What if the website is misaligned? Maybe it's fraudulent or asks Operator to wire money... We've developed our model to avoid those instructions, but we also have a separate layer -- like an antivirus -- that monitors suspicious activity," explained OpenAI's Reiichiro Nakano. OpenAI isn't the only company working on AI agents. Anthropic recently launched 'Computer Use,' a feature in Claude 3.5 Sonnet, which allows AI to navigate computers like humans -- using a cursor, clicking buttons, and typing text. Both 'Operator' and 'Computer Use' share a similar goal: enabling AI to interact with digital systems as a human would. However, the key difference is accessibility. While 'Computer Use' is primarily available through API integrations for developers, Operator is directly accessible to consumers through ChatGPT. "We really want to put it in people's hands," OpenAI emphasised. Performance also varies. In OSWorld, a test that evaluates AI's ability to use computers, OpenAI's CUA model scored 38.1%, while Claude 3.5 Sonnet scored 14.9%. This suggests Operator may be more reliable for real-world tasks, though both systems are still in development. Meanwhile, Perplexity, known for its AI-powered search engine, has taken a different approach with Perplexity Assistant. Unlike Operator, which focuses on web navigation, Perplexity Assistant is designed for mobile devices. As Perplexity describes it, "Perplexity Assistant uses reasoning, search, and apps to help with daily tasks." A key advantage of Perplexity Assistant is its deep integration with smartphone workflows. It can search the web, book appointments, and even use a phone's camera to identify objects. Unlike Operator, it maintains context across tasks, allowing users to research restaurants and book reservations in one seamless flow. However, Perplexity has struggled with reliability in past features. As one report noted, "Perplexity launched half-baked products in the past. 
For instance, our testing found that Perplexity's shopping feature... tended to be slow and error-prone." The company acknowledges these issues, with Srinivas from Perplexity stating, "Some Perplexity Assistant actions [might] not always work." While Perplexity Assistant competes more with Google Assistant and Siri, Operator is positioned as a tool that could disrupt traditional web-based tasks. What about Microsoft's Copilot Vision? Microsoft has also entered the AI assistant space with Copilot Vision, but its approach is distinct from Operator and Perplexity Assistant. Instead of performing tasks, Copilot Vision enhances browsing by reading pages, summarising content, and offering insights in real-time. As Microsoft describes it, "Copilot can now understand the full context of what you're doing online. When you choose to enable Copilot Vision, it sees the page you're on, it reads along with you, and you can talk through the problem you're facing together." Unlike Operator, Copilot Vision does not take independent actions like making bookings or purchases. It simply provides guidance while browsing, much like an intelligent companion. Privacy is also a key focus -- Copilot Vision is opt-in, and all browsing data is deleted once a session ends.
[13]
OpenAI's latest tool wants to use your computer for you
Summary: OpenAI's new tool, Operator, can execute autonomous online tasks. Operator requires human confirmation for actions like purchases, has safeguards against external tampering, and struggles with tasks like managing calendars. Rollout begins with US-based Pro subscribers, with future availability for others pending improvements. OpenAI, the creators of ChatGPT and Dall-E, released a new product today. Called Operator, the "agent" is reportedly able to perform repetitive online tasks, and can even interact with website interfaces. OpenAI claims Operator is a solution for dull or time-consuming tasks, and has a significant level of autonomy. In the tool's introductory video, Operator responds to a prompt asking it to find a recipe on a website, then purchase the ingredients for it. It then starts performing the tasks on-screen, describing its steps and requesting additional input when needed. Should you allow an AI to use your computer? Would you show ChatGPT your browsing history? The idea of letting an AI tool use your computer raises all sorts of concerns. Even for simple tasks, like the one in the example video, it can present risks. Imagine Operator buying 1,000 cans of sauce for the pasta, instead of just one, due to AI hallucination. According to OpenAI, Operator features "advanced reasoning", and also requires user confirmation for actions like purchases. Login credentials can only be provided by a human, too. In other situations, like online banking or dealing with sensitive information, OpenAI claims Operator either refuses the task outright or requires human supervision. How well that holds up remains to be seen, considering ChatGPT has been tricked in the past into providing dangerous information -- like, as reported by TechCrunch, a bomb recipe. 
The company states that Operator has a number of safeguards against external tampering, too. Lastly, since all activity happens on-screen, the user can simply call off the workflow. OpenAI's Operator will have a staged rollout: US-based Pro users to get it first. Right now, Operator has significant limitations, such as difficulty creating slideshows or managing calendars -- Microsoft's Copilot office tools seem to have an edge there. OpenAI's plan is to perform a staged rollout, beginning with US-based ChatGPT Pro subscribers, who can already use the tool. Customers on other plans and living in other countries will be able to use Operator "in the future". OpenAI gave no timeline for wider availability, however, stating that it's still trying to improve the tool's "safety and usability at scale".
[14]
OpenAI Releases 'Operator' AI Agent That Can Perform Tasks for You
Operator is currently being rolled out to ChatGPT Pro users in the US, starting today. After months of anticipation, OpenAI has finally released its first AI agent, "Operator," which can perform tasks for you on the web. It's the first AI agent that is being rolled out to consumers. So far, we have seen AI models generating texts, images, videos, and audio clips. However, with the agentic AI model, OpenAI has demonstrated that AI systems can perform actions and accomplish tasks as well. OpenAI's Operator AI agent is currently in early research preview and it's rolling out to ChatGPT Pro users in the US, starting today. The Pro subscription plan costs $200 per month. That said, OpenAI says Operator will be available to ChatGPT Plus users in more regions in the coming months. You can access the Operator AI agent at operator.chatgpt.com. As for how the Operator AI agent works, it basically connects to a web browser in a cloud environment where it performs tasks. It takes a screenshot of the screen and analyzes the visual information to decide where to click next. It works similarly to Anthropic's Computer Use tool. For deeper integration, OpenAI has partnered with many services including Uber, Instacart, OpenTable, DoorDash, eBay, and more. For example, you can ask Operator in natural language to reserve a table at a restaurant, book tickets, order snacks, buy groceries, and much more. Operator then opens the website and starts performing tasks. You can monitor the progress and also take control of the cloud browser. For critical tasks like making a payment or booking a ticket, Operator asks the user for final confirmation. Note that you have to enter your credentials and payment information into the cloud browser. OpenAI says that the Operator AI agent can "make mistakes" and it's not perfect since it's in early research preview. However, the ChatGPT maker put out some benchmarks for the Operator AI agent. 
In the OSWorld benchmark, OpenAI's Operator scored 38.1% and in the WebArena benchmark, it achieved 58.1%. Both scores are lower than average human scores. Nevertheless, Operator has kickstarted the agentic era. Google is also working on Project Mariner which can perform tasks for you in the Chrome browser, however, it's still in development. From the progress so far, it seems 2025 is going to be the year of AI agents.
[15]
ChatGPT's 'Operator' Browses the Web for You
OpenAI announced Operator today, its first attempt at an AI-powered agent that can automate complex tasks and perform various actions on websites to save you time. This includes making restaurant reservations, shopping online, and booking travel accommodations. Specific sensitive actions may require user approval. "On particularly sensitive websites, such as email, Operator requires active user supervision, ensuring users can directly catch and address any potential mistakes the model might make," OpenAI explains. This is the reason why Operator currently doesn't support sending emails or deleting calendar events, but OpenAI is working on that. There are automations in task categories like Delivery, Dining, Shopping and Travel. Explanations of specific actions being utilized are displayed on the screen while Operator is performing automation. Instead of using developer APIs to plug into web apps, Operator's Computer-Using Agent (CUA) model has been trained to interact directly with website frontends using its own dedicated web browser. OpenAI claims Operator respects the terms of service agreements of its launch partners DoorDash, eBay, Instacart, Priceline, StubHub, and Uber. The ChatGPT maker doesn't expect the CUA to perform 100% reliably all the time. OpenAI's support document acknowledges as much, saying, "Operator cannot reliably handle many complex or specialized tasks." Some examples include "creating detailed slideshows, managing intricate calendar systems, or interacting with highly customized or non-standard web interfaces." Operator has other downsides, including task-specific rate limits and an overall usage limit that resets daily. Moreover, it can fail in some tasks, like solving a CAPTCHA challenge, and has difficulty navigating complex web interfaces. Operator is currently available as a research preview via operator.chatgpt.com to ChatGPT subscribers in the United States on the most expensive $200/month ChatGPT Pro plan. 
Folks on the Plus, Team, and Enterprise tiers must be patient as OpenAI works to bring Operator to those tiers. The feature will "soon" expand to additional languages and countries. Unfortunately, "Europe will take a while," CEO Sam Altman said. OpenAI earlier implemented simple automation capabilities in ChatGPT, like setting reminders, but Operator is its first attempt at an AI agent. Rival Google unveiled its own AI agent in November 2024, Project Mariner, as an experimental Chrome extension that can fill out web forms on your behalf, click buttons, move the mouse pointer, and more. AI agents are considered the next logical step in the AI revolution. These agents promise to use the web on your behalf based on your prompt, freeing you from directly interacting with websites. However, the utility of AI agents is currently questionable, at best, as they're in experimental stages and won't be used widely until reliability improves.
[16]
Every question about OpenAI Operator -- answered
OpenAI has launched a research preview of Operator, a general-purpose AI agent capable of independently performing tasks by taking control of a web browser. This feature is first available to U.S. users on ChatGPT's $200 Pro subscription plan, with plans to expand to additional user tiers in the future. Operator can automate various tasks, including booking travel accommodations, making restaurant reservations, and online shopping. Users can select from categories such as shopping, delivery, dining, and travel within the Operator interface. When activated, a dedicated web browser window pops up, showing users the actions Operator performs alongside explanations. Users can maintain control of their screens while Operator operates in its own browser environment. The AI agent is powered by a Computer-Using Agent (CUA) model, which combines the vision capabilities of the GPT-4o model with advanced reasoning. CUA interacts with the front end of websites without requiring developer-focused APIs. This functionality allows it to use buttons, navigate menus, and fill out forms as a human would. OpenAI collaborates with various companies, including DoorDash, eBay, Instacart, and Priceline, ensuring Operator adheres to their terms of service agreements. OpenAI states that the CUA model is designed to ask for user confirmation before finalizing tasks that have external effects, such as submitting an order or sending an email. Despite its capabilities, OpenAI cautions that CUA may not perform reliably in all scenarios and struggles with complex tasks like creating detailed slideshows, managing intricate calendars, or navigating non-standard web interfaces. For sensitive tasks, such as banking transactions, user supervision is required. 
Operator does not collect or screenshot user data, and it mandates direct oversight on particularly sensitive sites like email and financial services, enabling users to address any errors promptly. Operator has certain limitations. OpenAI enforces rate limits -- both daily and task-dependent -- and specifies that certain tasks, like sending emails or deleting calendar events, will be refused for security reasons. OpenAI plans to revise these restrictions in the future, although no specific timeline is provided. Operator may also encounter difficulties with complex web interfaces, password fields, and CAPTCHA checks, prompting the user to intervene at that point. OpenAI acknowledges the safety risks associated with AI systems that can take actions on the web, emphasizing the necessity to prevent potential exploits by malicious actors. OpenAI has implemented several safety measures. The agent requests user control input during sensitive transactions and conducts user confirmations before significant actions. Operator rejects specific high-risk tasks and requires direct supervision on sensitive platforms. Investigative measures include cautious navigation to prevent prompt injections, a monitoring system to pause operations during suspicious activities, and an automated detection pipeline for updated safeguards. Operator is a general-purpose AI agent that can autonomously perform tasks on the web using a dedicated browser. It interacts with websites by clicking buttons, navigating menus, and filling forms. Unlike traditional assistants, Operator doesn't just process information; it can perform actions on the web, like booking accommodations or ordering groceries, by interacting with websites directly. It can handle repetitive tasks like booking travel, ordering food, making reservations, and shopping online. The research preview allows OpenAI to gather feedback, improve safety, and refine the tool before wider deployment. 
CUA combines GPT-4o's vision capabilities with advanced reasoning, enabling Operator to see and interact with graphical user interfaces like buttons and forms. Not yet. Operator struggles with complex interfaces and specialized workflows. Operator has dynamic daily and task-specific usage limits, and it cannot perform tasks like sending emails or handling CAPTCHAs. It requires user supervision for sensitive actions, like inputting payment or login details, and does not store such data. Operator is designed with safeguards, including user confirmations, takeover mode for sensitive inputs, and monitoring for malicious activity. It asks for user confirmation before completing significant actions and employs monitoring systems to pause tasks if suspicious activity is detected. Users can opt out of data collection, delete browsing data, and control privacy settings through Operator's interface. It's trained to detect and ignore malicious inputs, and a monitoring system can pause tasks if something suspicious occurs. Currently, Operator is available to U.S. users on ChatGPT's $200 Pro subscription plan. OpenAI plans to roll it out globally, but Europe may take longer due to regional considerations. Yes, OpenAI plans to expand access to Plus, Team, and Enterprise tiers. Yes, OpenAI plans to release the CUA model in the API for developers to create their own agents. OpenAI is partnering with companies like DoorDash, Instacart, and Uber to optimize Operator's functionality while respecting terms of service.
[17]
OpenAI launches Operator -- an agent that can use a computer for you
Operator is available today at operator.chatgpt.com to anyone signed up with ChatGPT Pro, OpenAI's premium $200-a-month service. The company says it plans to roll the tool out to other users in the future. OpenAI claims that Operator outperforms similar rival tools, including Anthropic's Computer Use (a version of Claude 3.5 Sonnet that can carry out simple tasks on a computer) and Google DeepMind's Mariner (a web-browsing agent built on top of Gemini 2.0). The fact that three of the world's top AI firms have converged on the same vision of what agent-based models could be makes one thing clear. The battle for AI supremacy has a new frontier -- and it's our computer screens. "Moving from generating text and images to doing things is the right direction," says Ali Farhadi, CEO of the Allen Institute for AI (AI2). "It unlocks business, solves new problems." Farhadi thinks that doing things on a computer screen is a natural first step for agents: "It is constrained enough that the current state of the technology can actually work," he says. "At the same time, it's impactful enough that people might use it." (AI2 is working on its own computer-using agent, says Farhadi.) OpenAI's announcement also confirms one of two rumors that circled the internet this week. One predicted that OpenAI was about to reveal an agent-based app, after details about Operator were leaked on social media ahead of its release. The other predicted that OpenAI was about to reveal a new superintelligence -- and that officials for newly inaugurated President Trump would be briefed on it. Could the two rumors be linked? OpenAI superfans wanted to know. Nope. OpenAI gave MIT Technology Review a preview of Operator in action yesterday. The tool is an exciting glimpse of large language models' potential to do a lot more than answer questions. But Operator is a work in progress. "It's still early, it still makes mistakes," says Yash Kumar, a researcher at OpenAI. 
(As for the wild superintelligence rumors, let's leave that to OpenAI CEO Sam Altman to address: "twitter hype is out of control again," he posted on January 20. "pls chill and cut your expectations 100x!")
[18]
OpenAI's Operator Lets ChatGPT Use the Web for You
OpenAI is letting some users try a new ChatGPT feature that uses its artificial intelligence to operate a web browser in order to book trips, buy groceries, hunt for bargains, and do many other online chores. The new tool, called Operator, is an AI agent: it relies on an AI model trained on both text and images to interpret commands and figure out how to use a web browser to execute them. OpenAI claims it has the potential to automate many day-to-day tasks and workday errands. OpenAI's Operator follows rival releases by both Google and Anthropic, which have already demonstrated agents capable of using the web. AI agents are widely seen as the next evolutionary stage for AI following chatbots, and many companies have already hopped on the hype-train by touting them. In most cases, these are very limited in their abilities and simply use a language model to automate things normally done with regular software. "AI is evolving from this tool that could answer your questions to one that is also able to take action in the world, carrying out complex, multi-step workflows," says Peter Welinder, VP of product at OpenAI. "We'll see a lot of impact on people's productivity -- but also the quality of work that people are able to accomplish." OpenAI admits that giving ChatGPT access to a web browser does introduce new risks, and it says that Operator may sometimes misbehave. It says it has implemented various new safeguards and plans to extend Operator's capabilities gradually. Welinder and Yash Kumar, product and engineering lead for OpenAI's Computer-Using Agent, say the plan is to learn from how people use the tool. They acknowledge that the tool could make unwanted bookings or purchases but add that a lot of work has gone into ensuring that it asks before doing anything risky. "It will come back to me and ask for confirmations before taking steps that might be irreversible," Kumar says. 
OpenAI today also released a new "system card" outlining the problems that might arise with Operator. These include the potential for it to misunderstand commands or diverge from what a user asks; to be misused by users; or to be targeted by cybercriminals. "It also poses an incredible amount of safety challenges," Kumar says. "Because your attack vector area and your risk vector area increase quite significantly." Operator will initially be available as a "research preview" for ChatGPT users with a Pro account, which costs a hefty $200 per month. The company says it plans to expand access while rolling the tool out slowly because it will inevitably make some mistakes along the way. In several demonstrations, Operator showed the potential for AI to take on a more active role as a web helper. The tool features a remote web browser and a chat window for communicating with a user. At WIRED's request, Operator was asked to book an Amtrak train trip from New Haven to Washington DC. It went to the right website, entered the necessary information correctly to bring up the timetable, and then asked for further instruction. If a user were logged into the Amtrak website, or into a browser profile with stored credit card information, Operator would be able to go ahead and book a ticket -- although it is designed to ask for permission first. Kumar asked Operator to book a table at Beretta, a restaurant in San Francisco. The program went to the OpenTable website, found the correct restaurant and looked up availability before asking what to do next. OpenAI says it has partnered with a number of popular sites, including OpenTable, to ensure that Operator works smoothly on them. The new tool is based on OpenAI's GPT-4o AI model, which can perceive a browser and web page and converse in typed text. The tool incorporates additional training designed to help it understand how to execute tasks online. OpenAI will also make its Computer-Using Agent available through its API.
[19]
OpenAI's new Operator AI agent can do things on the web for you
OpenAI is releasing a "research preview" of an AI agent called Operator that can "go to the web to perform tasks for you," according to a blog post. "Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling," OpenAI says. It's launching first in the US for subscribers of OpenAI's $200 per month ChatGPT Pro tier. Operator relies on a "Computer-Using Agent" model that combines GPT-4o's vision capabilities with "advanced reasoning through reinforcement learning" to be able to interact with GUIs, OpenAI says. "Operator can 'see' (through screenshots) and 'interact' (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations," according to OpenAI. Operator can use reasoning to "self-correct," and if it gets stuck, it will give the user control. It will also ask the user to take over when a website asks for sensitive information like login credentials and "should" ask for a user to approve actions like sending an email. OpenAI also says that Operator has been designed to "refuse harmful requests and block disallowed content." OpenAI says that it's collaborating with companies such as DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber so that Operator "addresses real-world needs while respecting established norms." But the company cautions that not everything might work as you expect just yet; the tool currently has problems with "complex interfaces like creating slideshows or managing calendars." Down the line, OpenAI says it plans to bring Operator to Plus, Team, and Enterprise users and "integrate these capabilities into ChatGPT."
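The screenshot-and-act cycle described above, with its handoff and approval steps, can be illustrated with a toy loop. This is only a sketch under stated assumptions, not OpenAI's implementation: the `Action` type, the `run_agent` and `scripted_policy` functions, and the string "screenshots" are all hypothetical stand-ins invented for illustration, and a real agent would use a vision model and a real browser.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A hypothetical UI action proposed by the agent."""
    kind: str                 # e.g. "click", "type", "scroll", or "done"
    target: str = ""          # human-readable description of the UI element
    sensitive: bool = False   # needs credentials or payment details
    external: bool = False    # has an outside effect, e.g. placing an order


def scripted_policy(actions: List[Action]) -> Callable[[str], Action]:
    """Stand-in for the model: replays a fixed list of actions, then stops."""
    remaining = iter(actions)
    return lambda screenshot: next(remaining, Action("done"))


def run_agent(policy: Callable[[str], Action],
              confirm: Callable[[Action], bool],
              max_steps: int = 20) -> List[str]:
    """Look at the screen, pick an action, apply safeguards, repeat."""
    log: List[str] = []
    screenshot = "initial page"  # placeholder for real pixel data
    for _ in range(max_steps):
        action = policy(screenshot)
        if action.kind == "done":
            log.append("done")
            break
        if action.sensitive:
            # Sensitive input: the user takes over instead of the agent.
            log.append(f"handoff:{action.target}")
        elif action.external and not confirm(action):
            # The user declined an action with external effects, so stop.
            log.append(f"declined:{action.target}")
            break
        else:
            log.append(f"{action.kind}:{action.target}")
        screenshot = f"page after {action.kind} {action.target}"
    return log


steps = [
    Action("click", "search box"),
    Action("type", "password field", sensitive=True),
    Action("click", "place order", external=True),
]
print(run_agent(scripted_policy(steps), confirm=lambda a: True))
```

In this sketch the approval callback is the only thing standing between the agent and an action with outside effects, which mirrors the article's point that Operator "should" ask before steps such as sending an email.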
[20]
OpenAI launches Operator: How will this AI agent impact the industry?
In a significant move towards redefining task automation, OpenAI has unveiled Operator, an autonomous AI agent designed to perform tasks within a web browser. Currently available as a research preview to Pro users in the United States, Operator is positioned as a step forward in streamlining digital workflows. Today's digital world, driven by AI and its myriad applications, is increasingly defined by automation, from algorithms that curate social media feeds to bots that respond to customer queries. Yet, the notion of an AI agent autonomously navigating the internet and performing multi-step tasks represents a shift from specialised, pre-programmed tools to more generalised and adaptable systems. Operator doesn't just promise convenience, it embodies a vision of AI as a collaborator, capable of taking on the mundane so humans can focus on what they do best. At the heart of Operator lies OpenAI's Computer-Using Agent (CUA) model. The CUA seamlessly integrates GPT-4o's vision capabilities with advanced reasoning facilitated by the o1-mini model. This combination allows the AI to navigate and interact with websites in a manner akin to a human user. Operator can scroll through pages, click buttons, fill out forms, and even type - offering a hands-free solution for tasks like booking travel, managing accounts, or even creating memes. While the idea of an AI agent "using a computer for you" might feel novel, its potential utility is clear when you consider the myriad of repetitive online tasks that professionals handle daily. Operator's design allows it to emulate human interactions with web interfaces rather than relying on APIs or specialised integrations. This approach makes it adaptable to a broad range of use cases but also introduces challenges, such as navigating poorly designed websites or handling unexpected errors. 
OpenAI's reliance on the o1-mini model for reasoning ensures that Operator can dynamically adjust to varying conditions, but the technology is still a work in progress, particularly when it comes to ensuring consistent performance across the web's diverse landscape. Operator's applications are versatile. Imagine managing a small business where inventory needs to be updated across multiple e-commerce platforms. Instead of logging into each account and manually inputting data, Operator could take on the responsibility. Or consider planning a trip - finding flights, booking hotels, and managing itineraries across different platforms. Operator can simplify these complex, multi-step workflows into a few prompts. For professionals, Operator could be a game-changer. Tasks like filling out expense reports, conducting online research, or monitoring market trends often consume a disproportionate amount of time. By delegating these to an AI agent, users can focus on strategic decision-making or creative problem-solving. Additionally, Operator's ability to integrate with existing workflows - without requiring extensive customisation - means that it could be adopted by individuals and businesses alike with minimal disruption. From a personal perspective, the prospect of using Operator to tackle mundane but necessary tasks is appealing. It might mean reclaiming the hours spent filling out forms or conducting repetitive research - time that could be better spent on more creative or strategic activities. However, this also raises questions about dependency. If an AI agent can handle a significant portion of our digital tasks, how do we ensure we maintain a working knowledge of these processes? It's worth noting that Operator is currently in its early stages. OpenAI has limited its availability to Pro users in the United States, signaling a cautious approach to scaling this technology. 
There's likely a combination of technical and ethical considerations at play here. For one, Operator's reliance on real-time web interactions means it must contend with the unpredictability of web design and functionality. Websites are not standardized, and what works seamlessly on one platform might fail on another. Moreover, as Operator interacts with sensitive personal data, OpenAI must ensure robust safeguards are in place to protect user privacy and security. From a technical standpoint, OpenAI's decision to limit the rollout allows them to collect valuable user feedback and refine Operator's performance before expanding access. Early adopters will play a crucial role in identifying bugs, usability issues, and edge cases that the development team may not have anticipated. This iterative process is essential for building trust and ensuring that Operator meets the diverse needs of its users. The launch of Operator is part of a broader shift in the AI industry - one that emphasises autonomous agents capable of performing increasingly complex tasks with minimal human oversight. This aligns with industry projections that autonomous agents will be a key focus in 2025, as organisations seek to improve efficiency and profitability. However, such advancements come with challenges, including the potential for misuse, ethical concerns, and the risk of over-reliance on automation. For businesses, Operator offers a glimpse into a future where routine tasks can be delegated to AI, freeing up human resources for higher-value activities. But it also raises questions about the evolving role of human workers in an AI-driven landscape. While some roles may become obsolete, new opportunities will likely emerge around managing, training, and overseeing these autonomous agents. 
The potential for improved productivity is significant, but it must be balanced with strategies to address workforce displacement and ensure equitable access to the benefits of AI. Despite its promise, Operator is not without limitations. Its ability to perform tasks depends heavily on the design and accessibility of the websites it interacts with. Websites with unconventional layouts, dynamic content, or complex CAPTCHA systems may pose challenges. Additionally, Operator's current capabilities are unlikely to extend to tasks requiring deep contextual understanding or creative problem-solving - areas where human judgment still reigns supreme. Moreover, Operator's reliance on web interactions introduces potential vulnerabilities. For instance, if a website's interface changes unexpectedly, the AI may struggle to adapt in real time. There are also concerns about how Operator handles sensitive information, such as login credentials or personal data. OpenAI's commitment to user privacy will be a critical factor in determining the technology's long-term adoption and success. Looking ahead, OpenAI's measured approach to Operator's rollout suggests that the company is prioritising refinement and user feedback over rapid expansion. As the technology matures, we can expect iterative improvements, increased accessibility, and perhaps even integrations with other OpenAI products or third-party platforms. For now, Operator remains an intriguing glimpse into what's possible, offering just enough functionality to hint at a future where AI agents become indispensable tools in our digital toolkits. Operator's success will depend not only on its technical capabilities but also on how well it integrates into users' lives. Will it become an indispensable tool that enhances productivity, or will it be seen as an optional luxury for tech-savvy early adopters? 
As with any new technology, the answer will likely depend on how OpenAI addresses the challenges of accessibility, security, and trust.
[21]
Meet 'Operator', a web-enabled AI agent that performs tasks for You
OpenAI's Operator is an AI agent that automates web tasks like ordering food, scheduling appointments, and creating memes. It uses GPT-4o and reinforcement learning to understand and interact with websites, improving user productivity. Currently in research preview for US Pro users, Operator will expand to other user tiers and regions soon. OpenAI on Thursday introduced Operator, its first artificial intelligence (AI) agent, which can "go to the web to perform tasks for you". It marks the latest entry into the agents segment by a major player, following the likes of Google and Salesforce. ET explains what Operator can do, how it works and who can access it. What can Operator do? Users can ask Operator to carry out a range of repetitive browser tasks such as filling out forms, ordering groceries and even creating memes, OpenAI said in a blog post. Some who have access shared on social media that they tried using the agent to order dinner ingredients based on pictures and recipes, schedule a barber appointment by checking Google calendar availability, plan a trip by parsing recommendations on Reddit that would be within budget, among other tasks. OpenAI is collaborating with firms including food delivery app DoorDash, ecommerce site eBay, grocery delivery platform Instacart, taxi aggregator Uber, sports and entertainment ticket booking app StubHub to ensure conformity with their terms of service agreements. "It (Operator) has limitations and will evolve based on user feedback," OpenAI said. It added, however, that the agent has produced state-of-the-art results, setting new benchmarks when evaluated for full computer use tasks (38% success rate on the OSWorld benchmark) and web-based tasks (58% and 87% success rates on WebArena and WebVoyager benchmarks, respectively). How does it work? Operator processes raw pixel data to understand what's happening on the screen and uses a virtual mouse and keyboard to complete actions. 
It can recognise buttons, menus and text fields people see on a screen. It does not need to use back-end application programming interfaces (APIs) to interact with platforms. The agent is powered by a new model called Computer-Using Agent. This combines the vision capabilities of its most advanced generative AI model GPT-4o with advanced reasoning through reinforcement learning. The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses, the company said. OpenAI CEO Sam Altman said during the launch livestream that AI agents are "going to be a big trend in AI and really impact the work people can do, how productive they can be, how creative they can be, what they can accomplish". Who is able to access it? Operator is currently a research preview, available to Pro users in the United States. The company plans to expand access to Plus, Team and Enterprise users and integrate Operator's capabilities into ChatGPT in the future. It will also be available in other countries "soon", Altman said during the livestream. "Europe will, unfortunately, take a while," he added.
[22]
OpenAI unveils 'Operator' web-based AI Agent for task automation
OpenAI has introduced Operator, an AI-powered agent designed to complete web-based tasks autonomously. Using a built-in browser, Operator can interact with websites by typing, clicking, and scrolling, simplifying a variety of repetitive tasks for users. Operator is one of OpenAI's first "agents," AI tools capable of performing tasks independently based on user instructions. Currently in a research preview phase, Operator is designed to evolve through user feedback. According to OpenAI, it can handle tasks like filling out forms, ordering groceries, and even creating memes. "Operator can use the same interfaces people interact with daily, helping save time and enhancing digital engagement opportunities," OpenAI explained. Operator is powered by a new model called the Computer-Using Agent (CUA), which integrates the vision capabilities of GPT-4o with advanced reasoning through reinforcement learning. CUA allows Operator to interact with graphical user interfaces (GUIs), such as buttons, menus, and text fields, by analyzing screenshots and performing actions like a human user. When Operator encounters challenges, it uses reasoning to self-correct. For more complex scenarios, it hands control back to the user, enabling a collaborative experience. Operator has already achieved state-of-the-art benchmark results in WebArena and WebVoyager, key tests for browser-based task performance. OpenAI has prioritized safety and privacy in Operator, implementing multiple safeguards to ensure secure usage. While Operator is designed with robust safeguards, OpenAI acknowledges it is still a research preview and may encounter limitations. Operator is in its early stages and may face challenges with tasks involving complex interfaces, such as creating slideshows or managing calendars. OpenAI has also outlined its future plans. It is collaborating with companies such as DoorDash, Instacart, OpenTable, Priceline, and others to refine Operator for real-world applications. 
It is also exploring public sector use cases with organizations like the City of Stockton to simplify access to government services. Through these partnerships, OpenAI aims to ensure that Operator delivers practical value across diverse industries while improving its functionality based on user and business feedback. Operator became available to Pro users in the U.S. starting January 23, 2025, through operator.chatgpt.com. Users can initiate tasks by describing what they need and can take over control whenever necessary. OpenAI plans to gradually roll out Operator to additional user tiers, including Plus, Team, and Enterprise, once its safety and usability are thoroughly validated.
[23]
OpenAI's Operator AI Agent Can Automate Web-Based Tasks
OpenAI has released a preview of its first AI agent, dubbed Operator, that can automate "repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes." At launch, Operator is available to those with a $200-per-month ChatGPT Pro subscription in the US, with plans to roll out the tool to Plus, Team, and Enterprise users in the future. Operator is powered by a new model called a Computer-Using Agent (CUA). It combines the visual capabilities of the GPT-4o model with "advanced reasoning through reinforcement learning" to interact with the buttons, menus, and text fields people often see on their screens. It can see through screenshots and take all the actions a mouse and a keyboard allow. When provided with a command, Operator uses its own browser to execute the task, handling all the typing, clicking, and scrolling by itself. When stuck, it returns the control to the user and asks for clarification before proceeding. In a video, an OpenAI employee demonstrates the tool by making it fetch a pasta recipe from the Allrecipes website and order the necessary ingredients through Instacart. Another video shows how the tool can be used to plan trips using Priceline. Operator can also tap into services like DoorDash, OpenTable, StubHub, Thumbtack, and Uber. OpenAI says the tool is designed to reject harmful requests and block banned content. It's also supposed to reject sensitive tasks, such as a banking transaction or acting on a job application. Additionally, Operator will ask a user to take over when required to enter sensitive information, such as login credentials or payment details. Tasks like submitting an order or sending an email also require a nod from the user. As with its other models, OpenAI warns that its latest is still in its "research preview" and "may make mistakes." In October last year, rival AI firm Anthropic launched a similar tool with the Claude 3.5 Sonnet model. 
And Google is reportedly building a Gemini-based agent to automate Chrome tasks.
[24]
ETtech Explainer: OpenAI's new AI agent, Operator, and what it can do
OpenAI has unveiled 'Operator,' its first AI agent capable of automating various web browser tasks like booking flights, ordering groceries, and creating memes. Currently available only to ChatGPT Pro users in the US, Operator relies on the Computer-Using Agent model to navigate websites and requires user intervention for complex tasks. This tool aims to streamline repetitive online activities. OpenAI introduced "Operator," its first AI agent, on Thursday. Operator is designed to handle tasks such as filling out forms, booking travel tickets, booking concert tickets, filling an online grocery order, or even creating memes by remotely operating a web browser using mouse clicks, scrolling, and typing, just like a person. According to OpenAI, Operator is a research preview of an agent designed to browse web pages and perform tasks on your behalf, automating a variety of actions. "Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes," OpenAI said in an online post. How does it work? Operator uses a model known as the Computer-Using Agent (CUA), based on GPT-4o, to interpret website screenshots and navigate using standard browser controls like a cursor and mouse. Users simply provide instructions, such as "Book a flight" or "Order groceries online," and Operator handles the process. If it encounters a challenge, such as CAPTCHAs or password fields, it pauses and prompts the user to intervene, maintaining user control throughout. Who can use Operator? Currently, Operator is only available for ChatGPT Pro users in the United States who are over 18 years of age. OpenAI is limiting access initially to gather user feedback and improve the tool's functionality. The company plans to expand availability to other paid users and eventually integrate Operator directly into ChatGPT. 
Handling challenges If Operator faces a task it cannot complete, such as navigating a complex interface or lacking details, it will alert the user and pause, suggesting you to take over. After manually resolving the issue, users can either complete the task themselves or let Operator resume. Limitations of Operator As per OpenAI's official website, at this stage, Operator cannot handle complex or specialised tasks, such as creating detailed slideshows, managing advanced calendar systems, or navigating highly customised web interfaces. Additionally, in its research preview, Operator deliberately avoids high-stakes actions such as processing financial transactions, sending emails, or deleting calendar events to prioritise user safety and reliability. Operator supports running multiple tasks simultaneously but enforces limits on the number of concurrent tasks and conversations for security reasons. These limits may vary, and users will receive a notification if they reach the maximum allowed.
[25]
OpenAI's new Operator AI agent handles tasks -- but with hiccups
OpenAI has released Operator, a largely autonomous AI tool designed to execute tasks on the internet based on simple text prompts. Operator stands apart by not only providing answers to queries but taking actionable steps to complete tasks. This AI agent aims to handle routine digital tasks such as scheduling appointments or conducting online transactions directly through a dedicated browser environment managed by OpenAI's servers. However, early user feedback suggests inconsistent performance and a higher frequency of hallucinations compared to previous iterations like ChatGPT. OpenAI's Operator represents the next evolutionary leap for AI by venturing beyond mere conversational engagements into practical task completion. Unlike other AI tools such as Anthropic's Computer Use and Google DeepMind's Mariner, Operator operates remotely, executing tasks via a browser on OpenAI's servers. This distinction allows Operator to stand out by autonomously performing actions on the web. However, some users have expressed concerns over Operator's speed, reporting sluggishness compared to expectations set by OpenAI's demonstrations. The integration of Operator with OpenAI's other products marks a significant step in AI evolution. Currently available as a research preview for ChatGPT Pro subscribers in the United States, OpenAI plans to make Operator accessible to ChatGPT Plus, Team, and Enterprise users in the future. This move aims to seamlessly integrate Operator's capabilities into existing platforms, enhancing productivity and broadening AI's practical usage in everyday applications. Despite its innovative promise, Operator's development phase suggests it remains a work in progress, with OpenAI advising users not to overestimate its abilities. The competitive landscape in AI task automation is intensifying. 
OpenAI claims Operator outperforms similar tools from Anthropic and Google DeepMind, leveraging the visual skills of the GPT-4o model to navigate web environments effectively. Operator uses screenshots and pixel scanning to interpret and execute web tasks. This capability positions it as a formidable contender in AI technology, promising efficiency improvements. Despite these advancements, Operator's reliance on external browsers presents unique challenges. Its current propensity for errors underscores the need for further refinements before achieving widespread reliability. Safety considerations and ethical concerns remain prominent as OpenAI continues to refine Operator. The tool's autonomy introduces potential risks, including unintended consequences from task executions and vulnerabilities in circumventing existing system protection measures. Such concerns highlight the ongoing need for stringent safeguards to ensure responsible AI deployment. OpenAI acknowledges these challenges and emphasizes the importance of user confirmation before Operator finalizes tasks with significant external impacts. The emergence of Operator and similar technologies could pose challenges to traditional internet services. For example, advanced AI task automation may disrupt search engines reliant on user interaction data for targeted advertisements. Such shifts underscore the broader implications of AI evolution for established business models. Operator's advancement reflects a broader technological trend where AI increasingly influences digital commerce and online interactions. Despite its early troubles, Operator signals an important shift in AI applications from passive information retrieval to active task management. As OpenAI continues to enhance this technology, Operator is likely to play a significant role in transforming how digital tasks are conducted, bridging the gap between human intentions and technological execution.
[26]
OpenAI launches Operator, an AI agent that performs tasks autonomously | TechCrunch
OpenAI CEO Sam Altman kicked off this year by saying in a blog post that 2025 would be big for AI agents, tools that can automate tasks and take actions on your behalf. Now, we're seeing OpenAI's first real attempt. OpenAI announced on Thursday that it is launching a research preview of Operator, a general-purpose AI agent that can take control of a web browser and independently perform certain actions. Operator is coming to U.S. users on ChatGPT's $200 Pro subscription plan first. OpenAI says it plans to roll this feature out to more users in its Plus, Team, and Enterprise tiers eventually. This initial research preview is available through operator.chatgpt.com, but soon, OpenAI says it wants to integrate Operator into ChatGPT. The new Operator feature promises to automate tasks such as booking travel accommodations, making restaurant reservations, or shopping online, according to OpenAI. There are several task categories users can choose from within Operator - including shopping, delivery, dining, and travel - all of which enable different kinds of automation. When ChatGPT users activate the Operator agent, a small window will pop up showing a dedicated web browser that the agent uses, along with text to explain the actions the agent is taking. Users can still take control of their screen while Operator is working. OpenAI says that Operator is powered by a computer-using agent, or CUA, that combines the vision capabilities of the company's GPT-4o model with the reasoning abilities from OpenAI's more advanced models. The CUA is trained to interact with the front-end of websites, meaning it doesn't need to use developer-facing APIs to tap into different services. In other words, the CUA can use buttons, navigate menus, and fill out forms on a webpage -- much like a human would. 
"The CUA model is trained to ask for user confirmation before finalizing tasks with external side effects, for example before submitting an order, sending an email, etc., so that the user can double-check the model's work before it becomes permanent," OpenAI writes in materials provided to TechCrunch. "[It] has already proven useful in a variety of cases, and we aim to extend that reliability across a wider range of tasks." OpenAI says it's collaborating with companies like DoorDash, Instacart, Priceline, StubHub, and Uber to ensure Operator respects the norms of these businesses. But OpenAI warns that CUA isn't perfect. The company says it "[doesn't] expect CUA to perform reliably in all scenarios just yet." Out of an abundance of caution, OpenAI is also requiring supervision for some tasks, like banking transactions, that CUA and Operator might be able to perform entirely on their own. "On particularly sensitive websites, such as email, Operator requires active user supervision, ensuring users can directly catch and address any potential mistakes the model might make," OpenAI says in its materials. Operator appears to be OpenAI's boldest attempt yet at creating an AI agent. Last week, OpenAI released Tasks, giving ChatGPT simple automation features such as the ability to set reminders and schedule prompts to run at a set time every day. Tasks gave ChatGPT users some familiar, but necessary, features to make ChatGPT as practical to use as Siri or Alexa. However, Operator shows off capabilities that the previous generation of virtual assistants could never match.
[27]
OpenAI's Operator Is Here to Do the Clicking and Typing for You
OpenAI has introduced a new AI agent called Operator that's designed to make everyday tasks easier, from making dinner reservations and ordering groceries to filling out forms. In a demo video posted Thursday, the company highlights how the AI agent can interact with web pages by typing, clicking and scrolling, when using a special browser. Users can describe the task they want to be done, but it can handle multiple requests at the same time, such as shopping on Etsy while making a dinner reservation elsewhere. It can "see" via screenshots and "interact" in the same way a mouse and keyboard would allow within a browser, according to OpenAI. Operator, which OpenAI describes as "one of our first agents," is currently available in a limited preview. With competitors like Google and Anthropic already offering similar AI agents, OpenAI is working to narrow the gap. It's also part of OpenAI's larger effort to make its generative AI even more useful by automating more aspects of daily life, potentially getting closer to delivering on the promise that it'll forever change the way we interact with technology. "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses," the company said in a blog post. The tool is powered by a new model called the Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. It's trained to interact with graphical user interfaces, including the buttons, menus and text fields people see on a screen. If any problems arise, the company said Operator can use its reasoning capabilities to self-correct or return control to the user. It's also trained to ask the user to take over for tasks that require certain inputs, such as login credentials or payment details. 
The tool is now available to paying Pro users in the US at operator.chatgpt.com.
[28]
OpenAI's big, new Operator AI already has problems
OpenAI has announced its AI agent tool, called Operator, as a research preview as of Thursday, but the launch isn't without its minor hiccups. The artificial intelligence brand showcased features of the new tool in an online demo, explaining that Operator is a Computer-Using Agent (CUA) based on the GPT-4o model, which enables multi-modal functions, such as the ability to search the web and reason about the search results. However, those who have gotten a chance to test the tool have noted critiques, such as slow responsiveness in comparison to the demos, and hallucinations akin to the standard ChatGPT chatbot, according to Quartz. Similarly, Benzinga reported that some X users' complaints got the attention of OpenAI CEO Sam Altman. According to the publication, an X user shared on the platform issues with Operator's interactions with a news website. The CEO reportedly responded, promising a prompt fix of the issue. However, what the user experienced could potentially be a hallucination. While many are fascinated by the features of Operator showcased in OpenAI's launch demo, price remains a big deterrent from experimenting with the AI agent. The tool is available under OpenAI's $200 per month ChatGPT Pro tier, making it a rather exclusive experience. BGR writer Chris Smith noted being a ChatGPT Plus subscriber, but said he could not justify paying $200 per month to access the Operator tool. However, OpenAI is expected to bring Operator to its ChatGPT Plus, Team, and Enterprise tiers at some point. One of users' biggest complaints is that Operator is currently available only in the U.S. European users have expressed disdain that they cannot access the tool. ComputerWorld also noted that AI agents as a whole can potentially introduce unique safety risks, such as using automated ecosystems to launch traffic attacks or bypass CAPTCHA codes. 
While OpenAI insists it has security locked down on Operator, the publication spoke with researchers who noted the technology could clash with search engines, such as Google, that have their own data processing strategies in place for their own purposes.
[29]
OpenAI Reportedly Launching 'Operator' That Can Control Your Computer This Week
So-called "computer use agents" are expected to be a major leap in AI that will allow bots to actually complete tasks on your behalf. OpenAI is reportedly preparing for the launch of Operator sometime this week. Operator is the name of its computer-use agent that can complete tasks in a user's web browser on their behalf. Other companies including Google and Anthropic have been developing similar "agents" in hopes they will be the next major leap towards AI fulfilling its promise of being able to perform tasks currently done by humans. According to The Information, which first reported on the impending launch, Operator will provide users with suggested prompts in categories like travel, dining, and events. Users could, for instance, ask Operator to find a good flight from New York to Maui that would not have them landing too late in the evening. Operator will not complete a transaction; the user will remain in the loop and complete the checkout process. It is easy to imagine certain ways Operator could be useful. Aging individuals who are not computer savvy could potentially ask Operator to help them send an email, and see it navigate to Gmail and open a compose window for them. Tech-savvy people do not need this type of help, but older generations often struggle to navigate the web, and completing even simple tasks can be a challenge. Bots could help in other areas as well, such as in quality-assurance testing where companies need to test that their new websites or services work properly. So-called "computer use agents" do come with potential risks. We have already seen a startup introduce a web-navigating bot to automate the process of posting marketing spam to Reddit. Bots that take control of the end-user client are able to bypass API limitations meant to block automation. AI startups will need to take some measures to combat abuse, or else websites will become even more flooded with spam than they are today. 
Agents like Operator essentially work by taking screenshots of a user's browser and sending the images back to OpenAI for analysis. Once its models determine the next step necessary to complete a task, a command is sent back to the browser to move and click the mouse on the appropriate target, or type into an input box. It takes advantage of multi-modal technology OpenAI and others have been developing that can interpret multiple forms of input, in this case text and imagery. The entire promise of a recent crop of AI startups is that they will be able to create an artificial general intelligence (AGI) that can replace humans on most tasks they perform today and make everyone's lives more efficient. As exponential gains in the performance of language models have slowed, these companies have been looking for new unlocks that will get them there, and computer use agents are one. An artificial intelligence cannot truly replace humans until it can physically complete the tasks for them; writing is just part of a task. Bots also need to be able to navigate spreadsheets, watch videos, and more. After Anthropic released an initial preview of its computer use bot, early testers complained it was half-baked at best, getting stuck in loops when it does not know what to do or forgetting the task and starting to do something else entirely, like looking at pictures of nature on Google Images. It is also slow, and expensive to operate. Keeping humans in the loop will be essential with a bot that is granted such high-level control and access to critical data. It seems like perhaps computer-use agents will be akin to self-driving cars. Google was able to make a car drive down a straightaway on its own easily enough, but the edge-case scenarios have taken years to solve. 
There is debate on how to measure AGI and when it will be "achieved," but OpenAI has told its biggest backer Microsoft that it believes AGI will be reached once it has created an AI that can generate at least $100 billion in profit. That is a lofty goal considering OpenAI predicts it will generate $12 billion in revenue in 2025 while still losing billions. At the same time, neither Microsoft nor Google has seen enterprise customers willing to adopt AI tools as fast as they hoped. Instead of charging $20-30 per employee to add AI tools into their bundles, both companies are now shoving AI into their standard bundles and hiking the prices by a couple of dollars respectively.
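The screenshot-and-command cycle described in this article (capture the browser, let a model pick the next step, send a click or keystroke back) can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions: `FakeBrowser`, `Action`, `run_agent`, and `scripted_policy` are hypothetical names invented here, the "model" is a fixed script rather than a real vision model, and none of it reflects OpenAI's actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One command sent back to the browser (hypothetical shape)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeBrowser:
    """Stand-in for a real browser; it only records what the agent did."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; here we return a placeholder frame.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")


def run_agent(browser: FakeBrowser,
              decide: Callable[[bytes, int], Action],
              max_steps: int = 10) -> List[str]:
    """Observe, decide, act; stop when the policy says 'done' or steps run out."""
    for step in range(max_steps):
        image = browser.screenshot()      # 1. capture the current screen
        action = decide(image, step)      # 2. the model chooses the next step
        if action.kind == "done":
            break
        browser.apply(action)             # 3. send the command to the browser
    return browser.log


def scripted_policy(image: bytes, step: int) -> Action:
    """Assumption: a fixed script stands in for the vision model."""
    script = [
        Action("click", x=120, y=40),
        Action("type", text="flights NYC to Maui"),
        Action("done"),
    ]
    return script[min(step, len(script) - 1)]


if __name__ == "__main__":
    print(run_agent(FakeBrowser(), scripted_policy))
```

The `max_steps` cap is a design choice made for this sketch: early testers of similar agents reported them getting stuck in loops, and a step budget is one simple way to bound that.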
[30]
OpenAI's agent that can do work for you is here
Operator uses its own browser, and can interact with a webpage by typing, clicking, and scrolling, OpenAI said. Users can have Operator do tasks such as completing online forms and grocery shopping, according to the startup. The AI agent is powered by a new OpenAI model called Computer-Using Agent (CUA), which combines vision capabilities from OpenAI's multimodal GPT-4o model with advanced reasoning from reinforcement learning. CUA was trained to interact with graphical user interfaces, or GUIs, such as buttons and text fields on webpages. Because Operator has "reasoning" skills, it can "self-correct" and give users back control when it needs help. The research preview is only being made available to ChatGPT Pro users in the U.S. for now, OpenAI said, because it has "limitations and will evolve based on user feedback." One example, the startup said, is "challenges with complex interfaces like creating slideshows or managing calendars." The startup plans to roll the AI agent out to other ChatGPT users, and eventually integrate Operator's capabilities into the chatbot. Operator was designed "to refuse harmful requests and block disallowed content," OpenAI said, adding that the startup can send warnings and revoke access over multiple violations through its moderation systems. The AI agent "is trained to ensure that the person using it is always in control and asks for input at critical points," the startup added. For example, Operator will prompt the user to take over when it needs to fill out sensitive information, such as logging in to a website or entering credit card details. "While Operator is designed with these safeguards, no system is flawless and this is still a research preview," OpenAI said.
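The hand-off behaviour described above, where the agent pauses at sensitive steps and resumes once the user has dealt with them, might be modelled along these lines. This is a hedged Python sketch, not OpenAI's design: `TaskRun`, `SENSITIVE_STEPS`, and the step names are hypothetical and chosen only to mirror the examples in the article (logins, payment details).

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: an illustrative set of step types the agent must not do itself.
SENSITIVE_STEPS = {"login", "payment", "captcha"}


@dataclass
class TaskRun:
    """Tracks a task as an ordered list of named steps (hypothetical model)."""
    steps: List[str]
    done: List[str] = field(default_factory=list)
    waiting_on_user: bool = False

    def advance(self) -> str:
        """Run steps until the task finishes or a sensitive step needs the user."""
        while len(self.done) < len(self.steps):
            step = self.steps[len(self.done)]
            if step in SENSITIVE_STEPS:
                # Pause and hand control back instead of acting.
                self.waiting_on_user = True
                return f"handoff:{step}"
            self.done.append(step)
        return "finished"

    def user_completed_step(self) -> None:
        """The user handled the sensitive step; the agent may resume."""
        if self.waiting_on_user:
            self.done.append(self.steps[len(self.done)])
            self.waiting_on_user = False


if __name__ == "__main__":
    run = TaskRun(["search", "add_to_cart", "payment", "confirm_page"])
    print(run.advance())        # pauses at the payment step
    run.user_completed_step()   # user enters card details themselves
    print(run.advance())        # agent resumes and finishes
```

Keeping the pause as explicit state (`waiting_on_user`) rather than an exception makes it easy for a user interface to show that the agent is waiting and to resume from the same point.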
[31]
OpenAI unveils 'Operator' agent that handles web tasks
OpenAI on Thursday introduced an artificial intelligence program called "Operator" that can tend to online tasks such as ordering items or filling out forms. Operator can look up web pages and interact with them by typing, clicking, or scrolling the way a person might, according to OpenAI. "Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes," OpenAI said in an online post. "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses." An AI "agent," the latest Silicon Valley trend, is a digital helper that is supposed to sense surroundings, make decisions, and take actions to achieve specific goals. Google in December announced agent capabilities with the launch of Gemini 2.0, its most advanced artificial intelligence model to date. AI race rival Anthropic two months earlier added a "computer use" feature to its Claude frontier AI model in an experimental public beta phase. "Developers can direct Claude to use computers the way people do -- by looking at a screen, moving a cursor, clicking buttons, and typing text," Anthropic said in a post at the time, cautioning that it was a work in progress. OpenAI described Operator as one of its first AI agents capable of doing work for people independently, designed to complete tasks it is given. Operator is available only to US users who pay for Pro subscriptions to the OpenAI service "to ensure a safe and iterative rollout," OpenAI said. "If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct," OpenAI said. "When it gets stuck and needs assistance, it simply hands control back to the user." 
Operator is trained to ask the user to take over for tasks that require login, payment details, or when solving "CAPTCHA" security challenges intended to distinguish between people and software online, according to OpenAI. "Users can have Operator run multiple tasks simultaneously by creating new conversations, like ordering a personalized enamel mug on Etsy while booking a campsite on Hipcamp," OpenAI said.
[32]
OpenAI unveils 'Operator' agent that handles web tasks
OpenAI on Thursday introduced an artificial intelligence program called "Operator" that can tend to online tasks such as ordering items or filling out forms. Operator can look up web pages and interact with them by typing, clicking, or scrolling the way a person might, according to OpenAI. "Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes," OpenAI said in an online post. "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses." An AI "agent," the latest Silicon Valley trend, is a digital helper that is supposed to sense surroundings, make decisions, and take actions to achieve specific goals. Google in December announced agent capabilities with the launch of Gemini 2.0, its most advanced artificial intelligence model to date. AI race rival Anthropic two months earlier added a "computer use" feature to its Claude frontier AI model in an experimental public beta phase. "Developers can direct Claude to use computers the way people do -- by looking at a screen, moving a cursor, clicking buttons, and typing text," Anthropic said in a post at the time, cautioning that it was a work in progress. OpenAI described Operator as one of its first AI agents capable of doing work for people independently, designed to complete tasks it is given. 
Operator is available only to US users who pay for Pro subscriptions to the OpenAI service "to ensure a safe and iterative rollout," OpenAI said. "If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct," OpenAI said. "When it gets stuck and needs assistance, it simply hands control back to the user." Operator is trained to ask the user to take over for tasks that require login, payment details, or when solving "CAPTCHA" security challenges intended to distinguish between people and software online, according to OpenAI. "Users can have Operator run multiple tasks simultaneously by creating new conversations, like ordering a personalized enamel mug on Etsy while booking a campsite on Hipcamp," OpenAI said.
[33]
ChatGPT's New Operator Agent Helps Put AI to Work For You
The company behind ChatGPT is unveiling a new way to use AI. OpenAI has just announced Operator, an AI agent that can perform tasks for you autonomously. Operator Available Only for Pro Subscribers The AI agent is designed to go to the web and accomplish tasks for you. Some of the tasks it can currently take care of include making a dinner reservation, shopping for an item, filling out a form, and more. OpenAI is collaborating with a number of companies, including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber. In a blog post, OpenAI described more on how Operator works: Operator can "see" (through screenshots) and "interact" (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations. If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct. When it gets stuck and needs assistance, it simply hands control back to the user, ensuring a smooth and collaborative experience. Along with some preset prompts, users can also add their own custom instructions. Operator can also run multiple tasks simultaneously. The agent is far from perfect, though. OpenAI said that Operator currently has problems with complex interfaces, like managing a calendar. Operator is available on a separate site. For now, Operator is only available as a "research preview" for anyone who subscribes to the $200 monthly Pro tier launched in December 2024. In the future, the tool will come to Plus, Team, and Enterprise users. Operator will also be integrated directly into ChatGPT. The Next Generation of AI While ChatGPT, and other chatbots, have previously been passive and required users to lead the interaction, Operator is a new breed of AI. Putting AI to work to do tasks around the web can make the technology even more useful in everyday life. 
OpenAI is betting heavily on AI agents. In a blog post in late 2024, CEO Sam Altman said AI agents might enter the workforce in 2025.
[34]
OpenAI introduces Operator to automate tasks like vacation planning, restaurant reservations
OpenAI is taking its ChatGPT chatbot to the next level, adding a feature to automate tasks like planning vacations, filling out forms, making restaurant reservations and ordering groceries. The tool, announced on Thursday, is called Operator. OpenAI describes it as "an agent that can go to the web to perform tasks for you" and added that it's trained to interact with "the buttons, menus, and text fields that people use daily" on the web. It can also ask follow-up questions to further personalize the tasks it completes, such as login information for other websites. Users can take control of the screen at any time. "Operator is one of our first agents, which are AIs capable of doing work for you independently," OpenAI wrote in a blog post on Thursday. "You give it a task and it will execute it." For now, Operator is only available to ChatGPT Pro users. It can be accessed at Operator.ChatGPT.com. OpenAI said it eventually plans to expand to Plus, Team and Enterprise users and to integrate Operator into ChatGPT. The company also said it currently has trouble with some tasks, such as managing calendars and creating slideshows. OpenAI, which is backed by Microsoft, said users can opt out of some of the company's training data collection by turning off the "Improve the model for everyone" setting in ChatGPT, meaning data in Operator will not be used to train its models. The company also said users can delete all browsing data and log out of all sites "with one click" in the privacy section. Operator directly competes with an earlier release from Anthropic, the Amazon-backed AI startup behind the Claude chatbot that was founded by ex-OpenAI research executives. In October, Anthropic introduced "Computer Use," a capability that allowed its AI agents to use computers like humans to complete complex tasks. 
Anthropic said it can interpret what's on a computer screen, select buttons, enter text, navigate websites and execute tasks through any software and real-time internet browsing. The tool can "use computers in basically the same way that we do," Jared Kaplan, Anthropic's chief science officer, told CNBC in an interview at the time. He said it can do tasks with "tens or even hundreds of steps." The generative AI market, which includes OpenAI and Anthropic as well as Google, Amazon, Microsoft and Meta, is predicted to top $1 trillion in revenue within a decade. Google recently agreed to a new investment of more than $1 billion in Anthropic, a source familiar with the situation confirmed to CNBC this week. Anthropic is in late-stage talks to raise a funding round of $2 billion at a $60 billion valuation led by Lightspeed Venture Partners, CNBC reported earlier this month. OpenAI is pushing towards a potential future of artificial general intelligence. AGI is a vaguely defined benchmark referring to AI that equals or surpasses human intellect on a wide range of tasks. Scale AI CEO Alexandr Wang, whose company provides training data to key AI players, said Thursday in an interview with CNBC that he defines AGI as "powerful AI systems that are able to use a computer just like you or I could." He said it will likely take two to four years to reach that level of the technology.
[35]
OpenAI Launches 'Operator' In Research Preview, A Super AI Agent That Can Perform Tasks For You
OpenAI has been pushing aggressively to advance its AI technology and explore new possibilities. While the company continues to ship updates and move into new domains, it started this year by announcing that AI agents would be the next big thing in 2025 and would help automate user tasks. The idea of AI handling tasks on your behalf already seemed fascinating, and the company has now accelerated the project by launching Operator in research preview. OpenAI has shared its ambitious approach to redefining AI technology and is working towards achieving artificial general intelligence. It has even shared the progress it has made towards this goal and its confidence in the direction it is heading. It hinted that AI agents would become the next big thing and started the year with an announcement about Operator, an AI agent then in development and meant to handle browser-based user tasks. Now, the company has officially entered the AI agent revolution by launching its advanced AI agent, Operator. This agent acts like an automated personal assistant and can execute tasks such as booking flights, handling customer service, managing workflows, and other hands-on tasks in real time. OpenAI announced the development in a press release on its website. According to the press release, Operator is currently available as a research preview for Pro users in the U.S. OpenAI has used the Computer-Using Agent (CUA) model to automate tasks, which is a smart move given that some tasks are generally repetitive and can be time-consuming. With AI stepping in, it can handle them from the browser directly. The AI assistant is said to be intuitive as it combines GPT-4o's vision capabilities with logical human-like reasoning so that the AI agent seems natural and can handle real-world tasks. Operator was initially launched as a research preview to fine-tune the model before it is fully rolled out. 
OpenAI plans to integrate it into ChatGPT, which could bring advanced functionality to a broader audience and serve as a game-changer for the super AI agent. OpenAI has also hinted at a collaboration with Instacart to handle grocery shopping for users as the company plans to use Operator across multiple sectors and extend its access.
[36]
OpenAI's First AI Agent Is Here and This Is What It Can Do
OpenAI plans to launch the AI agent to more subscription tiers eventually
OpenAI released its first artificial intelligence (AI) agent, Operator, on Thursday. Currently available as a research preview, the agent comes with a dedicated web browser. It is a general-purpose AI agent that can autonomously perform tasks online based on prompts given by the user. The AI firm said the tool can be used to book tickets online, reserve a table in a restaurant, or buy a product online. Currently, Operator is only available in the US to ChatGPT Pro subscribers, but the company plans to expand it to other subscription tiers in the future. In a live stream, OpenAI CEO Sam Altman introduced the company's first AI agent. Explaining what agents are, Altman said, "AI agents are AI systems that do work for you independently. You give them a task, and they go off and do it. We think it will be a big trend in AI." Operator is powered by the Computer-Using Agent (CUA), an AI model that combines vision capabilities from GPT-4o with advanced reasoning, an OpenAI blog post explained. The AI agent was post-trained using reinforcement learning. It can interact with graphical user interfaces (GUIs) including buttons, menus, and text fields on the screen. With its dedicated browser, the agent can perform tasks behind the scenes while freeing up the screen for the user. The AI agent accepts both text and images as input. To complete tasks, the CUA processes raw pixel data of the screen and uses a virtual keyboard and mouse to execute actions. OpenAI claims it can navigate multi-step tasks, handle errors, and adapt to unexpected changes. Rowan Cheung, founder of the AI newsletter The Rundown AI, had early access to Operator and highlighted some of its use cases in a series of posts on X (formerly known as Twitter). The AI agent was able to plan a weekend trip based on advice from Reddit, a specific budget, and interests.
Interestingly, when the agent was blocked from accessing Reddit, it completed the task by running a Bing search with Reddit as a keyword. In another instance, Cheung asked the Operator to find cryptocurrency tokens worth looking into. During its research, the agent got stuck on an "Are you human" CAPTCHA and immediately pinged the user to take control to confirm. Once Cheung confirmed, the AI agent took control and continued with the task. The AI agent can seamlessly allow the user to jump in and take control at any given time and edit or change the task. Once the user is done, they can also give the control back to the agent. This ensures that the user has control over the AI agent at all times. OpenAI also stated that it is collaborating with companies such as DoorDash, eBay, Instacart, and Uber to ensure that Operator respects the terms of service agreements of these businesses while accessing the platforms. Coming to safety, the AI firm claimed that it has run extensive safety testing and has implemented mitigations against three safety classes -- misuse, model mistakes, and frontier risks. To reduce the risk of misuse, OpenAI has trained the CUA model to refuse harmful tasks and illegal or regulated activities. The company has also blocked gambling, adult entertainment, as well as drug and gun retailer websites. In addition, the company has also implemented automated and human-based reviews of user interactions. For model mistakes or hallucinations, the AI agent is trained to ask for user confirmation before finalising tasks with external side effects. The CUA also declines to help with tasks such as banking transactions and while accessing sensitive websites, the agent requires active user supervision. Frontier risks are the unexpected actions taken by a state-of-the-art AI model as it is generally not tested exhaustively. 
OpenAI said the CUA model has been evaluated against its Preparedness Framework, and the Operator System Card provides full details into the safety approach and ongoing improvements. Currently, Operator is only available via the operator.chatgpt.com URL to ChatGPT Pro subscribers in the US. The company has stated that it plans to integrate the AI agent with all ChatGPT clients in the future. Notably, a ChatGPT Pro subscription is priced at $200 (roughly Rs. 17,200) a month.
[37]
AI tool Operator launched by OpenAI, capable of performing web tasks independently By Investing.com
Investing.com -- A new artificial intelligence (AI) tool named Operator has been unveiled by OpenAI today, designed to independently carry out tasks on the web. This tool uses its own browser to interact with webpages through typing, clicking, and scrolling. As a research preview, Operator has some limitations but will evolve based on user feedback. Operator can manage a variety of repetitive browser tasks, including filling out forms, ordering groceries, and creating memes. This tool expands the functionality of AI by using the same interfaces and tools that humans interact with daily, saving people time on routine tasks and providing new opportunities for businesses. For a safe and iterative rollout, the launch of Operator is initially available to Pro users in the U.S. at operator.chatgpt.com. This early release will help gather feedback from users and the broader ecosystem, enabling improvements over time. The plan is to eventually extend access to Plus, Team, and Enterprise users and integrate these capabilities into ChatGPT in the future. Operator is powered by a new model named Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is designed to interact with graphical user interfaces (GUIs) like buttons, menus, and text fields. Operator can see and interact with a browser, allowing it to take action on the web without requiring custom API integrations. In case of challenges or mistakes, Operator can use its reasoning capabilities to self-correct. If it encounters a task it cannot complete, it hands control back to the user, ensuring a smooth and collaborative experience. Despite being in its early stages, CUA has achieved new benchmark results in WebArena and WebVoyager, two key browser use benchmarks. To use Operator, users simply need to describe the task they would like done. 
Users can take over control of the remote browser at any point, and Operator is designed to ask the user to take over for tasks that require login, payment details, or when solving CAPTCHAs. Users can personalize their workflows in Operator by adding custom instructions for all sites or specific ones. Operator also allows users to save prompts for quick access on the homepage, ideal for repeated tasks. Users can have Operator run multiple tasks simultaneously by creating new conversations. Operator transforms AI from a passive tool to an active participant in the digital ecosystem. It aims to streamline tasks for users and offer benefits to companies that seek innovative customer experiences and higher conversion rates. Collaborations with companies like DoorDash (NASDAQ:DASH), Instacart (NASDAQ:CART), OpenTable, Priceline, StubHub, Thumbtack, Uber (NYSE:UBER), and others are underway to ensure Operator addresses real-world needs while respecting established norms. Efforts are also being made to improve accessibility and efficiency of certain workflows, particularly in public sector applications, by working with organizations like the City of Stockton to simplify enrollment in city services and programs.
[38]
OpenAI unveils 'Operator' agent that handles web tasks
San Francisco (AFP) - OpenAI on Thursday introduced an artificial intelligence program called "Operator" that can tend to online tasks such as ordering items or filling out forms. Operator can look up web pages and interact with them by typing, clicking, or scrolling the way a person might, according to OpenAI. "Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes," OpenAI said in an online post. "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses." An AI "agent," the latest Silicon Valley trend, is a digital helper that is supposed to sense surroundings, make decisions, and take actions to achieve specific goals. Google in December announced agent capabilities with the launch of Gemini 2.0, its most advanced artificial intelligence model to date. AI race rival Anthropic two months earlier added a "computer use" feature to its Claude frontier AI model in an experimental public beta phase. "Developers can direct Claude to use computers the way people do -- by looking at a screen, moving a cursor, clicking buttons, and typing text," Anthropic said in a post at the time, cautioning that it was a work in progress. OpenAI described Operator as one of its first AI agents capable of doing work for people independently, designed to complete tasks it is given. Operator is available only to US users who pay for Pro subscriptions to the OpenAI service "to ensure a safe and iterative rollout," OpenAI said. "If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct," OpenAI said. "When it gets stuck and needs assistance, it simply hands control back to the user." 
Operator is trained to ask the user to take over for tasks that require login, payment details, or when solving "CAPTCHA" security challenges intended to distinguish between people and software online, according to OpenAI. "Users can have Operator run multiple tasks simultaneously by creating new conversations, like ordering a personalized enamel mug on Etsy while booking a campsite on Hipcamp," OpenAI said.
[39]
OpenAI Launches Operator, Its First AI Agent
In a live-streamed video on Thursday, OpenAI CEO Sam Altman introduced Operator as a system that can search the web, move a cursor, click on web pages, and generally do much of what a human can do while operating a computer. Operator is technically known as a Computer-Using Agent (or CUA) built on GPT-4o, OpenAI's current flagship model. As an example of how the tech could be used to complete everyday tasks, Altman and a group of OpenAI researchers uploaded an image of a shopping list to Operator, then asked it to search Instacart for the ingredients. Operator began searching Instacart's website and adding the ingredients to a cart. After confirming the price, Operator asked if they were ready to place the order. The team also used Operator to find NBA tickets on StubHub and order lunch on DoorDash. The team also showed that if Operator gets tripped up, users can easily take over from Operator by pressing a button labeled "take control." This could also be used to add payment information for orders or complete CAPTCHA requests. Users can also customize Operator, which the company says could be used to define preferences, like always booking a specific airline or buying a specific brand of ice cream.
[40]
OpenAI's agent tool may be nearing release
OpenAI may be close to releasing an AI tool that can take control of your PC and perform actions on your behalf. Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered evidence of OpenAI's long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an "agentic" system capable of autonomously handling tasks like writing code and booking travel. According to The Information, OpenAI is targeting January as Operator's release month. Code uncovered by Blaho this weekend adds credence to that reporting. OpenAI's ChatGPT client for macOS has gained options, hidden for now, to define shortcuts to "Toggle Operator" and "Force Quit Operator," per Blaho. And OpenAI has added references to Operator on its website, Blaho said -- albeit references that aren't yet publicly visible. According to Blaho, OpenAI's site also contains not-yet-public tables comparing the performance of Operator to other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator isn't 100% reliable, depending on the task. On OSWorld, a benchmark that tries to mimic a real computer environment, "OpenAI Computer Use Agent (CUA)" -- possibly the AI model powering Operator -- scores 38.1%, ahead of Anthropic's computer-controlling model but well short of the 72.4% humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI's ability to navigate and interact with websites. But the model falls short of human-level scores on another web-based benchmark, WebArena, according to the leaked benchmarks. Operator also struggles with tasks a human could perform easily, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, Operator was only successful 60% of the time.
Tasked with creating a Bitcoin wallet, Operator succeeded only 10% of the time. OpenAI's imminent entry into the AI agent space comes as rivals including the aforementioned Anthropic, Google, and others make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030. Agents today are rather primitive. But some experts have raised concerns about their safety, should the technology rapidly improve. One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to get the system to perform "illicit activities" and search for "sensitive personal data." Reportedly, safety testing is among the reasons for Operator's long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent he claims lacks safety mitigations. "I can only imagine the negative reactions if OpenAI made a similar release," Zaremba wrote. It's worth noting that OpenAI has been criticized by AI researchers, including ex-staff, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.
[41]
OpenAI launches Operator, an AI agent that can operate your computer
On Thursday, OpenAI released a research preview of "Operator," a web automation tool that uses a new AI model called Computer-Using Agent (CUA) to control computers through a visual interface. The system performs tasks by viewing and interacting with on-screen elements like buttons and text fields similar to how a human would. Operator is available today for subscribers of the $200 per month ChatGPT Pro plan at operator.chatgpt.com. The company plans to expand to Plus, Team, and Enterprise users later. OpenAI intends to integrate these capabilities directly into ChatGPT and later release CUA through its API for developers. Operator watches on-screen content while you use your computer and executes tasks through simulated keyboard and mouse inputs. The Computer-Using Agent processes screenshots to understand the computer's state and then makes decisions about clicking, typing, and scrolling based on its observations. OpenAI's release follows other tech companies as they push into what are often called "agentic" AI systems, which can take actions on a user's behalf. Google announced Project Mariner in December 2024, which performs automated tasks through the Chrome browser, and two months earlier, in October 2024, Anthropic launched a web automation tool called "Computer Use" focused on developers that can control a user's mouse cursor and take actions on a computer. "The Operator interface looks very similar to Anthropic's Claude Computer Use demo from October," wrote AI researcher Simon Willison on his blog, "even down to the interface with a chat panel on the left and a visible interface being interacted with on the right." To use your PC like you would, the Computer-Using Agent works in multiple steps. First, it captures screenshots to monitor your screen, then analyzes those images (using GPT-4o's vision capabilities with additional reinforcement learning) to process raw pixel data. 
Next, it determines what actions to take and then performs virtual inputs to control the computer. This iterative loop design reportedly lets the system recover from errors and handle complex tasks across different applications.
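The observe, decide, act loop described above can be illustrated with a toy sketch. This is not OpenAI's implementation: `FakeBrowser`, `Action`, `decide`, and `run_agent` are hypothetical names invented for illustration, the "screenshot" is a small dictionary rather than raw pixels, and a hand-written rule stands in for the model's reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the on-screen element (hypothetical)
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeBrowser:
    """Toy stand-in for a screen: one search box and one search button."""
    query: str = ""
    submitted: bool = False
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this sketch returns a summary.
        return {"query": self.query, "submitted": self.submitted}

    def perform(self, action: Action) -> None:
        # Simulated keyboard and mouse input.
        self.log.append(action.kind)
        if action.kind == "type" and action.target == "search_box":
            self.query = action.text
        elif action.kind == "click" and action.target == "search_button":
            self.submitted = bool(self.query)


def decide(observation: dict, goal: str) -> Action:
    """Hypothetical policy standing in for the model's decision step."""
    if observation["submitted"]:
        return Action("done")
    if observation["query"] != goal:
        return Action("type", "search_box", goal)
    return Action("click", "search_button")


def run_agent(browser: FakeBrowser, goal: str, max_steps: int = 10) -> int:
    """Repeat observe -> decide -> act until done; return iterations used."""
    for step in range(1, max_steps + 1):
        observation = browser.screenshot()   # 1. capture the screen
        action = decide(observation, goal)   # 2-3. analyze and choose an action
        if action.kind == "done":
            return step
        browser.perform(action)              # 4. perform the virtual input
    raise RuntimeError("step budget exhausted before the goal was reached")


if __name__ == "__main__":
    browser = FakeBrowser()
    print(run_agent(browser, "campsites near Yosemite"), browser.log)
```

Because the agent re-reads the screen on every pass instead of following a fixed script, a failed or unexpected step simply shows up in the next observation, which is one plausible reading of how an iterative design could support error recovery.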
[42]
OpenAI's Operator can surf the web for you
OpenAI has begun previewing a new tool called Operator that can navigate within a web browser. According to a blog post published Thursday, the software is powered by what the company calls a Computer-Using Agent. "CUA is trained to interact with graphical user interfaces (GUIs) -- the buttons, menus, and text fields people see on a screen -- just as humans do," says OpenAI of the model. "This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs." The current release of Operator builds on OpenAI's GPT-4o model. It combines the vision capabilities of that algorithm with "advanced reasoning" trained through reinforcement learning. Operator has the ability to "break tasks into multi-step plans and adaptively self-correct when challenges arise." According to OpenAI, that capability represents the next stage in AI development. As with past research previews, OpenAI warns that Operator is "still early and has limitations," and that it won't "perform reliably in all scenarios just yet." For instance, depending on the complexity of the task and interface involved, the agent greatly benefits from the user taking a few extra moments to write a more detailed prompt. Per The Verge, Operator will give the user control if it ever gets stuck on a task. It will also hand control over whenever a website asks for sensitive information, including login credentials. The company says it designed the tool to "refuse harmful requests and block disallowed content." OpenAI is making Operator first available to users of its $200 per month ChatGPT Pro subscription. It is also partnering with companies like Instacart to offer the agent on their platforms, though there again you'll need a ChatGPT Pro subscription to test the integration. Operator joins a growing list of AI agents that can either navigate a web browser or an entire operating system. 
Anthropic was the first to offer the capability with the release of its Claude 3.5 Sonnet model in October, followed more recently by Google with its Gemini 2.0 model and Project Mariner.
[44]
OpenAI introduces Operator to automate tasks like vacation planning, restaurant reservations
OpenAI CEO Sam Altman visits "Making Money With Charles Payne" at Fox Business Network Studios in New York on Dec. 4, 2024. OpenAI is taking its ChatGPT chatbot to the next level, adding a feature to automate tasks like planning vacations, filling out forms, making restaurant reservations and ordering groceries. The tool, announced on Thursday, is called Operator. OpenAI describes it as "an agent that can go to the web to perform tasks for you" and added that it's trained to interact with "the buttons, menus, and text fields that people use daily" on the web. It can also ask follow-up questions to further personalize the tasks it's completing, such as login information for other websites. Users can take control of the screen at any time. "Operator is one of our first agents, which are AIs capable of doing work for you independently," OpenAI wrote in a blog post on Thursday. "You give it a task and it will execute it." For now, Operator is only available to ChatGPT Pro users. It can be accessed at Operator.ChatGPT.com. OpenAI said it eventually plans to expand to Plus, Team and Enterprise users and to integrate Operator into ChatGPT. The company also said it currently has trouble with some tasks, such as managing calendars and creating slideshows. OpenAI, which is backed by Microsoft, said users can opt out of some of the company's training data collection by turning off the "Improve the model for everyone" setting in ChatGPT, meaning data in Operator will not be used to train its models. The company also said users can delete all browsing data and log out of all sites "with one click" in the privacy section. Operator directly competes with an earlier release from Anthropic, the Amazon-backed AI startup behind the Claude chatbot that was founded by ex-OpenAI research executives. In October, Anthropic introduced "Computer Use," a capability that allowed its AI agents to use computers like humans to complete complex tasks. 
Anthropic said it can interpret what's on a computer screen, select buttons, enter text, navigate websites and execute tasks through any software and real-time internet browsing. The tool can "use computers in basically the same way that we do," Jared Kaplan, Anthropic's chief science officer, told CNBC in an interview at the time. He said it can do tasks with "tens or even hundreds of steps." The generative AI market, which includes OpenAI and Anthropic as well as Google, Amazon, Microsoft and Meta, is predicted to top $1 trillion in revenue within a decade. Google recently agreed to a new investment of more than $1 billion in Anthropic, a source familiar with the situation confirmed to CNBC this week. Anthropic is in late-stage talks to raise a funding round of $2 billion at a $60 billion valuation led by Lightspeed Venture Partners, CNBC reported earlier this month. OpenAI is pushing towards a potential future of artificial general intelligence. AGI is a vaguely defined benchmark referring to AI that equals or surpasses human intellect on a wide range of tasks. Scale AI CEO Alexandr Wang, whose company provides training data to key AI players, said Thursday in an interview with CNBC that he defines AGI as "powerful AI systems that are able to use a computer just like you or I could." He said it will likely take two to four years to reach that level of the technology.
[45]
OpenAI's new 'Operator' touted as the next breakthrough in artificial intelligence
TL;DR: In 2025, OpenAI plans to launch "Operator," an autonomous AI agent capable of performing tasks like coding and travel arrangements without user oversight. Unlike ChatGPT, Operator aims for full task autonomy. Amidst the AI arms race of 2025, OpenAI has detailed its plans to release an autonomous artificial intelligence agent called 'Operator.' As reported by Bloomberg, the fancy new AI agent is designed to take actions on behalf of the user, for example writing code or making your travel arrangements. The key distinction between agents and a tool like ChatGPT is that agents represent a shift to fully autonomous task management. Meaning, you set it up, give it a task, and let it figure out the rest. No hand-holding, minimal oversight. Given how common it is for AI features to simply... not work, I'm reluctant to entrust an AI with my own travel plans. However, OpenAI Chief Executive Officer Sam Altman has been persistent in stating that agents represent the next key breakthrough for artificial intelligence. NVIDIA's Chief Executive Officer, Jensen Huang, echoes this sentiment, proclaiming that 'IT will become the HR of AI Agents.' The reported capabilities of Operator include 'Ph.D. level intelligence', with the potential to exceed human capabilities. The functionality also appears to, essentially, allow it to work from your computer. Large firms and shareholders will undoubtedly be salivating at the notion of supplementing a human workforce with competent, autonomous AI agents. However, as to whether the tool works as advertised, we'll discover when it becomes available as a research tool in late January.
[46]
OpenAI Debuts 'Operator' -- a New Milestone for Computer-Using Agents
OpenAI said it collaborated with DoorDash, Instacart, OpenTable, Uber, and others to ensure Operator addresses real-world needs.
Since mid-2024, AI developers have increasingly equipped their models with the ability to use computers as the technology evolves from chatbots that speak to agents that act. Although Google and Anthropic were the first to release computer-using agents, industry leader OpenAI has just dropped "Operator," setting new performance standards for the emerging class of AI.
OpenAI Introduces Computer-Using Agent
Equipped with its own dedicated web browser, Operator can view webpages and interact with them by typing, clicking, and scrolling, opening up a whole host of new possibilities compared to previous chat-based OpenAI platforms. According to OpenAI's announcement, "Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes." As the company put it: "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses."
Operator vs. Claude Computer Use vs. Google Mariner
The launch of Operator points to growing competition in the computer-using agent sector, where it is vying for a position with Claude Computer Use and Google's Project Mariner -- although all three are technically still in the beta stages. Performance benchmarks reveal that OpenAI's entry has set a new bar. The new agent scored 38.1% on the OSWorld computer-interaction-focused benchmark, surpassing Anthropic's Claude Computer Use, which only managed 22.0%. On WebVoyager, which assesses agents' ability to navigate and retrieve data from the internet, Operator scored 87%, pulling ahead of Google Mariner's 83.5%. Meanwhile, Anthropic's Claude Computer Use lagged behind at just 56%.
Use Cases for AI Computer Use
The ability of AI agents to interact with computers holds huge potential for both everyday AI users and commercial or industrial applications. Foreseeing potential use cases, OpenAI said it collaborated with DoorDash, Instacart, OpenTable, Uber, and others to ensure Operator addresses real-world needs. Meanwhile, the new platform lets users personalize their workflows by adding custom instructions such as setting website preferences. As the technology evolves, perhaps the greatest potential for AI computer use lies in its integration with popular smart assistants like Siri and Alexa. These platforms have traditionally integrated third-party applications on an app-by-app basis. But if they are enhanced with Operator-style functionality, the smart assistants of tomorrow will be able to interact with a potentially limitless array of apps and services.
[47]
ChatGPT Gets a Personal Assistant Upgrade -- If You're Willing to Pay $200 a Month - Decrypt
On Thursday, OpenAI unveiled a new feature, dubbed Operator, that lets ChatGPT take control of a virtual browser to perform real-world tasks like ordering food or booking flights. But so far, it's aimed at rich people. The tool, currently available only to Pro subscribers ($200/month) in the U.S., marks the company's first venture into autonomous web browsing. It highlights the emergence of a tiered financial system, where those who pay more gain access to the best AI features. At the same time, lower-paying users are limited to less capable models with restricted functionality -- arguably not that democratic. The system works through operator.chatgpt.com, where users can ask ChatGPT to handle various online chores. There have been some attempts to do similar things in the past, from the OpenAI plugin store to the promise of Large Action Models popularized by Rabbit. Still, their reliance on APIs made them inconvenient and challenging to set up. What makes this different is how it works. Instead of relying on APIs as its predecessors did, Operator controls a cloud-based browser, clicking buttons and filling forms just like a human would. Every time Operator makes a move, it snaps a screenshot to show you what it's doing. For example, if you need to book a ticket to a game, the AI will open up its own browser, go to a specific site, look for the game in question, and find the best options before asking you to confirm the payment. It will also walk you through its decision-making process with visual proof. If things go sideways, there's a "Take Control" button that lets humans grab the wheel. To succeed where others failed, OpenAI had to build its own AI model to visually understand the information shown by a web browser and control actions with keyboard and mouse inputs. The new model, powered by GPT-4o, was named Computer-Using Agent (CUA). This isn't just about following scripts.
The AI can read and understand website layouts, adapt to different designs, and even handle unexpected pop-ups or error messages. The system shows off some impressive party tricks. Hand it a photo of your messy handwritten shopping list, and it'll not only use GPT-Vision to read it but actually order everything from your preferred grocery store. OpenAI has partnered with several companies to ensure smooth operations across their platforms. When booking a ride or ordering food, the AI can navigate services like Uber and DoorDash without hiccups since it's preconfigured to have an understanding of their interfaces. However, for unsupported websites, the system still attempts to complete tasks using its browser control capabilities. This is where Operator beats other alternatives. As usual, OpenAI shared some benchmarks: It beats other state-of-the-art models, scoring 38.1% on OSWorld (proficiency at handling standard operating systems) vs. 22% by the best competitor and 58.1% on WebArena (handling of e-commerce sites) vs. 36.2% by the competitors. That said, the team emphasized Operator is still a research preview, so errors and bugs are expected. One potential sticking point might make security-minded users pause: you need to trust Operator with your login credentials. The cloud browser requires access to your accounts to get anything done, and since it's not compatible with local browsers, you have to log in through a remote web browser and rely on OpenAI's pinky promise not to store sensitive data, which may seem like a bit of a red flag. The feature is set for a broader rollout soon, with Plus subscribers next in line. Developers won't be left out either -- OpenAI plans to release Operator through its API in the coming weeks, potentially spawning a new generation of AI-powered automation tools. OpenAI says more instances are coming beyond cloud web browsing control.
The team said during their demonstration that they're also working on expanding the roster of AI agents beyond the current general-purpose assistant.
[48]
OpenAI Operator leak suggests it's coming to the ChatGPT Mac app soon - here's why it's a big deal
Aside from the possible introduction of artificial general intelligence (AGI), AI agents, autonomous processes that you can instruct to perform complex tasks for you on your computer, will be perhaps the biggest new AI feature in 2025. Agents could be essential for turning your mobile phone into a true AI assistant, capable of doing whatever you ask it without you needing to get involved. OpenAI has been teasing us with the release of its first AI agent, called Operator, for a while now, but the latest code leak suggests that it could arrive very soon and on the Mac. A new leak on X from Tibor Blaho claims to have revealed evidence that OpenAI's Operator agent is coming to the ChatGPT Mac app. Blaho has discovered hidden options to define shortcuts for the desktop launcher to "Toggle Operator" and "Force Quit Operator," which might indicate that you'll need a quick way to shut it down if it gets out of control! Blaho also claims to have found code in the browser version of ChatGPT that mentions Operator, with references to an "Operator System Card Table," "Operator Research Eval Table," and "Operator Refusal Rate Table." The last entry indicates that perhaps Operator fails to perform the tasks it is asked to do often enough to require a refusal rate. Recently, one of the founders of OpenAI, Wojciech Zaremba, slammed rival Anthropic in a post on X for releasing its AI agent without the necessary safety precautions in place. His post read: "Anthropic -- just released a computer-using agent without any safety mitigations. I can only imagine the negative reactions if OpenAI made a similar release". It's an AI agent's ability to integrate into your daily computer tasks that makes it such a big step forward for AI and has the potential to change how we interact with our devices entirely. Just imagine if you didn't have to book hotel rooms yourself, pay bills, or even write code. 
Obviously, a lot of work is going to need to be done before people will trust an AI agent to perform such tasks autonomously, and privacy will be a key issue.
[49]
Meet 'Operator,' OpenAI's Game-Changing Autonomous AI Super Agent For Tackling Complex Tasks
There is no doubt that OpenAI has revolutionized AI and redefined the potential of the technology. Its ambitious approach to evolving and bringing more cutting-edge solutions has the company constantly working on either a new project or an upgrade. While Sam Altman has recently expressed his confidence in the development of AGI and how superintelligence is the next big goal, reports are now coming in about the ChatGPT maker gearing up to launch a new artificial intelligence agent, namely, Operator, which is meant to handle tasks autonomously on behalf of users. Ever since the inception of ChatGPT, the tech community has been hooked on what OpenAI has in store. The company has not slowed down and has been vigorously working towards bringing forward novel and innovative solutions for its users. It keeps on challenging itself and other companies to bring forward advancements in AI. Now, a report from Bloomberg has emerged, sharing OpenAI's plans to bring forward an AI super-agent, codenamed Operator, that is able to handle complex tasks that require deep knowledge and understanding. It is expected that the AI tool will be first made available as a research preview and a developer tool in January and is being compared to Ph.D.-level intelligence as it is said to be a breakthrough in artificial intelligence and could even go beyond human intelligence. The announcement marks a pivotal moment in the evolution of AI as the leap is meant to enable autonomous handling of tasks and move towards machines performing tasks more seamlessly. It has also intensified competition as other tech giants are rushing to bring forward their own super agents. Anthropic, for instance, demonstrated its AI agent's capabilities, and Google is also said to be working on a similar offering meant to come out in December. 
While it is still unclear when OpenAI will release Operator to the public, the announcement of its super agent marks the evolution of its systems and puts competitive pressure on other companies to build agents that can actively engage with computer interfaces and handle complex tasks. OpenAI's CEO Sam Altman also took to Reddit earlier, during an 'Ask Me Anything' session, to share his future plans with users. He stated: "We will have better and better models, but I think the thing that will feel like the next giant breakthrough will be agents." Given how actively OpenAI and other tech giants are pursuing AI agents, it seems that this year's big wave will be autonomous artificial intelligence systems.
[50]
Leaked: OpenAI's Operator might take over your PC soon
OpenAI may soon release an AI tool capable of taking control of users' PCs and performing actions on their behalf, referred to as the Operator tool. Software engineer Tibor Blaho, known for accurately leaking upcoming AI products, claims to have found evidence supporting this development. OpenAI is reportedly aiming for a January launch of Operator. Blaho's recent discoveries include hidden options in OpenAI's ChatGPT client for macOS that allow users to define shortcuts to "Toggle Operator" and "Force Quit Operator." Furthermore, Blaho notes that OpenAI has added references to Operator on its website, although these references are not yet publicly visible. According to Blaho, the website also contains unpublished tables comparing the performance of Operator with other computer-using AI systems. If the numbers are accurate, they indicate that Operator is not entirely reliable, depending on the task. For instance, in a benchmark on OSWorld, which simulates a real computer environment, the "OpenAI Computer Use Agent (CUA)" scored 38.1%, better than Anthropic's model but significantly below the 72.4% score achieved by humans. The OpenAI CUA does outperform human agents on the WebVoyager test, which assesses an AI's web navigation skills, but it underperforms on another benchmark, WebArena. Operator appears to struggle with tasks typically easy for humans. In tests requiring Operator to sign up for a cloud provider and launch a virtual machine, it succeeded 60% of the time. Meanwhile, it managed to create a Bitcoin wallet only 10% of the time, according to the leaked benchmarks. OpenAI is entering the AI agent space at a time when competitors like Anthropic and Google are also advancing in this area. Analytics firm Markets and Markets projects the market for AI agents could reach $47.1 billion by 2030. 
While AI agents remain in a primitive stage of development, some experts express concerns about their safety, especially if technology improves rapidly. One leaked chart indicates that Operator performs well in certain safety evaluations, particularly in resisting attempts to engage in illicit activities and search for sensitive personal data. Reportedly, safety testing has contributed to the lengthy development cycle of Operator. OpenAI co-founder Wojciech Zaremba criticized Anthropic's recent agent release for lacking safety measures, indicating potential backlash if OpenAI were to expedite a similar release. Criticism has been directed at OpenAI by AI researchers and former staff for allegedly prioritizing the rapid productization of technology over safety measures.
[51]
OpenAI releases Operator agent as rivals enhance their AI services - SiliconANGLE
OpenAI today introduced Operator, an artificial intelligence agent that can automatically perform tasks on users' behalf. Two of the company's highest-profile rivals announced their own product updates in conjunction. Perplexity AI Inc., a startup with a popular AI search engine, introduced an agent similar to Operator for its Android app. Anthropic PBC, which already offers such automation capabilities, debuted a tool that will enable its AI models to include better citations in prompt responses. OpenAI's new Operator agent is initially available in the top-end Pro tier of ChatGPT as a research preview. It can order groceries, book flights, fill forms and perform other multistep tasks. Users can instruct Operator what tasks to perform by entering natural language prompts. Under the hood, the agent is powered by a newly revealed OpenAI model known as CUA. It's partly based on the company's multimodal GPT-4o large language model. OpenAI says that CUA combines the LLM with "advanced reasoning through reinforcement learning." When users ask Operator to perform a task on a website, the agent navigates to the relevant URL using a built-in browser. It can type, click and scroll to carry out the requested action. Operator regularly takes screenshots of the website to check that everything is working as expected. The user can take over at any point during the workflow, OpenAI detailed. Operator proactively asks users to switch to manual mode for sensitive actions such as entering login credentials into a webpage. According to OpenAI, the agent stops taking screenshots until the task is completed. The company has built several data protection features into Operator. Users can log it out of all their accounts with one click and prevent OpenAI from using their data for AI training. Additionally, there's a system that detects when malicious websites attempt to trick Operator into disclosing sensitive data. 
Some of the agent's features are customizable. A user could, for example, save a shopping list and have Operator buy the specified items every time it visits a certain e-commerce site. It's also possible to create customization settings that apply to all the websites the agent visits. Going forward, OpenAI plans to expand the availability of Operator beyond ChatGPT Pro to the chatbot's other tiers. The company will also offer the agent through its application programming interface. Under the hood, OpenAI plans to add enhancements that will make Operator better at completing complex tasks. "Operator is currently in an early research preview, and while it's already capable of handling a wide range of tasks, it's still learning, evolving and may make mistakes," OpenAI researchers wrote in a blog post. "Early user feedback will play a vital role in enhancing its accuracy, reliability, and safety." OpenAI rival Perplexity AI today debuted an agent of its own, Perplexity Assistant, that is accessible in its Android app. It can make e-commerce purchases, book a taxi and perform other tasks in an automated manner. A multimodal processing feature enables Perplexity Assistant to analyze smartphone camera footage and the content on the user's screen. On launch, the agent can perform actions in Spotify, YouTube and Uber along with email, messaging and clock apps. Perplexity AI plans to add support for more services over time.
[52]
Why OpenAI's Agent Tool May be the First AI Gizmo to Improve Your Workplace
Many of us have by now chatted to one of the current generation of smart AI chatbots, like OpenAI's market-leading ChatGPT, either for fun or for genuine help at work. Office uses include assistance with a tricky coding task, or getting the wording just right on that all-important PowerPoint briefing that the CEO wants. The notable thing about all these interactions is that they're one way: the AI waits for users to query it before responding. Tech luminaries insist that next-gen "agentic" AIs are different and can actually act with a degree of autonomy on their user's behalf. Now rumors say that OpenAI's agent tool, dubbed Operator, may be ready for imminent release. It could be a game changer. The news comes from Tibor Blaho, a software engineer who news site TechCrunch says has a "reputation for accurately leaking upcoming AI products." Blaho says he's found evidence of Operator inside the desktop computer version of OpenAI's ChatGPT app, along with information hidden from public view on OpenAI's website, including data comparing Operator's performance to other AI systems. AI agents are snippets of AI-powered code that can be given the ability to "act" in digital environments. This means giving an agent the ability to control a user's computer, for example, so that it can fill in information on a webform, or even write code. According to OpenAI's CEO Sam Altman, agents are the next big thing in AI, and they could totally change the way many office workers spend their day.
[53]
OpenAI releases preview of Operator AI agent in the US
The release coincides with rival AI company Anthropic unveiling its Citations feature. OpenAI yesterday (23 January) released a research preview of its Operator artificial intelligence (AI) agent, launching in the US to its Pro subscribers. AI rival Anthropic also yesterday revealed its Citations feature, which aims to provide detailed references for its AI bot Claude's responses. OpenAI's latest agent is powered by a new model called Computer-Using Agent (CUA), a combination of GPT-4o's vision capabilities with "advanced reasoning through reinforcement learning". According to the company, Operator can go to the web to perform tasks for its users. "Using its own browser, it can look at a webpage and interact with it by typing, clicking and scrolling," OpenAI explained. "Operator is one of our first agents, which are AIs capable of doing work for you independently - you give it a task and it will execute it." It claimed that Operator can handle a variety of "repetitive browser tasks", such as filling out forms, ordering groceries and even creating memes. "The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses." The company further claimed that the research preview will allow it to learn from its users and the broader ecosystem and will therefore allow it to improve Operator.
Deleted data and limitations
OpenAI also revealed that it may store deleted Operator data for up to 90 days, even after a user manually deletes it. It added that its policies around data retention for Operator are designed to combat abuse. And, as with any product in the early stages of its life cycle, OpenAI has acknowledged that Operator currently has some limitations. "While it's already capable of handling a wide range of tasks, it's still learning, evolving and may make mistakes. 
For instance, it currently encounters challenges with complex interfaces like creating slideshows or managing calendars." As such, early user feedback will help to enhance the "accuracy, reliability and safety for Operator", OpenAI asserted. Earlier this week, recently appointed US president Donald Trump announced that private sector funding of $500bn would be invested into OpenAI's infrastructure over the next four years. The Stargate Project's initial equity funders are OpenAI, Oracle, MGX and SoftBank, with Microsoft, Nvidia and Arm among the key technology partners.
[54]
OpenAI's Operator beats Google's Project Mariner to the punch
Summary: OpenAI's new agentic chatbot called Operator offers a digital personal assistant that can handle tasks for you, combining vision capabilities with superior reasoning. Operator can handle browser-based tasks like form-filling and online booking, with a focus on user privacy and supervision. Currently limited to OpenAI Pro users in the US as a limited test, Operator may launch ahead of Google's Project Mariner with swift testing.
OpenAI pioneered the popularization of conversational AI chatbots with ChatGPT, forcing the likes of Google to take notice and step up their own efforts. While Gemini and ChatGPT might be on par in basic operations, the Sam Altman-led company has once again beaten the search giant to the punch, launching its agentic AI called Operator while Google's implementation is still a concept called Project Mariner, last time we checked. OpenAI's latest creation called Operator is the next logical step in the development of AI -- a digital personal assistant that can execute moderately complex tasks for you. This time, the company is using a new model behind the scenes, called Computer-Using Agent (CUA), confirming this is indeed the agentic AI Google is aiming for with Project Mariner. This model combines the vision capabilities of GPT-4o that users already know with advanced reasoning. Operator is already available, and adept at filling out forms for you, booking a spa appointment, or placing grocery orders online. A common thread between these activities is that they are all browser-based, so the AI agent is also equipped with the skill to operate a browser like a human would, and it can chain these activities together too. 
A video demo shows how you can look up the recipe for pasta and have the required ingredients auto-added to your grocery shopping cart online. The process saves you effort, but is still rather manual, since the AI interacts with webpages as you would, by scrolling and clicking around.
Human intervention still plays a key role with Operator: focus remains on privacy
Importantly, you will still remain involved in the process of using this agent, since it seeks confirmation before any major actions, including password handling and financial transactions. Moreover, OpenAI realizes users might be skeptical about these details, so you can opt out of data collection, delete your browsing data with Operator, or use the suite of privacy settings available. To refine the user experience further, OpenAI is collaborating with companies such as Instacart. You will also get the option to save frequently repeated actions, websites you visit, or prompts for these tasks. A tool like this can be invaluable for people with sensory impairment, not to mention the sheer convenience it can add to your everyday life with cross-functional automation. While eventual integration with ChatGPT is certainly in the cards, the Operator experience is only available to OpenAI Pro users now in the form of a research preview. The experiment is limited to the US as well, but with a strong head start like this, it is easy to see how the company might be first to market as well, beating Google's Project Mariner with swift testing.
[55]
OpenAI unveils tool to automate web tasks as AI agents take center-stage
(Reuters) - Generative artificial intelligence heavyweight OpenAI on Thursday previewed an AI agent that can carry out tasks on the web for users, as it seeks to enhance its chatbot amid intensifying competition. The tool, called Operator, is powered by a model that allows it to interact with on-screen buttons, menus and text fields. "This capability marks the next step in AI development, allowing models to use the same tools humans rely on daily and opening the door to a vast range of new applications," the company said in a blog post. Operator can perform a variety of tasks, like creating to-do lists or assisting with vacation planning. It also takes user input once it decides a task is complete and seeks confirmation for some tasks, such as entering login details on a website. The tool is currently available to Pro users in the U.S. as a research preview, the Microsoft-backed startup said. Agents, which are systems that can execute actions such as making purchases and scheduling meetings without direct human intervention, are now at the forefront of companies' AI agenda. OpenAI competitor Perplexity launched an agent-based assistant for Android devices earlier on Thursday. This assistant can book dinner reservations, hail rides on apps and set reminders, among other tasks. Last year, Apple incorporated Apple Intelligence into its voice assistant, Siri, and -- in a partnership with OpenAI -- the iPhone maker also introduced the use of ChatGPT, with user permission. While such agents had long been elusive to researchers, the emergence of step-by-step reasoning approaches like those used in OpenAI's o1 model could make such tasks possible, business executives told Reuters in December. (Reporting by Arsheeya Bajwa in Bengaluru; Editing by Alan Barona)
[56]
OpenAI could release agentic AI tool Operator soon
OpenAI could soon release its agentic AI tool, which could, in theory, autonomously do tasks for you by taking control of your device. Reports have suggested the tool is getting close to being released. TechCrunch wrote, for instance, that the tool dubbed Operator looks to be "nearing release," citing, in part, Tibor Blaho, a software engineer who often leaks or uncovers AI news. Blaho wrote on X that he noticed "ChatGPT macOS desktop app has hidden options to define shortcuts for the desktop launcher to 'Toggle Operator' and 'Force Quit Operator.'" The Information has reported that OpenAI targeted January as a release date for Operator. It would reportedly first release a research preview and developer tool. The exact release date remains unknown. The idea behind the tool is that it can act autonomously to make your life easier -- in other words, it could take over your computer and do tasks on your behalf. That means it could do things like book flights or write code. Of course, it could also make mistakes on your behalf, which is a bit worrying.
[57]
OpenAI is releasing an AI that can control your PC -- if you cough up $200
OpenAI may be one step closer to releasing its agent tool, called Operator, which is on track for January 2025 availability. The artificial intelligence company first announced the Operator AI agent in November 2024, explaining that the browser-based tool is autonomous and is able to complete tasks on a computer without human assistance. OpenAI added that Operator would be first available as a research preview within the $200 ChatGPT Pro subscription plan. Enthusiasts have now discovered recent information about the tool that suggests that it may in fact be close to launch, and that the information OpenAI previously shared is correct. An X user by the name of Choi discovered updates to the client-side code of ChatGPT, which were later verified by TechCrunch. The code includes several references to the URL operator.chatgpt.com, a live page that redirects to the chatgpt.com main page if you interact with any of its content. The code details that there will be a popup within ChatGPT, explaining how you can access Operator, which will read "Operator is currently only available to Pro users as an early research preview." OpenAI will also add "access to research preview of Operator," to the list of perks of the ChatGPT Pro plan. According to The Information, the tool works similarly to most AI services, starting with a prompt. From there, Operator will be able to take care of the rest of the function. Users can imagine Operator completing tasks such as researching travel and activities, or completing productivity tasks within commonly used programs and services, such as Asana, LinkedIn, or Salesforce. The publication noted that Operator could be available before the week's end. Hopefully with a release, OpenAI will have more to say about the tool and its underlying technology. Notably, OpenAI's Operator has its competitors. Anthropic recently released its "Computer Use" API, which is currently in developer beta. 
Google also announced its own AI Agents in December 2024 as an experimental tool.
[58]
ChatGPT Operator is the next step in letting AI organize your life, and it's expected to launch this week
Rumors have been circulating about the launch of ChatGPT's AI agent for months, but it looks like we could be seeing ChatGPT Operator this week. According to a new report from The Information, OpenAI is gearing up to release ChatGPT's Operator feature this week, and considering it's already Thursday we might not have long to wait. Operator is an AI agent that can automate tasks you'd usually do yourself via a web browser. Think booking vacations, making restaurant reservations, or buying clothes in the sale. ChatGPT Operator is expected to completely revolutionize the way we interact with AI, letting ChatGPT do the work for you. While we've heard rumblings of ChatGPT Operator's launch for a while, including rumors that it's also coming to ChatGPT's Mac app, this report from The Information gives us our first glimpse at how it's actually going to work. The Information claims Operator "provides users with different categories of tasks, like dining and events, delivery, shopping and travel, as well as suggested prompts within each category. When users enter a prompt, a miniature screen opens up in the chatbot that displays a browser and the actions the Operator agent is taking. The agent will also ask follow-up questions, like the time and number of people for a restaurant reservation." Sounds pretty cool, right? Well, you'll also be able to take control of Operator while it's working, just in case, you know... AI goes rogue. I've been waiting for ChatGPT's artificial intelligence agent for over a year, so this news of an imminent launch is music to my ears. If done correctly, Operator will allow you to focus on the tasks that are most important while giving AI freedom to complete the mundane. That might sound a little terrifying at first, but it could completely revolutionize the way we interact with AI. 
OpenAI isn't the only company working on an AI agent. Google accidentally leaked its Jarvis AI late last year, so we expect to see the Gemini equivalent in 2025, too. The Information is usually very reliable with its reports, but launch windows are often subject to change. That said, we don't have long to wait for the week to come to a close, so if these reports are accurate we might all be using AI to book our next trip to Cancún this weekend.
[59]
OpenAI Releases AI Agent That Helps Book Flights, Order Food for Users
OpenAI is rolling out an artificial intelligence tool that can help book flights, plan grocery orders and even complete purchases for users, joining a growing number of tech companies betting on so-called AI agents that act on a person's behalf. The service, called Operator, can carry out a wide range of tasks by using the internet much in the way a human would, including navigating to a website, typing and clicking buttons, OpenAI said on Thursday. Operator's software works by combining some of OpenAI's computer-vision features with multi-step problem-solving capabilities meant to mimic how people reason, the company said. Bloomberg News first reported on OpenAI's plans for Operator in November. Initially, OpenAI is releasing what it calls a "research preview" of Operator online to a limited number of US customers who pay $200 per month for the recently introduced ChatGPT Pro subscription. The company said it hopes to learn from Operator's early users so it can improve the product and plans to offer it to more paid customers over time. The Operator rollout is part of a broader industry push toward agents, or AI software that can complete multi-step tasks for users with minimal supervision. OpenAI-backer Microsoft Corp. and rival Anthropic have launched their own takes on agent software, as have a number of other startups. The companies hope such tools can save users time with their personal and professional tasks and thereby live up to the long-held promise that AI will make people more productive. Sam Altman, OpenAI's chief executive officer, previously said agents will be "the next giant breakthrough" for AI. In a demonstration of the tool on Wednesday, Peter Welinder, OpenAI's vice president of product, and Yash Kumar, who leads product and engineering for Operator, showed how the tool could look for a restaurant reservation or recognize the items on a handwritten list to prep an online grocery order. 
Kumar said OpenAI partnered with a number of companies on the tool, including Instacart, OpenTable, Uber and StubHub, in part to ensure Operator works well on their websites.
[60]
OpenAI may preview its agent tool for users on the $200 per month Pro plan
We may see OpenAI's agent tool, Operator, released sooner rather than later. Changes to ChatGPT's code base suggest that Operator will be available as an early research preview to users on the $200 Pro subscription plan. The changes aren't yet publicly visible, but a user on X who goes by Choi spotted these updates in ChatGPT's client-side code. TechCrunch separately identified the same references to Operator on OpenAI's website. Here are the three interesting tidbits we spotted: Bloomberg previously reported that OpenAI was working on a general-purpose agent that can perform tasks in a web browser for you. While this sounds a bit abstract, think about all the mundane things you do regularly in your web browser with quite a few clicks -- following someone on LinkedIn, adding an expense in Concur, assigning a task to someone in Asana, or changing the status of a prospect on Salesforce. An agent could perform such multi-step tasks based on an instruction set. More recently, The Information reported that OpenAI could launch Operator as early as this week. With today's changes, it seems like everything is ready for a public launch. Anthropic has released an AI model that can control your PC using a "Computer Use" API and local tools that control your mouse and keyboard. It is currently available as a beta feature for developers. It looks like Operator is going to be usable on ChatGPT's website, meaning that it won't interact with your local computer. Instead, OpenAI will likely run a web browser on its own servers to perform tasks for you. Nevertheless, it indicates that OpenAI's ability to interact with computers is progressing. Operator is a specific sandboxed implementation of the company's underlying agentic framework. It's going to be interesting to see if the company has more information to share on the technology that powers Operator.
[61]
OpenAI may be able to control your PC for you soon -- what we know about the new agent tool
Your PC's new overlord? OpenAI's Operator agent tool may arrive soon
Agentic AI is very much the next big thing -- taking control of your device and doing stuff on your behalf. It's everything we wanted our old smart assistants to do, but they were just too dumb to do it. Now, OpenAI is reportedly entering this arena, and it may come sooner than you think. Software engineer Tibor Blaho, someone who has been rather accurate at AI product leaks, has found evidence of OpenAI Operator in the ChatGPT macOS desktop app. This is the long-rumored agentic system that we've been waiting for, and it ties in nicely with earlier reports that Operator would land this month. Blaho found signs of Operator in the ChatGPT desktop app for macOS, where there are hidden options to assign keyboard shortcuts for "Toggle Operator" and "Force Quit Operator." As we already know from our time using various AI agents, the purpose of these will be to operate on your behalf on your system -- working on multi-step tasks you assign with a prompt. This will mean you can have Operator working on your more menial tasks like setting up a spreadsheet. Moreover, it could be the next step in that classic example AI companies use: making a travel plan. Instead of just giving you suggestions of what to do with your time, Operator may be able to go one step further and help by actually booking flights and accommodation based on the preferences you give it. Of course, we don't know exactly how far OpenAI's tool will stretch into the PC like this. This is us putting pieces together from what we've seen in automation tools from the likes of Anthropic. But I don't think it's too far of a stretch to assume we may be getting close to this. Now for the big question mark. You see, these settings were spotted in the macOS app, but nothing has been said about them in the Windows version of ChatGPT. 
As you can probably see in the responses under Blaho's X post, the community is essentially praying that we don't end up with an indefinite wait for a brand new feature to come to Windows. I know that Microsoft's OS is a totally different beast to macOS, but please. I beg of you, OpenAI, don't put this behind the Cupertino wall. Let the Redmond crowd have a go, too! If I could ask it to defrag my drives, that would be a godsend!
[62]
OpenAI launches first AI agent but it won't be coming to Europe yet
ChatGPT maker OpenAI is kicking off its first real attempt at an artificial intelligence (AI) agent. The company on Thursday launched a research preview of a product called Operator, which can go to the web and perform tasks such as filling out forms, ordering groceries, booking travel, and creating memes. For now, it is only available in the United States to users on ChatGPT's Pro subscription plan and will not be coming to Europe anytime soon. "[Operator] will be [in] other countries soon," OpenAI CEO Sam Altman said during a livestream on Thursday. "Europe will, unfortunately, take a while," he added, without explaining why. AI agents are designed to take autonomous actions to assist humans and do not require a human to tell them what to do, as they gather data based on user preference. It is different from an AI chatbot such as ChatGPT, which is designed for human-like conversations and serves as more of a co-pilot in assisting humans. Instead, Operator can use its own browser, look at a webpage, and interact with it by typing and using the functions of a computer mouse such as clicking and scrolling, OpenAI said. The company also said it can "see" through screenshots. The AI agent is powered by Computer-Using Agent (CUA), a model combining OpenAI's GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. OpenAI said that it will expand Operator to Plus, Team, and Enterprise subscription users and integrate these capabilities into ChatGPT in the future. The company said that it is limited at the moment and is "still learning, evolving and may make mistakes" and currently "encounters challenges with complex interfaces like creating slideshows or managing calendars". OpenAI is not the first company to launch an AI agent. Microsoft, Google, and Slack have also launched their own agents. 
Enterprises are tipped to have a slew of AI agents in 2025, which can help with customer service, human resources, data security, and public sector organisations. They are, in general, designed to save users time.
[63]
OpenAI's New AI Agent Can Order Groceries for You. Analysts Say That's Bad News For Google.
Analysts suggested Operator could provide a boost to the gig economy, but could also shrink Google Search traffic and digital ad exposure. ChatGPT-maker OpenAI's new Operator AI agent could have huge implications for Google Search, gig economy companies like Uber (UBER), and digital advertisers, according to analysts. Operator, released Thursday to users of OpenAI's $200 monthly Pro plan, can order groceries and book travel through a user's web browser, according to OpenAI. The company said it expects to eventually roll Operator out to users of its $20 monthly Plus tier, and ultimately integrate it with ChatGPT. Instacart (CART), DoorDash (DASH), Uber, and other gig-economy companies could be among the biggest beneficiaries if Operator can streamline the ordering process, analysts at Bank of America told clients in a note Friday. OpenAI is collaborating with those companies, along with Booking Holdings' (BKNG) OpenTable and Priceline, among others. BofA projects aggregate gig economy bookings in the U.S. could reach $240 billion in 2025, and said AI agents like Operator may boost growth. If Operator proves successful, that might lead to higher conversion rates for those companies, the bank said. However, JPMorgan analysts warned it could also lead users to spend less time on retail websites, which may negatively impact opportunities for new product discovery and cross-selling. For instance, if a user asks Operator to find a recipe and order the ingredients from Instacart, that could eliminate the search traffic that might have otherwise gone to Alphabet's (GOOGL) Google and stop users from seeing digital ads along the way, JPMorgan said. Notably, the analysts speculated Operator is blocked by Alphabet's YouTube and Reddit (RDDT), the latter of which gets up to half its traffic from Google. Operator also presents competition for Google's own AI agent ambitions, which include an agent called Project Mariner announced in December. 
It will also compete with Meta (META), which JPMorgan expects to ramp up its Llama 4 generative AI platform this year with some services similar to Operator.
[64]
OpenAI Debuts 'Operator' AI Agent That Performs Web Tasks, But Early Error Has Sam Altman Scrambling For A Quick Fix - Alphabet (NASDAQ:GOOG), Alphabet (NASDAQ:GOOGL)
On Thursday, ChatGPT-parent OpenAI released a "research preview" of a new AI agent called "Operator," designed to autonomously perform web tasks. What Happened: Operator is initially available in the U.S. for subscribers of OpenAI's $200 per month ChatGPT Pro tier. The AI agent employs a "Computer-Using Agent" model, integrating GPT-4o's vision capabilities with advanced reasoning through reinforcement learning, allowing it to interact with graphical user interfaces (GUIs). Shortly after its release, a user on X, formerly Twitter, noted a limitation in Operator's news site access, prompting Sam Altman, OpenAI's CEO, to acknowledge the issue and promise a swift fix. OpenAI states that Operator can "see" through screenshots and "interact" using mouse and keyboard actions, enabling it to perform tasks on the web without custom API integrations. The AI startup is working with companies like DoorDash, Instacart, and Uber to ensure Operator meets real-world needs. However, OpenAI cautions that the tool may struggle with complex interfaces. Why It Matters: The launch of Operator comes on the heels of OpenAI's valuation surge to $157 billion, following a $6.6 billion funding round aimed at advancing AI research and expanding compute capacity. Despite this financial boost, OpenAI faces challenges. Earlier this month, Altman said that the company is incurring losses on the ChatGPT Pro plan. The plan, intended to generate revenue, has not met profit expectations due to higher-than-anticipated usage. Other major tech companies, such as Microsoft Corp. (MSFT) and Alphabet Inc. (GOOG, GOOGL), have either launched or are gearing up to release similar AI agent tools. Meta Platforms Inc. 
(META) CEO Mark Zuckerberg has also previously discussed the shift from simple chatbots to more advanced AI agents capable of managing complex tasks and objectives. Disclaimer: This content was partially produced with the help of Benzinga Neuro and was reviewed and published by Benzinga editors.
[65]
AI Models and Tools: OpenAI to Launch AI Agent, Microsoft Loses Cloud Exclusivity | PYMNTS.com
OpenAI has developed an artificial intelligence (AI)-powered agent that can use a web browser and do tasks, just like a human user. Called "Operator," the AI agent is set to be released this week, according to The Information. Operator will provide users with suggested prompts, such as finding a flight from Los Angeles to New York that leaves in the morning. Operator will not complete these tasks autonomously; the user still needs to approve any transaction. Bloomberg first reported on Operator last November, noting a January release date. OpenAI's move follows in the footsteps of its competitors. Last October, Anthropic announced a similar capability for Claude 3.5 Sonnet, an AI model that is part of its flagship LLM family. The OpenAI competitor called the capability "computer use." "Developers can direct Claude to use computers the way people do -- by looking at a screen, moving a cursor, clicking buttons and typing text," Anthropic said in the news release announcing the AI agent. However, Anthropic warned that the experimental features are "at times cumbersome and error-prone." In December, Google announced its own web-browsing AI agent, called "Project Mariner." Built with Gemini 2.0, Google's latest version of its flagship multimodal models, Project Mariner can "understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms." It then uses that information to complete tasks for the user by navigating the web browser. Like Anthropic, Google warns that it is "not always accurate and slow to complete tasks." Microsoft will no longer be OpenAI's exclusive cloud provider for its AI models, the software giant disclosed in a Tuesday (Jan. 21) blog post. Instead, Microsoft will have the right of first refusal to host OpenAI's AI workloads in Azure. This is a change from their 2019 agreement, when Microsoft became OpenAI's exclusive cloud provider after investing $1 billion in OpenAI. 
The investment was also three years before ChatGPT and generative AI took the world by storm. Since then, Microsoft reportedly has invested a total of nearly $14 billion in OpenAI. However, Microsoft will retain the exclusive rights to OpenAI's application programming interface (API). The API is a set of rules, protocols and tools that let software applications communicate with each other. OpenAI's API is how most companies access and integrate the startup's AI models into their own applications, products or services. The change in terms comes as OpenAI on Tuesday named Microsoft as a technology partner in its new AI infrastructure project called Stargate, which aims to spend from $100 billion to $500 billion to build physical and virtual AI infrastructure. This includes the building of 500,000-square-foot data centers for AI processing. Ten data centers are under construction in Texas and will expand to 20. The first one is being built in Abilene, Texas. Fresh off its success with NotebookLM, which turns documents into audio podcasts featuring two AI-generated hosts discussing the content, Google has opened the waitlist for its spinoff, Daily Listen. Daily Listen brings the two AI podcast hosts back. This time, they will give you a daily update on things that matter to you, based on your interests. You'll also get links to stories from around the web. For now, Daily Listen is only available in the Google app on your mobile device, in the U.S. Microsoft researchers recently introduced an AI model that can create new, inorganic materials with specific properties. It can do so much faster than the traditional way, where scientists would spend years in research. Called MatterGen, the AI model could lead to breakthroughs in things like batteries, magnets, semiconductors and other technologies by creating materials that are more efficient, stronger or cheaper, for example. 
MatterGen could also help tackle sustainability challenges by creating eco-friendly materials with less environmental impact. It can design materials that don't rely on expensive or rare elements to reduce environmental harm. Scientists can use MatterGen to customize materials for nearly any purpose -- whether to build stronger buildings, make better electronics or improve on medical devices.
[66]
OpenAI Might Be Close to Launching Advanced AI Agents
ChatGPT recently received a Tasks feature that uses agentic capabilities. OpenAI might be getting close to releasing its first artificial intelligence (AI) agent. Sam Altman, the company CEO, has reportedly scheduled a briefing with US officials on January 30 where he might showcase the capabilities of its AI agents. The AI firm has been rumoured to be working on an AI agent called Operator for quite some time now, and it is believed that the tool could be released in the first half of the year. The company has also published an Economic Blueprint that shares strategies to maintain the leadership position of the US in the AI space. Currently, Altman is in Washington attending the inauguration of Donald Trump as the 47th President of the US, scheduled for Monday. However, according to an Axios report, the OpenAI CEO has also requested a closed-door briefing with US government officials on January 30. The agenda of the meeting was not disclosed. However, the meeting likely has two objectives. First, it could establish how OpenAI can help the US maintain its leadership edge in AI. Last week, the AI firm released an Economic Blueprint that detailed how AI can bring about an era of "re-industrialism" in the US. Notably, Microsoft released a paper on a similar topic earlier this month. The second objective could be to showcase the capabilities of its under-development AI agent. Altman has spoken about the transformative power of AI agents multiple times in the past, and the company is also rumoured to be building its own AI agents. These highly sophisticated agents can perform various complex tasks locally on a device or in the cloud. Notably, OpenAI released a Tasks feature for ChatGPT last week, which is said to use agentic capabilities in a limited scope. The feature, which allows the AI to schedule and execute a task in the future, could be the first step taken by the company towards AI agents. 
While Microsoft, Amazon, Google, and Claude have released AI agents and AI agent platforms that can perform specific tasks by connecting to different data structures and enterprise systems, these do not explore the full scope of an AI agent, which is described as an autonomous system that can perform tasks end-to-end with zero to minimal human intervention.
[67]
New OpenAI super-agent 'Operator' set to challenge PhD-level experts
Rumors are floating around that OpenAI is set to unleash a new Ph.D.-level intelligence super-agent very soon. According to reports, this new artificial intelligence (AI) model could be available as soon as January 30. While development news around OpenAI has, until now, focused on its upcoming ChatGPT-5 and advanced humanoid robot, this new super-agent AI is, according to some, a much bigger deal. While detailed information is thin on the ground, the new AI is allegedly designed to perform complex tasks autonomously, reducing the need for human intervention.
[68]
OpenAI CEO to brief US officials on advanced AI agents capable of complex tasks - SiliconANGLE
OpenAI Chief Executive Officer Sam Altman is reportedly set to meet with U.S. government officials in Washington D.C. on Jan. 30 for a closed-door briefing on new technology, possibly "Ph.D.-level super-agents" that do complex human tasks. The latest claim comes from Axios, although details of the same meeting were reportedly shared with readers of Transformer, a weekly briefing on artificial intelligence published by the New York Times a week ago. Whoever was first, what has been known for some time is that OpenAI is planning on launching AI Agents. An AI agent is a software system capable of autonomously performing tasks, making decisions and interacting with its environment or users to achieve specific goals. In November, it was reported that OpenAI's new agent will be called "Operator" and will be able to perform web browser-related tasks on behalf of users. The report claimed that Operator is expected to launch in January as a research preview and will be made available via its developer application programming interface before later being rolled out to other users. The exact breadth and functionality of OpenAI's agents are yet to be seen, but the technology itself is not new and is currently available from other companies, such as Anthropic PBC. In Anthropic's case, the agent can interact with computers by moving the mouse, typing text and clicking buttons to interact with the user interface - an early implementation of the technology, for sure, but one that gives a taste of some of what is likely to come. The rise of AI Agents that surf the web and undertake tasks autonomously, what some call the rise of "agentic AI," is arguably where AI will switch from what is now an increasingly useful research tool and a chatbot that can replace some jobs and people, to a future where AI can fully replace human employees and even go as far as running a company. 
That OpenAI's offering is said to include "Ph.D.-level super-agents" might also suggest that OpenAI has taken the technology beyond being able to automate tasks through to something more. Axios claims that several OpenAI staff have been telling friends that they are both "jazzed and spooked by recent progress," suggesting that it might be something more than has been previously seen. However, the AI industry has a tendency to embrace hype and there are already skeptics ahead of the launch. Previous OpenAI launches have fallen short - the Sora generative AI text-to-video service, when it finally launched in December, was found to lack features that other text-to-video companies already had on the market and was not as good as some had hoped for. Hopefully, Sam Altman and OpenAI won't make the same hype mistake with its Agents that it did with Sora.
[69]
OpenAI's Sam Altman to brief US officials on 'PhD-level' AI agents
Summary: OpenAI will unveil new PhD-level super agent AI to US officials in a closed-door meeting on January 30. These AI models are considered a generational leap for the technology and can handle complex tasks such as managing a global supply chain. Despite warnings to lower expectations, AI's impact on job displacement is a concern for legislators. Move over, Google Gemini. Forget Microsoft Copilot. There are reports that OpenAI is about to unleash an AI so powerful, its CEO, Sam Altman, has to first give a demonstration of it to US federal officials. Siri who? According to these reports, Altman will brief US officials in a closed-door meeting on January 30, where he will unveil PhD-level artificial intelligence agents (via Windows Central). These AI agents, dubbed 'super agents,' are so powerful they threaten entire labor markets. We're talking far beyond helping organize a spreadsheet or code a website. What is a super-agent AI? Super agents are described as goal-oriented artificial intelligence models that synthesize enormous sets of data to deliver actionable results. That's a mouthful, and sounds suspiciously like how AI would describe itself. In short, these PhD-level super bots could: Design new software from scratch, building and testing it autonomously from start to finish. Master complex global logistics chains and keep trucks, ships, and planes moving on time. Conduct deep research and analysis into complex problems at incredible speeds. Those are just a few examples of what these new AI models could potentially accomplish. Sources told Axios that the people who created these super agents were "jazzed and spooked" at what they had unleashed. 
Don't get too excited, Altman tweeted. Altman posted on X, urging people to calm down and cut their expectations by 100 times. But no amount of posting by billionaire tech bros can hide the fact that the emergence of AI is already having an impact on people's jobs. Meta's Mark Zuckerberg and Salesforce's Marc Benioff have already said they plan to reduce hiring and replace workers with AI. There are calls in Congress to address the potential fallout of widespread automation. Legislators are looking at a major AI infrastructure bill, but let's be honest, super-agent artificial intelligence is too far beyond their technical grasp. Congress is still trying to figure out if Singapore is part of communist China or not (it's not). This means the potential for these super agents to disrupt the job market is massive. While they could unlock untold potential, especially in areas such as health research and genetics, they could also lead to a dystopian future of widespread job displacement.
[70]
Sam Altman to brief officials on "PhD-level" super AI
Sam Altman, CEO of OpenAI, is scheduled to meet with U.S. government officials on January 30, 2025, to discuss advancements in artificial intelligence, specifically the unveiling of "PhD-level" super AI agents capable of complex human tasks. Recent reports from Axios indicate a growing enthusiasm and concern among OpenAI staff regarding breakthroughs in agentic AI, which focuses on specialized, task-specific AI agents. While errors and hallucinations have previously limited AI capabilities largely to low-stakes tasks, advancements suggest a possible shift towards higher-stakes applications. The Axios report elaborated that these "PhD-level" artificial intelligence agents could potentially execute responsibilities currently reserved for highly educated professionals. The potential implications of such developments have sparked discussions among industry leaders, including Meta's Mark Zuckerberg and Salesforce CEO Marc Benioff, about the ongoing trend of AI replacing mid-level human jobs. Zuckerberg specifically noted that by 2025, Meta would likely have AI capable of functioning as mid-level engineers, stating, "Over time, we'll get to the point where a lot of the code... is actually going to be built by AI engineers instead of people engineers." In the broader context, companies like Microsoft are actively competing to mainstream super agentic AI technologies. Microsoft Azure has begun offering limited agentic AI solutions primarily focused on customer support. Future enhancements could enable Microsoft Copilot to handle intricate tasks, such as creating complex spreadsheets or even generating personalized software applications through intuitive prompts. Furthermore, advancements in AI could allow for the conceptualization and execution of entire gaming environments based on users' commands, effectively streamlining what has typically required extensive human effort. 
OpenAI has also recently released an "Economic Blueprint," highlighting that with appropriate regulations and infrastructure, AI could significantly contribute to reindustrialization in the U.S. However, concerns over tech literacy among elected officials remain, particularly regarding their ability to navigate impending social changes due to these technologies. Despite the potential for AI to offer substantial productivity gains in various sectors, significant challenges persist, particularly in terms of reliability and the tendency for generative AI to produce incorrect information. Sources from within OpenAI indicate a mix of excitement and apprehension about these advancements, suggesting that the forthcoming innovations could have profound implications for labor markets and societal structure.
OpenAI launches Operator, an AI agent capable of performing web-based tasks autonomously, sparking discussions about its implications for AGI and potential risks.
OpenAI has introduced Operator, its first semi-autonomous AI agent designed to perform web-based tasks on behalf of users. This development marks a significant step towards Artificial General Intelligence (AGI) and has sparked discussions about the potential benefits and risks of increasingly autonomous AI systems [1][2].
Operator is built on OpenAI's Computer-Using Agent (CUA) technology, which combines GPT-4o's vision capabilities with reinforcement learning. The agent can:
The system has demonstrated impressive performance on several benchmarks, including an 87% success rate on WebVoyager for web navigation tasks [2].
OpenAI is collaborating with companies like DoorDash, Instacart, OpenTable, and others to ensure Operator addresses real-world needs while respecting established norms. This partnership approach aims to optimize the agent's functionality across various online services [2][4].
Despite its potential, Operator faces several challenges and limitations:
Operator represents a significant step towards AGI, with OpenAI explicitly stating its role in removing bottlenecks on the path to more advanced AI systems [1]. However, this progress has also intensified debates about the potential risks associated with increasingly autonomous AI agents.
AI pioneer Yoshua Bengio has warned about the potential catastrophic consequences of agentic AI systems, suggesting that non-agentic approaches to AGI development might be safer [1]. This highlights the ongoing tension between advancing AI capabilities and ensuring responsible development.
Operator is currently available as a research preview to ChatGPT Pro subscribers, who pay $200 per month for access [2][4]. As the technology evolves, it is expected to become more widely available and potentially integrate with a broader range of online services and tasks.
The launch of Operator signifies a new era in AI development, where agents can increasingly interact with digital interfaces in human-like ways. While this opens up exciting possibilities for automation and assistance, it also underscores the need for careful consideration of the ethical, safety, and societal implications of such advanced AI systems.
Reference
[1]
[2]
[3]
[4]
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved