Curated by THEOUTPOST
On Thu, 12 Dec, 12:03 AM UTC
15 Sources
[1]
Google reveals Project Mariner -- a new browser agent that can automate your digital life
In true holiday spirit, Google has joined the throng of AI announcements leading up to the end of the year with the unveiling of its new agentic web AI product, Project Mariner -- built on the powerful Gemini 2.0. This experimental Chrome extension works in the sidebar of the browser to autonomously navigate, search and conduct actions for the user. Interested users can sign up for early viewing at the Project Mariner website, although access is currently limited to the U.S. only. The product has been developed by Google's DeepMind AI team, and follows hard on the launch of the multimodal Gemini 2.0 model, also released this week. It also comes as Google updates Gemini's Project Astra. On the surface, the new product looks very similar to Anthropic Claude's Computer Use feature, which was released a while ago. However, a crucial difference between the two is the fact that the Google tool includes reasoning as part of its process, which takes agentic AI to a whole new level. Not only can Mariner manipulate the user's browser to click links and understand what it's seeing, but it has the ability to do this across multiple websites, whilst at the same time showing the reasoning behind the actions it's taking. The product also has the ability to handle multi-modal manipulation, including video and audio as part of its underlying technology. In typical Google fashion Project Mariner has been released to a small group of test users on a very tight leash, as the company evaluates use cases and irons out any potential bugs. "We're working with trusted testers to make it faster and smoother, and it's so important to keep a human in the loop," says Google product manager Jaclyn Konzelmann. One launch demo video shows the user accessing a spreadsheet full of company names, asking Mariner to locate contact details for each company to add to an outreach list. The agent successfully navigates several websites, identifying the company name and contact details, and delivers exactly what was requested with no user intervention. It's very slick, and an impressive demonstration of the power of the new Gemini 2.0 model. As we predicted earlier this year, it looks as though the next several months will indeed see the launch of a raft of agentic AI products from around the world. Of course, all of this technology is still in the very early stages, and it remains to be seen exactly how useful these tools will be for businesses and individuals in day-to-day situations. In the meantime, further information can be obtained from the Google project website directly.
[2]
What is Google's Project Mariner? This AI agent can navigate the web for you
This experimental Chrome extension can handle all sorts of web tasks on your behalf. Here's how to try it out. Following OpenAI's lead, Google released a slew of AI announcements ahead of the holiday season, focusing on agentic AI. The announcements included the highly anticipated Gemini 2.0, which is powering other advanced AI products, including Project Mariner. Project Mariner is a research prototype built on Gemini 2.0 that explores human-agent interactions. It can automate tasks in your browser, acting like an AI agent by carrying out tasks delegated to it on the web for you. If it seems a bit complicated, the examples will really help bring it to life. Also: Google's Gemini 2.0 AI promises to be faster and smarter via agentic advances How does it work? Project Mariner lives in the web browser as an experimental Chrome extension and understands the contents of your screen, including images, code, forms, and more. On the right-hand window, you can type into the chatbot whatever task you want the agent to perform for you. Once it understands your query, Mariner will navigate websites on your behalf in real time, reason through what it needs to do to carry out your task, and give you a view of its plan before making a process. According to Google, Project Mariner scored 83.5% working as a single agent setup when evaluated against the WebVoyager benchmark. In a Google demo, the user has a Google Sheet open with four companies listed. Then, to find the contact information for each, the user asks the extension to do so instead of looking each one up for themself. They ask Mariner to "Memorize this list of companies. Then, find their websites and look up a contact email I can use to reach them. Remember these so I can use them later." Then, the agent shows step-by-step how it plans to tackle the prompt while navigating the web so the user can see its reasoning process. It only works in the active tab, never in the background, and Google recommends there always be human supervision. What you see as an end user is similar to when IT takes remote control of your computer, seeing the cursor move across your screen and actions being performed for you. Before performing sensitive actions, like placing an order, Mariner asks for user confirmation. Google reassures users that it's actively researching new types of risks and mitigations to make sure the tool is built responsibly. Also: Google Labs just got a redesign. Here are 6 reasons to check it out Since Mariner is still in its early phases, the extension is only available to a small group of trusted testers, but you can sign up for the trusted tester waitlist on the Project Mariner site. Until all users are given access, you can find many other AI experiments on Google Labs that are available today.
[3]
Google unveils Project Mariner: AI agents to use the web for you | TechCrunch
Google unveiled its first-ever AI agent that can take actions on the web on Wednesday, a research prototype from the company's DeepMind division called Project Mariner. The Gemini-powered agent takes control of your Chrome browser, moves the cursor on your screen, clicks buttons, and fills out forms, allowing it to use and navigate websites much like a human would. The company is starting out by releasing its AI agent to a small group of pre-selected testers on Wednesday, Google says. Google is continuing to experiment with new ways for Gemini to read, summarize, and, now, use websites. A Google executive tells TechCrunch this is part of a "fundamentally new UX paradigm shift": moving users away from directly interacting with websites, and instead, interacting with a generative AI system that does it for you. These shifts could affect millions of businesses - from publishers like TechCrunch, to retailers like Walmart - which have historically relied on Google to send real people to visit and use their websites. In a demo with TechCrunch, Google Labs Director Jaclyn Konzelmann showed how Project Mariner works. After setting up the AI agent with an extension in Chrome, a chat window pops up to the right of your browser. You can instruct the agent to do things like "create a shopping cart from a grocery store based on this list." From there, the AI agent navigated to a grocery store's website - in this case, Safeway - and then searched for and added items to a virtual shopping cart. One thing that's immediately evident is how slow the agent is - there were about 5 seconds of delay in-between each cursor movement. At times, the agent stopped its task and reverted back to the chat window, asking for clarification about certain items (how many carrots, etc.). Google's agent cannot check out, as it's not supposed to fill out credit card numbers or billing information. Project Mariner also won't accept cookies for users, or sign a terms of service agreement. Google says it purposefully doesn't allow the agent to do these things, in order to give users more control. Behind the scenes, Google's agent is taking screenshots of your browser window, something users must agree to in the terms of service, and sending them to Gemini in the cloud for processing. Gemini then sends instructions back to your computer to navigate the webpage. Project Mariner can also be used to find flights and hotels, shop for household items, find recipes, and other tasks that currently require users to click through the web. One major caveat is that Project Mariner only works on a Chrome browser's foremost active tab, which means you can't use your computer for other things while the agent works in the background - you need to watch Gemini slowly click around. Google DeepMind's Chief Technology Officer, Koray Kavukcuoglu, says this was a very intentional decision so that users know what Google's AI agent is doing. "Because [Gemini] is now taking actions on a user's behalf, it's important to take this step-by-step," said Kavukcuoglu in an interview with TechCrunch. "It's complementary. You, as an individual, can use websites, and now your agent can do everything that you do on a website as well." Website owners may be relieved to hear that Google's AI agent works on your computer screen, because that means publishers and retailers still get your eyeballs on their pages. However, Google's AI agent could mean that users are less engaged with the websites they visit, and one day, it may not require users to use these websites at all. "[Project Mariner] is a fundamentally new UX paradigm shift that we're seeing right now," Konzelmann told TechCrunch. "We need to figure out what is the right way for all of this to change the way users interact with the web, and the way publishers can create experiences for users, as well as for agents, in the future." Besides Project Mariner, Google also unveiled several other AI agents for more specific tasks on Wednesday. One AI agent, Deep Research, aims to help users explore complex topics by creating multi-step research plans. It seems to compete with OpenAI's o1, which can also do multi-step reasoning. However, a Google spokesperson notes the agent is not designed to solve math and logical reasoning problems, write code, or do data analysis. The AI agent is rolling out in Gemini Advanced today, and will come to the Gemini app in 2025. When prompted with a difficult or large question, Deep Research will create a multi-step action plan to answer it. After the user approves the plan, Deep Research takes a few minutes to answer the question and search the web, and then generates a lengthy report on its findings. Another new AI agent from Google, Jules, aims to help developers with coding tasks. It integrates directly into GitHub workflows, allowing Jules to view your existing work and make changes directly in GitHub. Jules is rolling out to a select group of beta testers today, and will be available later in 2025. Finally, Google DeepMind says its working on an AI agent to help you navigate video games, building on its long history creating game-playing AI. Google is working with game developers, like Supercell, to test Gemini's ability to interpret gaming worlds such as "Clash of Clans." Google didn't offer any release date for this prototype, but says this work is helping them build AI agents that help navigate physical worlds, as well as virtual ones. It's unclear when Project Mariner will roll out to Google's massive userbase, but when they do, these agents will have a significant impact on the broader web. The web is designed for humans to use it, but Google's AI agents could change that standard.
[4]
Google's Project Mariner AI can surf the web for you, with a huge caveat
Summary Google's new AI agent, Project Mariner, tackles internet tasks via the Chrome browser for Windows, based on user prompts. Users must watch Mariner work in the active tab, limiting its time-saving capabilities for now. Google recognizes it's a work in progress, and is actively bringing it up to speed. ✕ Remove Ads When a Chrome extension codenamed Project Jarvis leaked, its ability to completely take over your browser inspired a mixture of awe and worry. A couple of weeks later, Google accidentally pushed a non-working Jarvis listing to the Chrome Web Store, reaffirming its upcoming launch. After several opportunistic AI apps swiping the name "Jarvis," the Gemini 2.0-powered AI agent has re-emerged as Project Mariner, presumably because it can surf the digital sea without (much of) your help (via @techcrunch.com on BlueSky). But you won't yet be able to multitask while it navigates, so don't get too excited. Related Google unleashes Gemini 2.0 with new image and audio powers for the AI agent era Almost exactly one year after Gemini 1.0 Posts The all-knowing Mariner will still need your oversight Prepare to have patience staring at web pages Source: Google ✕ Remove Ads Users have dreamed of letting AI do the menial stuff like shopping for staples and finding cheap lodging since it first went mainstream. Typically, each AI tool works exclusively within its own interface and at the user's behest, but that's changing. Beginning with "trusted users" and slowly expanding via waitlist, Project Mariner slogs through the internet for you, doing its best to tackle busywork. But, as is so often the case, there's a catch. After receiving your commands via chat window, it gets to work -- and you have to watch it. For now, you won't be able to set it and forget it while satisfying your social media addiction in another tab. As Google explains, "Project Mariner can only type, scroll or click in the active tab on your browser," so it won't be much of a time-saver yet. ✕ Remove Ads To be fair to Google, it makes Mariner's prototype status clear on the Gemini 2.0 introductory blog post. "It's still early, but Project Mariner shows that it's becoming technically possible to navigate within a browser," the blog notes, "even though it's not always accurate and slow to complete tasks today, which will improve rapidly over time." It goes on to explain how the team is "conducting active research" on testers use of "an experimental Chrome extension." Related I'm convinced AI will take over, but not in the way you think We're outsourcing people skills Posts5 Google Mariner still needs a first mate Guess who's responsible for guiding it Source: Google ✕ Remove Ads One could reasonably guess that, in addition to speeding up, Project Mariner may eventually be able to browse in the background, although Google hasn't mentioned that yet. Jaclyn Konzelmann, Google Labs Director, sensibly justified the limitation in a conversation with TechCrunch, calling it "a fundamentally new UX paradigm shift," and highlighting the "need to figure out the right way" for Mariner to take over a PC. That's reasonable on its surface, as is Mariner's inability to submit sensitive information like payment details, addresses, and consent to privacy policies. Like nearly every new AI tool, though, Mariner's early restrictions, slow performance, and generally cautious implementation underscore how users aren't just the sources of data, we're also the guinea pigs, beta testers, and product. Nonetheless, while Mariner isn't ready for prime time, the much-heralded tool does provide a window into the utilitarian potential of Gemini AI, which can still seem like a solution in search of a problem at some points. Related 5 reasons Google's AI is leagues ahead of Apple Intelligence Even comparing the two feels unfair Posts1 ✕ Remove Ads
[5]
Project Mariner: Google's AI Agent Can Perform Tasks For You in Chrome
Jules is an AI coding agent by Google that can tackle GitHub issues, make a plan, and execute it under the developer's supervision. Last month, we reported that Google is developing an AI agent in the form of a browser extension that can perform actions for you in the web browser. And in today's Gemini 2.0 announcement, Google has finally unveiled Project Mariner, an early prototype that can unlock the future of human-agent interaction. Powered by the latest Gemini 2.0 model, Project Mariner can understand what it sees on your browser screen and uses that information to perform tasks for you. It can understand web elements like forms, text fields, code, images, and more. The web extension, powered by Project Mariner can type, scroll, and click in the active tab, but for sensitive tasks like purchasing something, it requires final confirmation from the user. Google says the early prototype is currently slow and not always accurate, but it will rapidly improve over time. In a demo that Google showcased, Project Mariner can remember company names from a Google Sheet, browse the web, find the websites of companies, and extract the contact details. In the WebVoyager benchmark that tests the agentic capability of models on real-world web tasks, Project Mariner achieved 83.5% which is the highest score to date. Google says it's working with trusted testers to improve Project Mariner, but there is no information on its release date. As for Project Astra which was announced at Google I/O 2024, Google says it can now understand multiple languages and use tools like Google Search, Maps, and Lens to deliver a better experience. Project Astra is also getting better at remembering things. It can now remember 10 minutes of in-session memory for improved personalization. Google has significantly reduced the latency too. Project Astra's release date is unknown, but Google says its capabilities will be integrated into the Gemini app, and other form factors like glasses. Apart from that, Google also announced that it's working with game developers to explore how its AI agents behave in games like Clash of Clans and Hay Day. Google's Gemini 2.0-powered AI agents can see the screen and offer suggestions in real-time. These AI agents can also use Google Search and offer gaming knowledge on the go. Finally, Google introduced Jules, an AI code agent for developers that integrates directly into a GitHub workflow. It can find issues, develop a plan, and execute it under the developer's supervision. You can find more details about Jules from here.
[6]
Google's new experimental AI agent can browse the web for you
In brief: Google recently unveiled Gemini 2.0, the next generation of its GenAI toolchain. The company is gradually introducing multiple GenAI agents that leverage the new model for various tasks. One of these, Mariner, can automatically control web browsers to retrieve information, make purchases, and perform other actions. Google has begun early testing on a new AI agent that can automate web browsing tasks. While the company admits that the software isn't perfect and is taking safety precautions, deploying it might raise questions about the future of the web. Project Mariner, an extension for an experimental build of Chrome, can execute multi-step commands to browse websites, use Google search, retrieve specified information, go shopping, and more. The company claims the agent can assist with tasks that are usually tedious for humans. In one example, a tester shows Mariner a spreadsheet listing the names of multiple companies and asks the AI to find each of their contact email addresses. Mariner then Googles each company's official website, browses through them, copies their contact emails, and pastes them into the chat window. Another demonstration tasks the agent with identifying the most famous impressionist painter, retrieving a selection of their works, and adding similar paint to a user's Etsy cart. In response, it presents a few Vincent Van Gogh paintings and stops just short of purchasing a palette on the art website. To preserve transparency, Mariner displays its entire logic chain in the chat window on the right side of the browser window. Users can pause the agent at any point and have the final say before it completes purchases. Furthermore, the AI only controls the browser window's active tab. Google admits that Mariner isn't extremely fast or perfectly accurate, so it's unclear when it might see a public release. The Van Gogh search took around five minutes, and the company had to speed up the video demonstrating the contact email retrieval. Mariner is likely a test build for Project Jarvis, an AI agent that The Information leaked in October. The report indicated that Jarvis could enter text, take screenshots, interpret information, and control the mouse cursor. Interestingly, Mariner resembles an idea that Microsoft AI CEO Mustafa Suleyman recently proposed. He believes that AI assistants might make manual web browsing obsolete within a few years and that websites could be redesigned so AI agents representing businesses can talk to AI agents representing customers. Other new Gemini 2.0 tools can describe real-world objects in numerous languages, assist developers, and advise users while playing video games.
[7]
Google Explores AI Computer Use With Project Mariner
Microsoft and Anthropic have introduced similar features enabling AI computer use. When Google unveiled Gemini 2.0 on Wednesday, Dec.11, CEO Sundar Pichai described it as the firm's "most capable model yet" that will help usher in a new "agentic era" for AI. Joining other AI labs that are shifting the emphasis from chatbots to sophisticated AI agents that can use digital tools and carry out requests, Project Mariner is equipping Gemini 2.0 with the ability to use computers. Beyond Chatbots Project Mariner rides a wave of interest in AI assistants that can do more than just answer questions. Microsoft made a significant move in this direction when it introduced new Copilot capabilities earlier this year. The firm boasted that the new independent agents "can automate and orchestrate complex, long-running business processes with more autonomy and less human intervention." A week later, Anthropic unveiled a Claude update that lets the AI control a computer's cursor and input information via a virtual keyboard. In a similar vein, Project Mariner "can understand and reason across everything on your browser screen," letting it navigate complex websites in real time, Google said. "The Future of Human-Agent Interaction" The development of new, more capable agents that can interact with a potentially limitless array of digital interfaces promises to transform the way people use AI. The field has advanced significantly in a short space of time. When Claude unveiled its computer use demo in October, it scored 56% on WebVoyager, an AI benchmark that assesses agents' ability to manipulate websites. Less than two months later, Project Voyager has pushed the bar much higher, scoring 83.5%. Kura's multi-agent setup performs even better. Google emphasized that Project Mariner is an early research prototype. But as the technology progresses, such initiatives will "explore the future of human-agent interaction." Moreover, web browsers are just the start. As action bots become more capable, they will be able to use a greater range of software applications. Meanwhile, the work of integrating and connecting different systems will become much easier. User Adoption While Project Mariner is currently only available to early testers, the race to deliver agents that can perform more complex tasks to a wider audience has already started. Commenting on the new technology to CCN, SOCi's Director of Market Insights Damian Rollison said the potential applications span a diverse range of use cases: "Ideally, you'd want an AI agent to be able to book your flight, hotel, and rental car for you, not just look up travel info, or compile relevant statistics into a report or slide deck to save you the trouble." However, "it's notable that Mariner is not yet publicly available and that Google acknowledges the mistakes it can still make," he said and added: "The state of the AI arms race being what it is, the company probably feels it must rush these products out in order to stay competitive, but their utility will only be proven by broad consumer adoption."
[8]
Google unveils AI agent that can use websites on its own
Google's new prototype, called Mariner, is based on Gemini 2.0, which the company also unveiled Wednesday. Gemini is the core technology that underpins many of the company's AI products and research experiments. Versions of the system will power the company's chatbot of the same name and AI Overviews, a Google search tool that directly answers user questions.Today, chatbots can answer questions, write poems and generate images. In the future, they could also autonomously perform tasks like online shopping and work with tools like spreadsheets. Google on Wednesday unveiled a prototype of this technology, which artificial intelligence researchers call an AI agent. Google is among the many tech companies building AI agents. Various AI startups, including OpenAI and Anthropic, have unveiled similar prototypes that can use software apps, websites and other online tools. Google's new prototype, called Mariner, is based on Gemini 2.0, which the company also unveiled Wednesday. Gemini is the core technology that underpins many of the company's AI products and research experiments. Versions of the system will power the company's chatbot of the same name and AI Overviews, a Google search tool that directly answers user questions. "We're basically allowing users to type requests into their web browser and have Mariner take actions on their behalf," Jaclyn Konzelmann, a Google project manager, said in an interview with The New York Times. Gemini is what AI researchers call a neural network -- a mathematical system that can learn skills by analyzing enormous amounts of data. By recognizing patterns in articles and books culled from across the internet, for instance, a neural network can learn to generate text on its own. The latest version of Gemini learns from a wide range of data, from text to images to sounds. That might include images showing how people use spreadsheets, shopping sites and other online services. Drawing on what Gemini has learned, Mariner can use similar services on behalf of computer users. "It can understand that it needs to press a button to make something happen," Demis Hassabis, who oversees Google's core AI lab, said in an interview with the Times. "It can take action in the world." Mariner is designed to be used "with a human in the loop," Konzelmann said. For instance, it can fill a virtual shopping cart with groceries if a user is in an active browser tab, but it will not actually buy the groceries. The user must make the purchase. Sundar Pichai, Google's CEO, said in a blog post that the developments "bring us closer to our vision of a universal assistant." The project was developed as an extension for Google's popular web browser, Chrome, making it an important platform for the company's future AI ambitions. But those plans could face a setback. The Justice Department has asked a federal judge to force Google to sell or spin off Chrome after a landmark ruling that the company's search engine is an illegal monopoly. There are other challenges as well. Konzelmann acknowledged that, like other chatbots, Mariner makes mistakes. Because such systems operate according to patterns found in vast amounts of data, they sometimes go awry. The mistakes that chatbots make when generating text sometimes go unnoticed, but errors are more problematic when systems are trying to use websites and take other actions. "Is it always accurate? Not yet," Konzelmann said. "It is still an experimental technology." Google is sharing Mariner with a small number of testers outside the company but has not yet shared plans for a wider release. On Wednesday, the company also showed off a new version of Project Astra, a smartphone digital assistant that responds to images and text as well as verbal commands. Like technology unveiled by OpenAI earlier this year, Astra is a more powerful version of a digital assistant like Apple's Siri. It also is not yet available to the general public.
[9]
Google details Gemini 2.0 Project Astra capabilities, 'Mariner' browser agent
In announcing Gemini 2.0, Google today shared the latest on Project Astra, while unveiling Project Mariner as an agent that can browse the web for you. Google says a "new class of agentic experiences" are made possible by Gemini 2.0 Flash's "native user interface action-capabilities." It also credits improvements to "multimodal reasoning, long context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency." All built on Gemini 2.0, these projects/prototypes are still in the "early stages of development," but "trusted testers" now have access to them and are providing feedback. With Gemini 2.0, there are a number of updates to Project Astra -- Google's effort to build an assistant or "universal AI agent that is helpful in everyday life" -- since it was shown off at I/O 2024 in May: In a demo video that Google shared today, you see a Project Astra Android app with a viewfinder UI and the ability to analyze (screen sharing) what's on your display, while it remains active as a chathead. This application is just for testing purposes. When Project Astra launches for consumers, it will be through the Gemini (Live) app. Google is also testing Astra on prototype glasses. Meanwhile, Project Mariner is an agent that can browse and navigate (type, scroll, or click) the web to perform a broader task specified by the user. Specifically, it can "understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms." At the moment, it exists as a Chrome Extension that makes use of the existing side panel UI. Google demoed a small business use case and for shopping. When evaluated against the WebVoyager benchmark, which tests agent performance on end-to-end real world web tasks, Project Mariner achieved a state-of-the-art result of 83.5% working as a single agent setup. On the safety front, Project Mariner can only perform actions in the active browser tab. It will have users confirm "certain sensitive actions, like purchasing something." It's also being designed to "identify potentially malicious instructions from external sources and prevent misuse" from fraud and phishing. Trusted testers are starting to test Project Mariner using an experimental Chrome extension now, and we're beginning conversations with the web ecosystem in parallel. Google also discussed an "experimental AI-powered code agent that integrates directly into a GitHub workflow" called Jules. It can tackle an issue, develop a plan and execute it, all under a developer's direction and supervision. This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding. The last prototype is Gemini 2.0 for Games that can serve as a "virtual gaming companion" that sees your mobile phone screen and can answer your questions. It's being tested with games like Clash of Clans.
[10]
Google Crashes Copilot Vision, Computer Use Party with Mariner
But how will it compete with OpenAI's 'Project Operator', that is set to be released soon? Google has announced an early-stage research prototype, Project Mariner, which will understand and reason based on information that can be accessed while a user navigates on a web browser. This feature is built on top of Google's latest Gemini 2.0. Google also says that the agent uses information it sees on the screen through a Google Chrome extension to complete related tasks. The agent will be able to read information, like text, code, images, forms and even voice-based instructions. The agent is also capable of navigating and interacting with websites on the user's behalf and automating certain tasks. The company, in a demo video, showcased Project Mariner's capabilities. The agent was prompted to find a painting of 'the most famous post-impressionist' from Google Arts and Culture and clubbed it with an unrelated task, which involved adding 'colourful paints' to an Etsy cart. Project Mariner then fed the instructions to Gemini to find the artist and the painting, fetched details, and then automatically redirected the user to Google Arts and Culture. Later, it searched for the painting on the website. For the next task, it navigated to Etsy and added a set of watercolours to the shopping cart. During the process, Project Mariner understood the instructions and further broke them into step-by-step actionable tasks. The tool performed actions in the active tab and not through any background activity. Project Mariner is available through a 'Trusted Tester Waitlist'. Along with this announcement, Google also officially unveiled the Gemini 2.0 family of models, starting with Gemini 2.0 Flash. Google also announced updates to Project Astra, such as better dialogue and memory capabilities and the ability to use external tools. Along with Project Mariner, Google also unveiled Jules, an AI code agent that can be directly integrated into a GitHub workflow. That said, Google's agent arrived just days after Microsoft announced Copilot Vision as an experimental feature. Copilot Vision can read and analyse web pages and can provide relevant summaries and information to the user. However, unlike Project Mariner, Copilot Vision cannot take actions on behalf of the user. Therefore, Google's only real competitor is Anthropic's Computer Use, which not only performs autonomous actions but is also not restricted to a browser environment. Many developers are already experimenting with Computer Use, and most recently, Hume AI explored a capability that lets you control your desktop just by using your Voice. It will be interesting to see what OpenAI's rumoured 'Project Operator' is going to look like. A few days ago, OpenAI demonstrated an agent based on GPT 4o at the GenerationAI Conference in Paris, where it assisted in customer issues. It is possible that OpenAI will officially announce features along these lines at the ongoing 12 Days of OpenAI events.
[11]
Google Introduces AI Agent Prototype That Can Browse The Web - Alphabet (NASDAQ:GOOG), Alphabet (NASDAQ:GOOGL)
The experimental Chrome extension is not yet available to the general public. As the race to introduce artificial intelligence features rages on, Alphabet Inc GOOG GOOGL unveiled a new prototype on Wednesday that could reshape the AI marketplace. Google's Mariner: "Project Mariner" is an AI prototype that operates within users' browsers using Google's Gemini 2.0 technology. The experimental Chrome extension, not yet available to the public, "combines strong multimodal understanding and reasoning capabilities" to automate users' tasks. Google says that Mariner can "follow complex instructions and reason across websites." A promotional video showed the tool utilizing a Google Sheet of company names to scour the Internet for the companies' contact information. The tool shows the user its reasoning when searching the web, but it cannot operate in the background of a user's browser. Google says it is working with developers to speed up the tool to make the user experience smoother. It also emphasized that Mariner could be prone to making mistakes. Users interested in accessing the tool can add their names to a "trusted tester" waitlist. Why it Matters: Though Big Tech companies have invested hundreds of billions of dollars into AI, it remains unclear how useful the technology will end up being. Some experts say that AI will not be able to find a foothold until it occupies a clear, "killer use case," meaning it's necessary that it proves the value of the technology as a whole. Past examples of killer applications include Microsoft's Excel and Apple's iTunes. Also Read: Apple Strengthens GenAI With iOS 18.2: Genmoji, Image Playground Come To iPhone Image: Shutterstock Market News and Data brought to you by Benzinga APIs
[12]
Google Reveals Gemini 2, AI Agents, and a Prototype Personal Assistant
A new version of Google's flagship AI model shows how the company sees AI transforming personal computing, web search, and perhaps the way people interact with the physical world. Google once only wanted to organize the world's information. Now it seems more intent on shoveling that information into artificial intelligence algorithms that become dutiful, ever-present, and increasingly powerful virtual helpers. Google today announced Gemini 2, a new version of its flagship AI model that has been trained to plan and execute tasks on a user's computers and the web, and which can chat like a person and make sense of the physical world as a virtual butler. "I've dreamed about a universal digital assistant for a long, long time, as a stepping stone on the path to artificial general intelligence," Demis Hassabis, the CEO of Google DeepMind told WIRED ahead of today's announcement, alluding to the idea of AI that can eventually do anything a human brain can. Gemini 2 is primarily another step up in AI's intelligence as measured by different benchmarks used to gauge such things. The model also has improved "multimodal" abilities, meaning it is more skilled at parsing video and audio, and at conversing in speech. The model has also been trained to plan and execute actions on computers. "Over the last year, we have been investing in developing more agentic models," Google's CEO, Sundar Pichai said in a statement today. These models, Pichai added, "can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision." Tech companies believe that so-called AI agents could be the next big leap forward for the technology, with chatbots increasingly taking on chores for users. If successful, AI agents could revolutionize personal computing by routinely booking flights, arranging meetings, and analyzing and organizing documents. But getting the technology to follow open-ended commands reliably remains a challenge, with the risk that errors could translate into costly and hard-to-undo mistakes. Still Google thinks it is moving in the right direction, and is introducing two specialized AI agents to demonstrate Gemini 2 agentic potential: one for coding and another for data science. Rather than simply autocompleting sections of code, as current AI tools do, these agents can take on more complex work, such as checking code into repositories or combining data to enable analysis. The company is also showing off Project Mariner, an experimental Chrome extension that is capable of taking over web navigation in order to do useful chores for users. WIRED was given a live demo at Google DeepMind's headquarters in London. The agent was asked to help plan a meal, which saw it navigate to the website of the supermarket chain Sainsbury's, log into a user's account, and then add relevant items to their shopping basket. When certain items were unavailable the model chose suitable replacements based on its own knowledge about cooking. Google declined to perform other tasks, suggesting it remains a work in progress.
[13]
Google races to bring AI-powered 'agents' to consumers
Google has launched a more advanced version of its Gemini artificial intelligence model that enables it to take actions on users' behalf, as the US tech group races to bring AI-powered assistants to consumers. The Silicon Valley giant on Wednesday also unveiled its vision of two "AI agents" powered by the new model, that can answer real-time queries across text, video and audio. These have been tested by a small group of users in the US and the UK over the past few months. It comes as tech groups including OpenAI, Meta and Apple are rushing to launch AI-powered personal assistants, which can reason and complete complex tasks for people, as they look to generate revenue from their powerful but costly models. On Wednesday, Apple also rolled out an update to its operating systems that marked its first big foray into generative AI. It includes giving iPhone users free access to OpenAI's ChatGPT and its most advanced models via Siri, camera and writing tools. It timed the move with the launch of Apple Intelligence into markets outside of the US for the first time, including the UK. Google would not confirm when it would release the prototypes -- known as Project Astra and Project Mariner -- more widely to consumers, but said it had moved an important step closer with these working versions. "These are people in the real world, in a controlled environment . . . we want to start getting real world feedback as early as possible," said Praveen Srinivasan, technical director at DeepMind, who worked on the Astra project. Astra can be accessed either through a phone or via smart glasses, while Mariner can complete tasks on a user's Chrome browser, including adding groceries from a recipe to an online shopping basket, filling out forms or planning travel itineraries. Google's showcasing of its improvements highlights how AI agents have become the latest front in the battle between tech companies. In October, AI start-up Anthropic unveiled a tool that can conduct actions on users' behalf, aimed at the developer market. Meanwhile, Google, Amazon, Meta and OpenAI are among those developing general-purpose agents that can be used by anyone. OpenAI recently said it believed that AI agents would hit the mainstream in 2025. "Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision," said Alphabet chief executive Sundar Pichai. In a live demo of the Astra assistant on a smartphone, the AI software answered a series of questions from the Financial Times about paintings that the phone camera was pointed at, including accessing memories of works it had seen recently. Due to its ten minutes of photographic memory, Astra was also able to memorise pages in a recipe book and then respond to questions about ingredients and wine pairings. A video demonstration of Astra showed a user wearing a pair of glasses, acting as a camera, which he could activate by tapping on its side. The product was reminiscent of Google Glass, an ambitious but failed attempt at wearable technology that the tech group announced in 2012 and shelved three years later. Mariner is a Chrome-based browser add-on that can read web pages, as well as type, click and scroll on your behalf. Megha Goel, product manager at Google, said the company had currently blocked off certain actions for users' safety, such as purchasing items online or accepting cookies on their behalf.
[14]
Gemini 2.0 and Project Astra Make Google's AI Your Know-It-All Assistant
Imad is a senior reporter covering Google and internet culture. Hailing from Texas, Imad started his journalism career in 2013 and has amassed bylines with The New York Times, The Washington Post, ESPN, Tom's Guide and Wired, among others. Google is entering its AI agents era with the introduction of Gemini 2.0 -- the company's next generation AI chatbot -- and a limited release of Project Astra, a computer vision-assisted AI agent that can see and analyze the world around you, the company said in a press release on Wednesday. Project Astra, which was shown off at Google I/O earlier this year, is a major leap in Google's AI research from its DeepMind team in London. Like the video from earlier this year demonstrated, Project Astra, which can work through your phone's camera or through camera-equipped glasses, can see and analyze the world around you and give answers to anything it recognizes. This includes being able to ask your glasses where a bus is headed, what the code is to your apartment complex is or where you left your book. Google says its latest advancements with Astra include better dialogue and conversibility in multiple languages, deeper integration with Google Lens and Maps, up to 10 minutes of memory, and better latency for faster responses. Project Astra will first land with people in its trusted tester program. A time frame for when it might go public wasn't given. Project Astra's ocular capabilities will certainly raise privacy concerns. Google said it's working with its Responsibility and Safety Committee, the company's interview review group, to flag potential risks. This includes flags to prevent users from unintentionally sharing sensitive information with agents and controls so that users can delete sessions. Google's latest wave of announcements come as major AI advancements have slowed and Wall Street investment has softened. At the same time, OpenAI, creators of ChatGPT, have been releasing newer, more advanced models and raising billions of dollars in investments. It's a race between Google and OpenAI, which is heavily backed by Microsoft, to see which Big Tech giant will lead the race in AI. Some analysts believe that AI development will be a winner-take-all race, with the best tech leading the market. Microsoft has already spent $19 billion, and DeepMind CEO Demis Hassabis says Google will spend $100 billion on AI. The AI market is expected to be valued at $1.8 trillion by 2030, according to a report by Grand View Research. While Project Astra certainly ignites imaginations of what's possible with this all-seeing AI tech, Gemini is also getting a big update. Gemini 2.0 will have advanced reasoning capabilities with better responses across the board, from general queries to coding questions and even math, according to Google. The company says Gemini 2.0 also works faster than previous versions. An early experimental version of Gemini 2.0 will be given to developers before going out to the wider public. Luckily, starting Wednesday, fans can play with the chat version of Gemini 2.0 Flash via the Gemini app on their phones. Gemini 2.0 Flash is a lighter version of Gemini 2.0. Google says Gemini 2.0 will expand to more Google products early next year. Along with news on Project Astra and Gemini 2.0, Google also unveiled Project Mariner, a prototype Chrome extension that can help with more complex tasks. Currently limited to Chrome for trusted testers, it can analyze text, images, graphs and other web elements at the pixel level and use that information to complete complex tasks. Google's still working on Mariner and admits that the tech isn't always accurate and is slow to complete tasks. Google's engineers have also been working on ways to have AI help with your gaming. In partnership with Supercell, creator of Clash of Clans and Brawl Stars, Google is working on an AI that can answer questions about the games you're playing, like what you need to do to beat a boss. Google didn't unveil how its AI will be able to deliver this information. Is it based on information provided from the game developer? Or is Google sucking up information from gaming guides published online? Google also announced Jules, another AI agent the company working on to help with coding. Jules integrates into a GitHub workflow.
[15]
Google's Gemini 2.0 Finally Shows You the Assistant in Your Future
We may earn a commission when you click links to retailers and purchase goods. More info. To close out 2024, Google announced Gemini 2.0, it's newest version of its AI model that will basically power all of its AI endeavors going forward. As a part of the announcement, Google showed off a bunch of future plans, like its Project Astra assistant that will likely like within a pair of smart glasses at some point soon. For now, Gemini 2.0 is really here for developers before it launches more openly to consumers. If you happen to be a developer in the AI space, you'll want to read this blog post from Google on what you can do with Gemini 2.0 and to get started. For the rest of us, Gemini 2.0 is currently only available as Gemini 2.0 Flash on desktop and the mobile web. It should launch to the Gemini mobile app "soon" and in more Google products you might use next year. As far as capabilities, the simplest description of this new model is that it is twice as fast as Gemini 1.5 Pro. That's something, since this is only Gemini 2.0 Flash. Here's how Google describes the new 2.0 release: Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. 2.0 Flash also comes with new capabilities. In addition to supporting multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search, code execution as well as third-party user-defined functions. Google is prototyping several ideas with Gemini 2.0 that you should at least be aware of: Project Astra, their research prototype exploring future capabilities of a universal AI assistant; a new Project Mariner, that "explores the future of human-agent interaction, starting with your browser"; and Jules, an AI-powered code agent for developers. Project Astra Look, there's a lot of Gemini 2.0 stuff to cover and most of it is not the type of thing we would typically report on. But one thing is - Project Astra. You see, we all watched Google Assistant slowly dwindle down its feature set and frustrate long-time users, only to witness the rise of Gemini as a potential replacement, only Gemini pretty much sucks at doing all of the things Assistant could do. And that's where Project Astra comes in. Google believes this is the potential future of an Assistant in your hands, at least on Android for now. Project Astra was first introduced at Google I/O and it is now powered by Gemini 2.0, where it could live on your phone and be an assistant that accompanies you throughout a day. Here's how Google describes it living with you and what it can do: Since that's a bit nerd speak, the video below shows Project Astra in practice or the real world. It's basically an Assistant that can also use your camera to help you as you need it. And yes, Google plans to enable it to work within smart glasses at some point. In this video, you'll see those glasses that are in "prototype" form. What do you guys think? Is AI starting to finally look like the future? Are you mored scared than ever?
Share
Share
Copy Link
Google introduces Project Mariner, an experimental AI agent powered by Gemini 2.0 that can automate web tasks in Chrome. This prototype showcases the potential of agentic AI but comes with limitations and raises questions about the future of web interactions.
Google has unveiled Project Mariner, an experimental AI agent built on the recently released Gemini 2.0 model. This Chrome extension aims to automate various web tasks, potentially revolutionizing how users interact with the internet 123.
Project Mariner operates as a sidebar in the Chrome browser, allowing users to input tasks they want the AI to perform. The agent can navigate websites, search for information, and conduct actions autonomously. Key features include:
In a demonstration, Project Mariner successfully gathered contact information for multiple companies from a spreadsheet, navigating various websites without user intervention 1.
The AI agent takes screenshots of the browser window, which are sent to Gemini in the cloud for processing. Gemini then returns instructions to navigate the webpage 3. However, there are notable limitations:
Google's Chief Technology Officer, Koray Kavukcuoglu, describes Project Mariner as complementary to human web use. However, it represents a significant shift in user-web interaction 3. Jaclyn Konzelmann, Google Labs Director, acknowledges this as a "fundamentally new UX paradigm shift" that could change how users and publishers interact with web content 3.
Alongside Project Mariner, Google announced several other AI agents:
Project Mariner is currently available to a small group of trusted testers in the United States. Interested users can sign up for early access through the Project Mariner website 12. Google emphasizes the importance of responsible development and is actively researching potential risks and mitigations 2.
The release of Project Mariner follows similar developments in the AI industry, such as Anthropic's Claude with Computer Use feature. However, Google's implementation includes advanced reasoning capabilities, potentially setting a new standard for agentic AI 1.
As these technologies evolve, they raise questions about the future of web design, user engagement, and the role of AI in everyday digital tasks. While promising increased efficiency, they also present challenges related to user privacy, website traffic, and the changing landscape of digital interactions 34.
Reference
[1]
[4]
Google is developing an AI agent called Project Jarvis, which could automate web tasks within Chrome, potentially transforming how users interact with the internet.
35 Sources
35 Sources
Opera unveils Browser Operator, an innovative AI agent integrated into its web browser, capable of executing complex tasks autonomously while prioritizing user privacy and efficiency.
13 Sources
13 Sources
Google inadvertently revealed its upcoming AI project, Jarvis, through a brief listing on the Chrome Web Store. This AI agent, designed to automate web tasks, represents a significant advancement in Google's AI capabilities and hints at the future of web browsing.
13 Sources
13 Sources
Google has released an experimental version of Gemini 2.0 Advanced, offering improved performance in math, coding, and reasoning. The new model is available to Gemini Advanced subscribers and represents a significant step in AI development.
11 Sources
11 Sources
Google has launched Agentspace, a new AI-powered platform for enterprises that combines Gemini AI, Google search capabilities, and company data to enhance employee productivity and information access across organizational silos.
5 Sources
5 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved