2 Sources
[1]
Ai2 releases open-source web agent to rival closed systems from OpenAI, Google, and Anthropic
The Allen Institute for AI is releasing an open-source web agent that can navigate and complete tasks in a browser -- letting developers look under the hood to understand what's happening in ways not possible with closed systems from OpenAI, Google, and Anthropic. The nonprofit Seattle-based institute's new agent, MolmoWeb, is built on Ai2's Molmo 2 multimodal model family. It works by interpreting screenshots of webpages the way a person would, rather than relying on underlying page code, then deciding and executing actions like clicking, typing, and scrolling to complete a task. The release Tuesday comes at a time of transition for Ai2, with CEO Ali Farhadi and key researchers departing for Microsoft, where they are joining Mustafa Suleyman's Superintelligence team. Ai2's primary funder is shifting its focus away from model training toward real-world applications of AI, though all of Ai2's programs for 2026 are fully funded. Major tech companies are racing to build AI agents capable of navigating computers and the web on behalf of users. OpenAI, Google, and Anthropic have all released their own web or computer-use agents in recent months. Anthropic recently acquired Seattle-based startup Vercept, founded by Ai2 veterans, which was building similar screen-understanding agentic technology for Macs and PCs. "In many ways, web agents today are where LLMs were before Olmo -- the community needs an open foundation to build on," Ai2 says in a blog post, referring to its open large language model project that has served as a counterpoint to closed models from OpenAI and others. MolmoWeb comes in two sizes, 4B and 8B parameters. Ai2 says the models posted strong benchmark results, with the 8B version outperforming agents built on much larger proprietary models including GPT-4o on key web navigation tasks, according to the institute. It's available through Hugging Face and GitHub, along with a demo for testing the agent on a set of supported websites. 
Read more in this Ai2 post.
[2]
Ai2 releases open-source visual AI agent that can take control of web browsers - SiliconANGLE
Allen Institute for AI, a prominent Seattle-based nonprofit research organization working on advancing artificial intelligence models and systems, launched a new open-source AI agent that can take control of web browsers on a user's behalf and automate tasks. Web agents represent the next step for vision-language models, moving large language models from understanding images and text through captions and question answering to taking actions. Today, the company announced MolmoWeb, built on the Molmo 2 multimodal model family, available in two sizes: 4B and 8B parameters. It will be available for free, along with the weights, training data, and code (coming soon), as well as the evaluation tools used to build it. It is designed to be self-hosted locally or in the cloud. To take actions, AI agents must interpret instructions from humans and what can be seen. That includes a set of tasks written in conversational language and a live web page. The AI model observes the web page through a series of screenshots and then interacts directly with it via the interface by predicting what will happen when it takes actions such as clicking, typing characters into text fields, or scrolling up and down. The company said that, unlike other open-weight web agents, MolmoWeb was trained without compressing a proprietary vision-based agent. The data comes from synthetically generated text-only accessibility agents and human usage of actual web browsing activities. The agent interface supports navigating URLs, clicking on screen coordinates, typing text into fields, scrolling through pages, opening and switching browser tabs and sending a message back to the user. All of these actions work directly within the browser, with click locations represented as coordinates in pixels when executed.
Ai2 said the agent was designed this way so that it won't break if the underlying webpage code or HTML changes on the fly. For example, some web pages obfuscate, or hide, how they operate under the hood in order to protect themselves. Some of them use specialized JavaScript engines in order to detect bots, stop ad blockers, display animations, track users and more. Using the underlying code can also consume tens of thousands of tokens, the essential currency of AI operations. Visual interfaces also behave much more closely to how humans interact with web interfaces: What a person sees is how they will approach the page. That makes it easier to debug why the model did what it did. In spite of the compact size, Ai2 said MolmoWeb achieves state-of-the-art results among open-weight web agents. When tested on popular evaluation suites, the 8B model scored 78.2% on WebVoyager, 42.3% on DeepShop, and 49.5% on TailBench; it outperformed leading open-weight models such as Fara-7B across all four benchmarks. The company said that MolmoWeb can also outperform agents built on GPT-4 that rely on annotated and structured page data, which Ai2 said is a particularly important result given that those models can "see" deeply into the very code of the webpage and also have substantially larger parameter sizes (by colossal orders of magnitude -- like comparing a mouse to an elephant). More access to open-weight browser AI agents will also help researchers and hobbyists develop their own web-using automations. Closed-source large language model providers have already dipped their toes into the market with agentic web browsers capable of automating web tasks, including OpenAI Group PBC and Perplexity AI Inc., with ChatGPT Atlas and Perplexity Comet, respectively.
The Allen Institute for AI unveiled MolmoWeb, an open-source web agent built on the Molmo 2 multimodal model that navigates browsers using screenshots rather than code. Available in 4B and 8B parameter versions, the 8B model outperformed GPT-4o on key web navigation tasks, offering developers and researchers a community-driven alternative to closed systems from major tech companies.
The Allen Institute for AI (Ai2) has released MolmoWeb, an open-source web agent designed to rival closed systems from OpenAI, Google, and Anthropic [1]. This visual AI agent can take control of web browsers and automate tasks by interpreting screenshots of webpages the way a person would, rather than relying on underlying page code [2]. Built on Ai2's Molmo 2 multimodal model family, MolmoWeb arrives as major tech companies race to build AI agents capable of navigating computers and the web on behalf of users [1].
Source: GeekWire
The Seattle-based nonprofit released MolmoWeb in two sizes, 4B and 8B parameters, and is making it available for free along with the weights, training data, code (listed as coming soon), and evaluation tools [2]. Developers can access the agent through Hugging Face and GitHub, along with a demo for testing it on a set of supported websites [1]. The open release lets researchers and developers look under the hood to understand what's happening in ways not possible with closed systems [1].

MolmoWeb operates by observing web pages through a series of screenshots and then interacting directly with the interface, predicting what will happen when it takes actions such as clicking, typing characters into text fields, or scrolling up and down [2]. The agent supports navigating to URLs, clicking on screen coordinates, typing text into fields, scrolling through pages, opening and switching browser tabs, and sending messages back to the user [2].

This visual approach offers distinct advantages. Ai2 designed the agent this way so that it won't break if the underlying webpage code or HTML changes on the fly; some web pages obfuscate how they operate to protect themselves, or use specialized JavaScript engines to detect bots and stop ad blockers [2]. Using the underlying code can also consume tens of thousands of tokens, the essential currency of AI operations [2]. Because a visual interface behaves more like the way humans interact with a page, it is also easier to debug why the model did what it did [2].

Despite its compact size, Ai2 says MolmoWeb achieves state-of-the-art results among open-weight web agents. On popular evaluation suites, the 8B model scored 78.2% on WebVoyager, 42.3% on DeepShop, and 49.5% on TailBench, and it outperformed leading open-weight models such as Fara-7B across all four benchmarks Ai2 tested (three are cited here) [2]. Ai2 reports that the 8B version also outperformed agents built on much larger proprietary models, including GPT-4o, on key web navigation tasks [1].

Ai2 highlighted that MolmoWeb can outperform agents built on GPT-4 that rely on annotated and structured page data, a particularly important result given that those models can "see" deeply into the code of the webpage and have substantially larger parameter counts [2]. Unlike other open-weight web agents, MolmoWeb was trained without compressing a proprietary vision-based agent, with data coming from synthetically generated text-only accessibility agents and from human usage in actual web browsing [2].

"In many ways, web agents today are where LLMs were before Olmo -- the community needs an open foundation to build on," Ai2 stated in a blog post, referring to its open large language model project that has served as a counterpoint to closed models from OpenAI and others [1]. The release provides more access to open-weight browser AI agents, helping researchers and hobbyists develop their own web automations [2].

The release comes during a transition period for Ai2, with CEO Ali Farhadi and key researchers departing for Microsoft, where they are joining Mustafa Suleyman's Superintelligence team [1]. Ai2's primary funder is shifting its focus away from model training toward real-world applications of AI, though all of Ai2's programs for 2026 are fully funded [1]. Anthropic recently acquired Seattle-based startup Vercept, founded by Ai2 veterans, which was building similar screen-understanding agentic technology for Macs and PCs [1]. Closed-source providers including OpenAI and Perplexity AI have already entered the market with agentic web browsers capable of automating web tasks, ChatGPT Atlas and Perplexity Comet respectively [2].

Summarized by Navi
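
As a rough illustration of the interaction model reported in [2] (observe a screenshot, choose one action, repeat), the action set listed there can be sketched as a small set of Python types plus a control loop. This is a minimal sketch under stated assumptions, not MolmoWeb's actual interface: every name below (`Goto`, `Click`, `TypeText`, `Scroll`, `NewTab`, `SwitchTab`, `SendMessage`, `FakeBrowser`, `run_episode`) is hypothetical, and the browser is a toy stand-in that only records state rather than rendering pages.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


# Hypothetical action types mirroring the action set reported for MolmoWeb.
# Class and field names are illustrative, not the project's real schema.
@dataclass(frozen=True)
class Goto:
    url: str


@dataclass(frozen=True)
class Click:
    x: int  # pixel coordinates within the screenshot
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Scroll:
    dy: int  # positive scrolls down, negative scrolls up


@dataclass(frozen=True)
class NewTab:
    url: str


@dataclass(frozen=True)
class SwitchTab:
    index: int


@dataclass(frozen=True)
class SendMessage:
    text: str  # answer or status sent back to the user; ends the episode


Action = Union[Goto, Click, TypeText, Scroll, NewTab, SwitchTab, SendMessage]


@dataclass
class FakeBrowser:
    """Toy stand-in for a real browser; it only records state changes."""
    tabs: List[str] = field(default_factory=lambda: ["about:blank"])
    active: int = 0
    scroll_y: int = 0
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real agent would receive rendered pixels here.
        return f"{self.tabs[self.active]}@{self.scroll_y}".encode()

    def apply(self, action: Action) -> None:
        if isinstance(action, Goto):
            self.tabs[self.active] = action.url
            self.scroll_y = 0
        elif isinstance(action, NewTab):
            self.tabs.append(action.url)
            self.active = len(self.tabs) - 1
            self.scroll_y = 0
        elif isinstance(action, SwitchTab):
            if not 0 <= action.index < len(self.tabs):
                raise IndexError(f"no tab at index {action.index}")
            self.active = action.index
        elif isinstance(action, Scroll):
            self.scroll_y = max(0, self.scroll_y + action.dy)
        elif isinstance(action, Click):
            self.log.append(f"click({action.x}, {action.y})")
        elif isinstance(action, TypeText):
            self.log.append(f"type({action.text!r})")


# A policy maps (task, screenshot) to the next action; in a real system
# this would be the vision-language model.
Policy = Callable[[str, bytes], Action]


def run_episode(task: str, policy: Policy, browser: FakeBrowser,
                max_steps: int = 10) -> str:
    """Observe a screenshot, pick an action, apply it; stop on SendMessage."""
    for _ in range(max_steps):
        action = policy(task, browser.screenshot())
        if isinstance(action, SendMessage):
            return action.text
        browser.apply(action)
    return "step limit reached"
```

A scripted policy, for example `lambda task, shot: next(steps)` over a fixed list of actions, is enough to drive the loop for testing. Because the policy only ever receives a screenshot and returns coordinates or text, nothing in the loop depends on the page's HTML, which is the robustness property Ai2 describes.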