2 Sources
[1]
Allen Institute for AI rivals Google, Meta and OpenAI with open-source AI vision model
How many penguins are in this wildlife video? Can you track the orange ball in the cat video? Which teams are playing, and who scored? Can you give step-by-step instructions from this cooking video?

Those are examples of queries that can be fielded by Molmo 2, a new family of open-source AI vision models from the Allen Institute for AI (Ai2) that can watch, track, analyze and answer questions about videos: describing what's happening, and pinpointing exactly where and when.

Ai2 cites benchmark tests showing Molmo 2 beating open-source models on short video analysis and tracking, and surpassing closed systems like Google's Gemini 3 on video tracking, while approaching their performance on other image and video tasks.

In a series of demos for reporters recently at the Ai2 offices in Seattle, researchers showed how Molmo 2 could analyze a variety of short video clips in different ways.

* In a soccer clip, researchers asked what defensive mistake led to a goal. The model analyzed the sequence and pointed to a failure to clear the ball effectively.
* In a baseball clip, the AI identified the teams (Angels and Mariners), the player who scored (#55), and explained how it knew the home team by reading uniforms and stadium branding.
* Given a cooking video, the model returned a structured recipe with ingredients and step-by-step instructions, including timing pulled from on-screen text.
* Asked to count how many flips a dancer performed, the model didn't just say "five" -- it returned timestamps and pixel coordinates for each one.
* In a tracking demo, the model followed four penguins as they moved around the frame, maintaining a consistent ID for each bird even when they overlapped.
* When asked to "track the car that passes the #13 car in the end," the model watched an entire racing clip first, understood the query, then went back and identified the correct vehicle. It tracked cars that went in and out of frame.

Big year for Ai2

Molmo 2, announced Tuesday morning, caps a year of major milestones for the Seattle-based nonprofit, which has developed a loyal following in business and scientific circles by building fully open AI systems. Its approach contrasts sharply with the closed or partially open approaches of industry giants like OpenAI, Google, Microsoft, and Meta.

Founded in 2014 by the late Microsoft co-founder Paul Allen, Ai2 this year landed $152 million from the NSF and Nvidia, partnered on an AI cancer research initiative led by Seattle's Fred Hutch, and released Olmo 3, a text model rivaling those from Meta, DeepSeek and others.

Ai2 has seen more than 21 million downloads of its models this year and nearly 3 billion queries across its systems, said Ali Farhadi, the Ai2 CEO, during the media briefing last week at the institute's new headquarters on the northern shore of Seattle's Lake Union.

As a nonprofit, Ai2 isn't trying to compete commercially with the tech giants -- it's aiming to advance the state of the art and make those advances freely available. The institute has released open models for text (OLMo), images (the original Molmo), and now video, building toward what Farhadi described as a unified model that reasons across all modalities.

"We're basically building models that are competitive with the best things out there," Farhadi said -- but in a completely open manner, for a succession of different media and situations.
In addition to Molmo 2, Ai2 on Monday released Bolmo, an experimental text model that processes language at the character level rather than in word fragments -- a technical shift that improves handling of spelling, rare words, and multilingual text.

Expanding into video analysis

With the newly released Molmo 2, the focus is video. To be clear: the model analyzes video, it doesn't generate video -- think understanding footage rather than creating it.

The original Molmo, released last September, could analyze static images with precision rivaling closed-source competitors. It introduced a "pointing" capability that let it identify specific objects within a frame. Molmo 2 brings that same approach to video and multi-image understanding.

The concept isn't new. Google's Gemini, OpenAI's GPT-4o, and Meta's Perception LM can all process video. But in line with Ai2's broader mission as a nonprofit institute, Molmo 2 is fully open, with its model weights, training code, and training data all publicly released. That's different from "open weight" models that release the final product but not the original recipe, and a stark contrast to closed systems from Google, OpenAI and others.

The distinction is not just academic. Ai2's approach means developers can trace a model's behavior back to its training data, customize it for specific uses, and avoid being locked into a vendor's ecosystem.

Ai2 also emphasizes efficiency. For example, Meta's Perception LM was trained on 72.5 million videos. Molmo 2 used about 9 million, relying on high-quality human annotations. The result, Ai2 claims, is a smaller, more efficient model that outperforms the institute's own much larger model from last year, and comes close to matching commercial systems from Google and OpenAI, while being simple enough to run on a single machine.

When the original Molmo introduced its pointing capability last year -- allowing the model to identify specific objects in an image -- competing models quickly adopted the feature. "We know they adopted our data because they perform exactly as well as we do," said Ranjay Krishna, who leads Ai2's computer vision team. Krishna is also a University of Washington assistant professor, and several of his graduate students work on the project.

Farhadi frames the competitive dynamic differently than most in the industry. "If you do real open source, I would actually change the word competition to collaboration," he said. "Because there is no need to compete. Everything is out there. You don't need to reverse engineer. You don't need to rebuild it. Just grab it, build on top of it, do the next thing. And we love it when people do that."

A work in progress

At the same time, Molmo 2 has some clear constraints. The tracking capability -- following objects across frames -- currently tops out at about 10 items. Ask it to track a crowd or a busy highway, and the model can't keep up.

"This is a very, very new capability, and it's one that's so experimental that we're starting out very small," Krishna said. "There's no technological limit to this, it just requires more data, more examples of really crowded scenes."

Long-form video also remains a challenge. The model performs well on short clips, but analyzing longer footage requires compute that Ai2 isn't yet willing to spend. In the playground launching alongside Molmo 2, uploaded videos are limited to 15 seconds. And unlike some commercial systems, Molmo 2 doesn't process live video streams. It analyzes recordings after the fact.
Krishna said the team is exploring streaming capabilities for applications like robotics, where a model would need to respond to observations in real time, but that work is still early. "There are methods that people have come up with in terms of processing videos over time, streaming videos," Krishna said. "Those are directions we're looking into next."

Molmo 2 is available starting today on Hugging Face and Ai2's playground.
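For developers who want to try it, the sketch below shows roughly what loading and querying the model could look like. It is a minimal sketch, not official usage: the repository id is a placeholder, the frame-sampling step is an assumption about how a short clip might be fed in, and the call pattern simply mirrors the original Molmo's published Hugging Face example, so Molmo 2's actual interface may differ -- the model card is the authoritative reference.

# Minimal sketch, NOT the official Molmo 2 API. The repository id is a
# placeholder, and the call pattern mirrors the original Molmo's published
# Hugging Face example (trust_remote_code, processor.process,
# generate_from_batch); Molmo 2's real interface may differ.
import cv2
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/Molmo-2-8B"  # hypothetical repository id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

def sample_frames(path, num_frames=8):
    """Decode a short clip into a few evenly spaced RGB frames."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in range(0, total, max(total // num_frames, 1)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames[:num_frames]

# Pass the sampled frames plus a question, following the original Molmo pattern.
inputs = processor.process(
    images=sample_frames("penguins.mp4"),
    text="How many penguins appear in this video?",
)
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=256, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
answer = processor.tokenizer.decode(
    output[0, inputs["input_ids"].size(1):], skip_special_tokens=True
)
print(answer)

Because the playground caps uploads at 15 seconds, sampling a handful of frames per clip is typically enough for this kind of short-video question answering while keeping the workload small enough for a single machine.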
[2]
Allen Institute for AI introduces Molmo 2, bringing open video understanding to AI systems - SiliconANGLE
Building on its foundation of image-understanding artificial intelligence models, the Allen Institute for AI today introduced Molmo 2, a multimodal model family adapted to video and multi-image understanding.

In 2024, Ai2 released Molmo, which set a new benchmark for image understanding and helped establish powerful "pointing" and tagging capabilities. Those models went beyond describing what appeared in an image; they could identify and tag objects with a high degree of confidence.

The Molmo 2 family includes three variants, each designed for different use cases: Molmo 2 8B, Molmo 2 4B and Molmo 2-O 7B. The 8B and 4B models are based on Qwen 3, Alibaba Group Holding Ltd.'s open-weights reasoning models, and provide video grounding and question-answering capabilities. The Molmo 2-O variant is built on Olmo, Ai2's open-source model family focused on high intelligence and reasoning performance.

According to Ai2, the smaller Molmo 2 models deliver outsized performance relative to their size. The 8B model exceeds the original 72 billion-parameter Molmo on key image understanding tasks and related benchmarks, setting a new standard for efficiency. The 4B variant excels at image and multi-image reasoning despite its compact size, exceeding open models such as Qwen 3-VL-8B while training on far less data: about 9.19 million videos compared with 72.5 million for Meta Platforms Inc.'s Perception LM. These smaller sizes allow the models to be deployed efficiently on less hardware, lowering costs and broadening access to these capabilities.

"With Olmo, we set the standard for truly open AI, then last year Molmo ushered the industry toward pointing; Molmo 2 pushes it even further by bringing these capabilities to videos and temporal domains," said Ali Farhadi, chief executive of Ai2.

Models such as Molmo 2 form a foundation for assistive and intelligent physical technologies, often referred to as Physical AI. These systems perceive, understand and reason about the real world to interact with it meaningfully. For machines to interact with their environment, they must first understand what they are observing. Humans perform this task intuitively, but machines require AI models that can segment objects, track them over time, tag them consistently and assign expected properties.

Ai2 said Molmo 2 introduces capabilities to video understanding that no prior open model has delivered, including identifying exactly where and when events occur, tracking multiple objects through complex scenes and connecting actions to frame-level timelines. This improved understanding of the physical world is essential for intelligent systems such as traffic cameras, retail item-tracking platforms, safety monitoring systems, autonomous vehicles and robotics. Rapid categorization of objects in a field of view, along with their inherent characteristics, enables machines to reason about what may happen next. This capability is critical not only for interaction but also for safety: understanding what a robot is observing fundamentally changes how it chooses to respond.

Additionally, Ai2 is releasing a collection of nine new open datasets used to train Molmo 2, totaling more than nine million multimodal examples across dense video captions, long-form QA grounding, tracking and multi-image reasoning.
The captioning dataset alone spans more than one hundred thousand videos with detailed descriptions that average more than nine hundred words each. According to Ai2, the corpus provides a mix of video pointing, multi-object tracking, synthetic grounding and long-video reasoning. Combined, the datasets form the foundation of the most complete open video data collection available today.

All models, datasets and evaluation tools are now publicly available on GitHub, Hugging Face and Ai2 Playground for interactive testing. Ai2 said it will release the training code soon.
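As a rough illustration of how those releases could be consumed, the sketch below streams a few records from one of the datasets with the Hugging Face datasets library. The repository id and field names are hypothetical placeholders rather than the actual schema; Ai2's dataset cards list the real identifiers.

# Minimal sketch, assuming the Molmo 2 datasets are published as standard
# Hugging Face datasets. The repository id and field names below are
# hypothetical placeholders -- consult Ai2's dataset cards for the real ones.
from datasets import load_dataset

# Hypothetical id for the dense video-captioning set described above
# (100,000+ videos with ~900-word descriptions).
captions = load_dataset("allenai/molmo2-video-captions", split="train", streaming=True)

for record in captions.take(3):
    # Field names are illustrative only.
    print(record.get("video_id"), str(record.get("caption"))[:120], "...")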
The Allen Institute for AI unveiled Molmo 2, a family of open-source AI vision models that can watch, track, and analyze videos with precision. The model surpasses Google's Gemini on video tracking tasks while using just 9 million training videos compared to Meta's 72.5 million. Unlike closed systems from tech giants, Molmo 2 is fully open, releasing model weights, training code, and datasets publicly.
The Allen Institute for AI has released Molmo 2, a new family of open-source AI vision models designed to analyze, track, and answer questions about video content with remarkable precision [1]. Building on the success of the original Molmo released in September 2024, this latest iteration brings advanced video understanding capabilities that rival closed systems from Google, OpenAI, and Meta. According to benchmark tests, Molmo 2 beats open-source models on short video analysis and surpasses Google's Gemini 3 on video tracking tasks [1].
Source: SiliconANGLE
The Seattle-based nonprofit founded by late Microsoft co-founder Paul Allen has built a reputation for fully open-source AI development, contrasting sharply with the closed or partially open approaches of industry giants. Ali Farhadi, CEO of the Allen Institute for AI, emphasized the organization's commitment during a media briefing, stating that they're "basically building models that are competitive with the best things out there" while maintaining complete openness [1].

The Molmo 2 family includes three distinct variants tailored for different use cases: Molmo 2 8B, Molmo 2 4B, and Molmo 2-O 7B [2]. The 8B and 4B models are based on Qwen 3, Alibaba's open-weights reasoning models, while the Molmo 2-O variant builds on OLMo, Ai2's own open-source model family focused on high intelligence and reasoning performance [2].

What sets these models apart is their efficiency. The 8B model exceeds the original Molmo 72 billion-parameter model on key image understanding tasks, setting a new standard for performance relative to size [2]. The compact 4B variant excels at reasoning despite its small footprint, outperforming open models like Qwen 3-VL-8B while using significantly less training data.

During demonstrations at Ai2's Seattle offices, researchers showcased Molmo 2's ability to handle video and multi-image understanding tasks with impressive accuracy [1]. In a soccer clip, the model identified defensive mistakes leading to a goal. When analyzing baseball footage, it recognized the Angels and Mariners, identified player #55 who scored, and explained how it determined the home team by reading uniforms and stadium branding [1].

The model's tracking capabilities proved particularly robust. In one demonstration, it followed four penguins moving around a frame, maintaining consistent IDs even when they overlapped. When asked to count dancer flips, it didn't just provide a number -- it returned timestamps and pixel coordinates for each flip [1]. In a racing scenario, the model understood the query "track the car that passes the #13 car in the end," watched the entire clip, then identified and tracked the correct vehicle even as cars moved in and out of frame.
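To make that tracking behavior concrete, the sketch below shows how per-frame points with stable object IDs -- the kind of output described in the penguin and dancer demos -- could be grouped into trajectories by downstream code. The record layout is invented for illustration; Molmo 2's actual output format may differ.

# Illustrative sketch only: the record layout is hypothetical, meant to show
# how per-frame points with stable track ids could be consumed downstream;
# Molmo 2's real output format may differ.
from collections import defaultdict

# Hypothetical tracking output: one record per detection, with a timestamp,
# pixel coordinates, and an id that stays constant for the same object.
detections = [
    {"t": 0.0, "id": "penguin_1", "x": 412, "y": 233},
    {"t": 0.0, "id": "penguin_2", "x": 518, "y": 240},
    {"t": 0.5, "id": "penguin_1", "x": 420, "y": 231},
    {"t": 0.5, "id": "penguin_2", "x": 509, "y": 244},
]

def build_tracks(records):
    """Group detections into per-object trajectories ordered by time."""
    tracks = defaultdict(list)
    for r in records:
        tracks[r["id"]].append((r["t"], r["x"], r["y"]))
    return {obj: sorted(points) for obj, points in tracks.items()}

for obj, path in build_tracks(detections).items():
    print(obj, "->", path)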
Models like Molmo 2 form the foundation for Physical AI applications -- systems that perceive, understand, and reason about the real world to interact meaningfully with it [2]. This capability is critical for robotics, autonomous vehicles, traffic cameras, retail item-tracking platforms, and safety monitoring systems. For machines to interact safely with their environment, they must first understand what they're observing -- segmenting objects, tracking them over time, and assigning expected properties [2].

The institute has seen more than 21 million downloads of its models this year and nearly 3 billion queries across its systems [1]. This year also brought $152 million in funding from the NSF and Nvidia, partnerships on AI cancer research with Seattle's Fred Hutch, and the release of Olmo 3, a text model rivaling those from Meta and DeepSeek [1].

Ai2 is releasing nine new open datasets totaling more than 9 million multimodal examples across dense video captions, video grounding, tracking, and multi-image reasoning [2]. The captioning dataset alone spans over 100,000 videos with detailed descriptions averaging more than 900 words each. This approach emphasizes quality over quantity -- Molmo 2 used approximately 9 million training videos compared to Meta's Perception LM, which was trained on 72.5 million [1].

Unlike "open weight" models that release only the final product, Molmo 2 provides model weights, training code, and training data publicly [1]. This enables developers to trace a model's behavior back to its training data, customize it for specific uses, and avoid vendor lock-in. All models, datasets, and evaluation tools are now available on GitHub, Hugging Face, and Ai2 Playground for interactive testing [2]. The institute plans to release training code soon, further cementing its commitment to open-source AI development and advancing computer vision capabilities for the broader research community.