Alibaba Unveils QVQ-72B: A Groundbreaking Open-Source Vision AI Model with Advanced Reasoning Capabilities

2 Sources

Alibaba's Qwen research team has released QVQ-72B, an experimental open-source AI model that combines visual analysis with advanced reasoning capabilities, potentially outperforming some closed-source competitors in specific benchmarks.

News article

Alibaba Introduces QVQ-72B: A New Frontier in Vision AI

Alibaba's Qwen research team has unveiled QVQ-72B, an experimental open-source artificial intelligence model that marks a significant advancement in the field of visual reasoning 1. This innovative model combines the capabilities of vision-based AI with reasoning-focused structures, enabling it to analyze visual information from images and tackle complex queries through step-by-step problem-solving.

Technical Capabilities and Performance

QVQ-72B is built upon Qwen2-VL-72B, an AI model known for advanced video analysis and reasoning 2. The new model demonstrates enhanced visual reasoning abilities, allowing it to break down problems, solve them methodically, and verify the output against a predefined standard.

In benchmark tests, QVQ-72B has shown promising results:

  • Scored 71.4% on the MathVista (mini) benchmark, surpassing OpenAI's o1 model (71.0%)
  • Achieved 70.3% on the Multimodal Massive Multi-task Understanding (MMMU) benchmark
  • Performed well in MathVision and OlympiadBench, a bilingual science benchmark

Practical Applications and User Interaction

The model operates by accepting an image and a prompt from users. It then provides a detailed, step-by-step analysis of the visual content, demonstrating its reasoning process. For instance, when presented with an image of fish in an aquarium, QVQ-72B can identify, describe, and count the fish, even considering potential obstructions or hidden elements 2.

Limitations and Future Development

Despite its advanced capabilities, QVQ-72B is still in the experimental stage and faces several challenges:

  1. Language mixing and unexpected switching between languages
  2. Proneness to recursive reasoning loops
  3. Tendency for verbose responses
  4. Need for stronger safety measures before widespread release

Open-Source Availability and Implications

Alibaba has released QVQ-72B-Preview under the open-source Qwen license on GitHub and Hugging Face 2. This move allows developers and researchers to customize and build upon the model, potentially accelerating advancements in AI visual reasoning capabilities.

Alibaba's AI Strategy

The release of QVQ-72B is part of Alibaba's broader strategy in the AI sector. The company has recently launched several open-source AI models, including QwQ-32B and Marco-o1, focusing on reasoning-centric large language models (LLMs) 1. This approach positions Alibaba as a significant player in the open-source AI community, challenging established closed-source models from companies like OpenAI and Google.

As AI continues to evolve, models like QVQ-72B represent a new frontier in combining visual analysis with advanced reasoning capabilities, potentially opening up new applications across various industries and research fields.

Explore today's top stories

Google's AlphaEarth Foundations: AI-Powered 'Virtual Satellite' Revolutionizes Earth Observation

Google DeepMind introduces AlphaEarth Foundations, an AI model that acts as a 'virtual satellite' to map and analyze Earth's surface with unprecedented accuracy and efficiency, potentially transforming environmental monitoring and resource management.

Wired logoThe Verge logoAndroid Police logo

5 Sources

Technology

3 hrs ago

Google's AlphaEarth Foundations: AI-Powered 'Virtual

Google to Sign EU's AI Code of Practice, Highlighting Big Tech Divide on AI Regulation

Google announces its intention to sign the European Union's AI Code of Practice, a voluntary framework aimed at helping companies comply with the EU's AI Act. This decision contrasts with Meta's refusal, highlighting a growing divide among tech giants on AI regulation.

Ars Technica logoTechCrunch logoReuters logo

11 Sources

Policy and Regulation

11 hrs ago

Google to Sign EU's AI Code of Practice, Highlighting Big

Palo Alto Networks Acquires CyberArk for $25 Billion, Targeting AI-Driven Cybersecurity Threats

Palo Alto Networks has agreed to acquire Israeli cybersecurity firm CyberArk for $25 billion, marking a significant move in the cybersecurity industry to address emerging AI-driven threats and identity security challenges.

The Register logoReuters logoAxios logo

12 Sources

Business and Economy

11 hrs ago

Palo Alto Networks Acquires CyberArk for $25 Billion,

Meta Shifts Stance on Open-Source AI as Zuckerberg Unveils 'Personal Superintelligence' Vision

Mark Zuckerberg signals a potential shift in Meta's approach to open-source AI, citing safety concerns as the company pursues 'superintelligence'. This marks a significant change in Meta's AI strategy and its competition with rivals like OpenAI and Google DeepMind.

TechCrunch logoPC Magazine logo

2 Sources

Technology

3 hrs ago

Meta Shifts Stance on Open-Source AI as Zuckerberg Unveils

TSMC's AI Chip Dominance Propels Global Ranking and Revenue Growth

Taiwan Semiconductor Manufacturing Company (TSMC) experiences significant growth and global recognition due to the AI boom, with its CEO meeting world leaders and the company climbing Fortune's Global 500 ranking.

Fortune logoThe Motley Fool logo

2 Sources

Business and Economy

11 hrs ago

TSMC's AI Chip Dominance Propels Global Ranking and Revenue
TheOutpost.ai

Your Daily Dose of Curated AI News

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

© 2025 Triveous Technologies Private Limited
Instagram logo
LinkedIn logo