On Fri, 27 Dec, 12:01 AM UTC
2 Sources
[1]
Alibaba Releases Another AI Model, This One Specialises in Vision
Alibaba's Qwen research team has released another open-source artificial intelligence (AI) model in preview. Dubbed QVQ-72B, it is a vision-based reasoning model that can analyse visual information from images and understand the context behind it. The tech giant has also shared benchmark scores and highlighted that on one specific test the model was able to outperform OpenAI's o1. Notably, Alibaba has released several open-source AI models recently, including the QwQ-32B and Marco-o1 reasoning-focused large language models (LLMs).

In a Hugging Face listing, the Qwen team detailed the new model. Calling it an experimental research model, the researchers highlighted that QVQ-72B comes with enhanced visual reasoning capabilities. Notably, vision and reasoning have so far been two separate strands of model capability, which the researchers have combined here. Vision-based AI models are plentiful; they include an image encoder and can analyse the visual information in an image along with the context behind it. Reasoning-focused models such as o1 and QwQ-32B, meanwhile, rely on test-time compute scaling, which lets the model spend more processing time on a query. This enables it to break a problem down, solve it step by step, and assess the output against a verifier before correcting it. With the QVQ-72B preview, Alibaba has combined these two functionalities: the model can analyse information from images and answer complex queries using reasoning-focused structures, which the team says has significantly improved its performance.

Sharing evals from internal testing, the researchers claimed that QVQ-72B scored 71.4 percent on the MathVista (mini) benchmark, narrowly outperforming the o1 model (71.0 percent). It is also said to score 70.3 percent on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark. Despite the improved performance, the model has several limitations, as is the case with most experimental models. The Qwen team stated that it occasionally mixes different languages or unexpectedly switches between them, with code-switching being particularly prominent. The model is also prone to getting caught in recursive reasoning loops, which can affect the final output.
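The test-time compute idea described above can be made concrete with a small schematic: sample several step-by-step candidate answers and keep the one a verifier scores highest. This is a generic, minimal sketch of the technique the article describes, not Alibaba's actual pipeline; generate_candidate and verify are hypothetical placeholders.

```python
# Generic sketch of test-time compute scaling: spend more inference-time effort
# by sampling several step-by-step candidates and keeping the one a verifier
# scores highest. Illustrative only; not Alibaba's actual pipeline.
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate_candidate: Callable[[str], str],  # hypothetical: one sampled chain of reasoning
    verify: Callable[[str, str], float],       # hypothetical: scores an answer for a prompt
    n: int = 8,
) -> Tuple[str, float]:
    """Sample n candidate solutions and return the highest-scoring one."""
    scored: List[Tuple[str, float]] = []
    for _ in range(n):
        answer = generate_candidate(prompt)              # model breaks the problem down step by step
        scored.append((answer, verify(prompt, answer)))  # assess the finished output
    return max(scored, key=lambda pair: pair[1])
```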
[2]
Alibaba announces advanced experimental visual reasoning QVQ-72B AI model - SiliconANGLE
Alibaba Cloud, the cloud computing arm of China's Alibaba Group Holding Ltd., unveiled QVQ-72B-Preview on Wednesday, an experimental open-source artificial intelligence model capable of reviewing images and drawing conclusions from them. The company said early benchmarks showed promising visual reasoning capabilities: the model solves problems by thinking them through step by step, similar to other reasoning models such as OpenAI's o1 and Google LLC's Gemini Flash.

The new model is part of the Qwen family and, the company said, was built on Qwen2-VL-72B, an AI model capable of advanced video analysis and reasoning released earlier this year. The company said it took the existing analysis and reasoning capabilities of Qwen2-VL and made a "significant leap forward in understanding and complex problem solving" for QVQ. "Imagine an AI that can look at a complex physics problem, and methodically reason its way to a solution with the confidence of a master physicist," the Qwen team said about the release. "This vision inspired us to create QVQ - an open-weight model for multimodal reasoning."

Users submit an image and a prompt to the model for analysis, and the model responds with a long, step-by-step answer. First, it comments on the image and identifies the subjects it can see while addressing the prompt. Then it begins reasoning through its process, essentially showing its work in a single shot. For example, a user could upload an image of four fish in an aquarium, three bright orange and one white, and then ask the model to count the fish. The model would start by noting that it could see the aquarium and the fish, identify each of the fish and their various colors, and count them. It might even count them another time by examining the image from another perspective, to determine whether any fish were hidden or partially obscured.

"Let me try to count them," the model said in one of its passes. "There's one big orange fish in the center, and then there are others around it. To the right, there's another fish that's a bit different in color, maybe a lighter shade or almost pink. Below the central fish, there's another orange one, and to the left, there's yet another orange fish. So, from what I can see, there are four fish in total."

In total, the model counted the fish three times and concluded each time that there were four. It even counted the distinct pairs of eyes to avoid any miscounts. Currently, the model produces its analysis in one shot and does not allow users to ask follow-up questions; to get a new answer about an image, a new prompt must be submitted with the same image.

The Qwen team said the experimental preview performed strongly across four benchmarks: MMMU, the university-level multimodal understanding benchmark; MathVista, the mathematics-focused visual reasoning test; MathVision, another mathematical visual reasoning test; and OlympiadBench, a bilingual science benchmark. On MMMU, the model achieved a score of 70.3, nearly reaching parity with Claude 3.5 Sonnet from Anthropic PBC. On the other three benchmarks, the model closed the gap with popular closed-source models such as OpenAI's o1.

Although the model is capable of sophisticated reasoning, the team noted that it is still an experimental preview and has limitations. For example, it can mix or switch languages when responding to analysis requests. It also has issues with recursive responses: it tends to "drill down" when being particularly verbose, and the model can be very long-winded. The company also said the model must be outfitted with stronger safety measures before a widespread release. QVQ-72B-Preview has been released under the open-source Qwen license on GitHub and Hugging Face. This will allow developers and researchers to customize and build on the model for their own goals.
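For readers who want to try the preview themselves, the sketch below shows one plausible way to submit an image and a prompt, assuming the checkpoint is published as Qwen/QVQ-72B-Preview on Hugging Face and loads with the same transformers classes as the Qwen2-VL family it is reported to build on; the repository ID, the local image path, and the qwen_vl_utils helper are assumptions not confirmed by either article.

```python
# Minimal sketch: query the QVQ preview with an image and a prompt.
# Assumes the checkpoint is published as "Qwen/QVQ-72B-Preview" and follows
# the Qwen2-VL loading pattern; adjust names if the actual listing differs.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper used in Qwen2-VL examples

MODEL_ID = "Qwen/QVQ-72B-Preview"  # assumed repository name

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Single-shot request: one image plus one question, as described above.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "aquarium.jpg"},  # hypothetical local file
            {"type": "text", "text": "How many fish are in this tank? Think step by step."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to("cuda")

# The model answers in one long, step-by-step pass, so leave room for new tokens.
output_ids = model.generate(**inputs, max_new_tokens=8192)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Because the preview does not support follow-up questions, asking something new about the same picture means rebuilding the messages list with the same image and a fresh prompt and running generation again.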
Alibaba's Qwen research team has released QVQ-72B, an experimental open-source AI model that combines visual analysis with step-by-step reasoning, edging past OpenAI's o1 on the MathVista (mini) benchmark and approaching other closed-source competitors elsewhere.
Alibaba's Qwen research team has unveiled QVQ-72B, an experimental open-source artificial intelligence model that marks a significant advancement in the field of visual reasoning [1]. This innovative model combines the capabilities of vision-based AI with reasoning-focused structures, enabling it to analyze visual information from images and tackle complex queries through step-by-step problem-solving.
QVQ-72B is built upon Qwen2-VL-72B, an AI model known for advanced video analysis and reasoning [2]. The new model demonstrates enhanced visual reasoning abilities, allowing it to break down problems, solve them methodically, and verify the output against a predefined standard.
In benchmark tests, QVQ-72B has shown promising results:
- 71.4 percent on MathVista (mini), narrowly ahead of OpenAI's o1 (71.0 percent) [1]
- 70.3 on MMMU, nearly reaching parity with Anthropic's Claude 3.5 Sonnet [1][2]
- Narrowed gaps with leading closed-source models on MathVision and OlympiadBench [2]
The model operates by accepting an image and a prompt from users. It then provides a detailed, step-by-step analysis of the visual content, demonstrating its reasoning process. For instance, when presented with an image of fish in an aquarium, QVQ-72B can identify, describe, and count the fish, even considering potential obstructions or hidden elements [2].
Despite its advanced capabilities, QVQ-72B is still in the experimental stage and faces several challenges:
- It occasionally mixes languages or switches between them unexpectedly [1]
- It can get caught in recursive reasoning loops, producing very long-winded output [1][2]
- Alibaba says it needs stronger safety measures before a widespread release [2]
Alibaba has released QVQ-72B-Preview under the open-source Qwen license on GitHub and Hugging Face [2]. This move allows developers and researchers to customize and build upon the model, potentially accelerating advancements in AI visual reasoning capabilities.
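For anyone who wants to experiment with the open weights locally, a minimal download sketch follows; the repository ID Qwen/QVQ-72B-Preview is an assumption based on the naming reported above, so check the Qwen organization's Hugging Face page for the exact listing.

```python
# Minimal sketch: fetch the open QVQ preview weights for local experimentation.
# The repository ID below is an assumption; confirm it on the Qwen Hugging Face page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Qwen/QVQ-72B-Preview")
print(f"Model files downloaded to: {local_dir}")
```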
The release of QVQ-72B is part of Alibaba's broader strategy in the AI sector. The company has recently launched several open-source AI models, including QwQ-32B and Marco-o1, both reasoning-focused large language models (LLMs) [1]. This approach positions Alibaba as a significant player in the open-source AI community, challenging established closed-source models from companies like OpenAI and Google.
As AI continues to evolve, models like QVQ-72B represent a new frontier in combining visual analysis with advanced reasoning capabilities, potentially opening up new applications across various industries and research fields.
Alibaba releases QwQ-32B-Preview, an open-source AI model that rivals OpenAI's o1 in reasoning capabilities. The model outperforms o1 on specific benchmarks and is available for commercial use.
5 Sources
Alibaba's Qwen Team unveils QwQ-32B, an open-source AI model matching DeepSeek R1's performance with significantly lower computational requirements, showcasing advancements in reinforcement learning for AI reasoning.
3 Sources
Alibaba has released Wan 2.1, a suite of open-source AI video generation models, claiming superior performance to OpenAI's Sora. The models support text-to-video and image-to-video generation in multiple languages and resolutions.
8 Sources
Alibaba has released a new version of its AI model, Qwen 2.5-Max, claiming it outperforms competitors like DeepSeek, ChatGPT, and Meta's Llama. This move comes amid intense competition in the AI industry, particularly from the rapidly rising Chinese startup DeepSeek.
17 Sources
Alibaba Group has announced a significant expansion of its artificial intelligence capabilities, including the release of over 100 new AI models and a text-to-video generation tool. This move positions Alibaba as a major player in the global AI race.
8 Sources