2 Sources
[1]
Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code
The AI industry has fully entered the "agent era," a paradigm where AI models do far more than generate text -- they now actively plan, execute, and course-correct complex tasks over days rather than seconds. Thus, it's perhaps unsurprising to see Chinese e-commerce giant Alibaba's famed Qwen Team of AI researchers release a model capable of performing autonomous agentic AI work over multiple days: that model has arrived in the form of Qwen3.7-Max which the company reports in a blog post achieved "~35 hours of continuous autonomous execution" -- albeit, in a proprietary, not open source format, as prior Qwen Team releases were. This is also to be expected -- it's what many analysts and industry experts feared in the wake of the departure of several key Qwen Team leaders earlier this year. But it makes sense for Alibaba financially, at least in the short term: training AI models, especially ones as powerful as Qwen3.7-Max, is expensive, and giving them away essentially for free, as open source models are, does not immediately help recoup any costs. In that sense, Alibaba is simply aligning its efforts with American AI giants like OpenAI and Google by offering the latest and greatest models only through paid APIs and subscription or paid web plan bundles, and slightly less performant ones through open source. Still, the arrival of Qwen3.7-Max offers further optionality to enterprises and individual users, and more competition for American AI labs -- rarely a bad thing for consumers at all budget levels. Yet, the fact that the model is only accessible from Chinese-based endpoints means it may be limited in its appeal to American and European enterprises seeking to maximize compliance and security posturing when fulfilling government contracts, or even just attempting to comply with all relevant state, local, and national data sovereignty regulations. The marathon AI era To understand why Qwen3.7-Max is a departure from previous models, one must look at how it was trained and how it operates in practice. Language models typically degrade when forced to maintain a single train of thought over thousands of conversational turns; they forget instructions, hallucinate variables, or simply get stuck in logical loops. Qwen3.7-Max was specifically designed as a "versatile agent foundation" capable of "long-horizon reasoning" to overcome this exact bottleneck. The starkest demonstration of this capability is an autonomous engineering task detailed by the Qwen team. The model was given access to an isolated server equipped with a T-Head ZW-M890 PPU -- a hardware architecture the model had never encountered during its training. Its task was to optimize an attention kernel. Over the course of 35 straight hours, Qwen3.7-Max operated entirely autonomously. It executed 1,158 distinct tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively improved the code to achieve a 10.0x geometric mean speedup. By comparison, Chinese competitor models like z.ai's GLM-5.1 and Moonshot's Kimi K2.6 capped out at 7.3x and 5.0x speedups respectively, often voluntarily terminating their sessions when they failed to make progress. However, both are available open source. This endurance is achieved through what Alibaba calls "environment scaling". Just as early LLMs grew smarter by ingesting more diverse text, Qwen3.7-Max was trained across a vast, scaled array of dynamic agentic environments. It is capable of simulating a one-year lifecycle of a startup in the "YC-Bench" evaluation, navigating hundreds of decision-making rounds encompassing personnel management and contract screening. In this simulation, the model managed to generate $2.08 million in virtual revenue, nearly doubling the performance of the prior generation, Qwen3.6-Plus. Furthermore, the model has built-in reward-hacking self-monitoring, autonomously detecting when it attempts to cheat a training environment and adding heuristic rules to correct its own behavior. A brain for any scaffold From a product perspective, Qwen3.7-Max is designed to be the cognitive engine for modern software development and enterprise automation. The model offers a massive 1-million-token context window and a 64K maximum output limit, providing immense overhead for processing sprawling codebases or lengthy technical documents. One of its most compelling features is "cross-harness generalization". Rather than being hardcoded to work best within a specific proprietary interface, Qwen3.7-Max is built to act as a drop-in intelligence layer for diverse agent frameworks. It supports the Anthropic API protocol natively, allowing developers to plug it directly into existing tools like Claude Code or OpenClaw. The benchmark data provided by Alibaba indicates that this generalized approach has paid massive dividends. On the Apex Math Reasoning benchmark, Qwen3.7-Max scored 44.5, eclipsing Claude Opus-4.6 Max's score of 34.5 and DeepSeek V4-Pro Max's 38.3. It also posted dominant scores on Humanity's Last Exam (41.4) and the realistic coding agent benchmark MCP-Atlas (76.4). This translates into tangible utility for end-users. Through open source Model Context Protocol (MCP) integrations, the model can operate as an autonomous office assistant, capable of reading university formatting specs and automatically reformatting a messy Word document via command-line tools without human intervention. Running this level of intelligence comes at a distinct cost. Developers accessing the API via Alibaba Cloud Model Studio will pay $2.50 per 1 million input tokens and $7.50 per 1 million output tokens. The platform also features explicit cache creation and read pricing, as well as a $10 fee per 1,000 calls for integrated web searches, though code interpreter tools remain free for a limited time. Qwen3.7-Max occupies a strategic middle ground in the current API economy. While it demands a notable premium over aggressively priced domestic rivals -- costing nearly double DeepSeek V4 Pro ($5.22) and Z.ai's GLM-5.1 ($5.80) -- it drastically undercuts the Western frontier giants it routinely matches on benchmarks. For context, running heavy agentic workflows through OpenAI's GPT-5.4 or Anthropic's Claude Opus 4.7 will run developers $17.50 and $30.00 per million tokens, respectively. See VentureBeat's pricing chart below: By positioning Qwen3.7-Max just below Google's Gemini 3.5 Flash ($10.50) but well above budget-tier models, Alibaba is signaling that this isn't a commodity release; it's a flagship reasoning engine priced to lure enterprise workloads away from Silicon Valley's most expensive offerings. Licensing remains proprietary for now For all its technical brilliance, the most controversial aspect of Qwen3.7-Max is how it is distributed. Qwen is billing the release as a "proprietary model". It is strictly API-only. Historically, Alibaba's Qwen has been a hero to the open-source and local LLM communities. Previous iterations, like Qwen 2.5 and Qwen 3.6, released their weights publicly. Open weights allow developers, researchers, and enterprises to download the model, run it on their own hardware, and fine-tune it for highly specific or data-sensitive use cases without sending proprietary information to a third-party server. By locking Qwen3.7-Max behind an API, Alibaba is pivoting to the standard commercial playbook utilized by OpenAI (with GPT-4) and Anthropic (with Claude). For enterprise users, this means utilizing Qwen3.7-Max requires trusting Alibaba Cloud with their data streams and relying entirely on internet connectivity to run their agentic workflows. For the open-source community, it means losing access to what is currently one of the most capable models on the planet. Community reactions split between awe and disappointment The reaction from the developer community has been swift, characterized by a mix of profound respect for the engineering achievement and frustration over the licensing model. Prominent AI commentator Sudo su (@sudoingX) captured the prevailing sentiment on X (formerly Twitter). "qwen is unreal," they wrote. "they just dropped 3.7 max and it is beating opus 4.6 max on most of the benchmarks they ran". The technical metrics, particularly the model's endurance, have left many in the field stunned. "the apex math number, 44.5 against opus 34.5, that is not a small gap," Sudo su noted. "the 35 hours straight on a kernel optimization task with 1000+ tool calls is the part i keep rereading. that is the agent era thing actually happening, not a slide". The speed of Alibaba's iteration is also drawing notice. With Qwen 3.6 released just last month, the leap to 3.7-Max highlights a relentless development cadence. As Sudo su observed, "nobody else is moving like this". Yet, the praise is heavily caveated by the shift to a closed ecosystem. The loss of the model weights is seen as a blow to the localized AI movement, which relies on state-of-the-art open models to push the boundaries of what can be done on consumer hardware or private enterprise clusters. "one thing though, please open source this one too," Sudo su pleaded in their post. "3.6 dense made the entire local llm ecosystem better. the max tier going api only would close a door we have been keeping open. give us the weights eventually". Qwen3.7-Max proves that the autonomous agent era is no longer a theoretical projection; it is a present reality capable of executing complex engineering feats while humans sleep. The only question now is whether this new frontier of AI will be a democratized resource you can download to your laptop, or an intelligence utility rented strictly from the cloud. For now, with Qwen3.7-Max, it is undeniably the latter.
[2]
Why Alibaba's New Qwen 3.7 Max Just Dethroned the Top AI Models
Alibaba's latest AI model, Qwen 3.7 Max, has emerged as a standout performer in the competitive AI landscape, surpassing benchmarks set by models like Opus 4.6 and Gemini 3.1. With a remarkable score of 60.6 on Swaybench, a leading evaluation for long-term coding tasks, Qwen 3.7 Max demonstrates its capacity for handling complex, sustained challenges with precision. World of AI explores how this model combines advanced coding, debugging and workflow automation to meet the diverse needs of developers, researchers and businesses. Dive into this overview to uncover how Qwen 3.7 Max excels in areas like multi-agent orchestration, scientific reasoning and multilingual support. You'll also gain insight into its practical applications, from generating functional operating system clones to creating intricate 3D simulations and game environments. By the end, you'll have a clear understanding of how this model's capabilities can be applied across industries, as well as its limitations in multimedia tasks. Qwen 3.7 Max has set a new standard in AI performance, consistently outperforming its competitors in rigorous industry benchmarks. It achieved an impressive score of 60.6 on Swaybench, a widely recognized evaluation for long-horizon coding tasks, surpassing rivals such as Opus 4.7 and GPT 5.5. Furthermore, it secured the 8th position in the World of AI benchmark suite, demonstrating its adaptability across diverse domains and its ability to handle complex, sustained tasks with remarkable accuracy and coherence. These achievements highlight its potential to redefine expectations for AI-driven solutions. Qwen 3.7 Max offers a comprehensive suite of capabilities designed to address a wide range of technical and operational challenges. Its features make it a valuable asset for professionals across various industries: These features position Qwen 3.7 Max as a reliable and efficient solution for tackling intricate tasks, from software development to data analysis. Find more information on Qwen by browsing our extensive range of articles, guides and tutorials. The versatility of Qwen 3.7 Max is evident in its real-world applications, which span a variety of industries and use cases. Its ability to deliver high-quality outputs in complex scenarios underscores its practical value: These examples illustrate how Qwen 3.7 Max can be leveraged to drive innovation and efficiency in diverse fields, from software engineering to creative industries. Despite its impressive capabilities, Qwen 3.7 Max has certain limitations that users should consider. It is not a multimodal model, meaning it cannot process audio, image, or video inputs. This restricts its application in multimedia projects and tasks requiring cross-modal understanding. Additionally, while its front-end design outputs are generally strong, they occasionally exhibit inconsistencies in highly creative or abstract tasks. These constraints may limit its effectiveness in certain specialized use cases, particularly those requiring advanced multimedia processing or highly imaginative outputs. Qwen 3.7 Max is designed to balance power and affordability, making it accessible to a wide range of users. Input tokens are priced at $2.50 per 1 million, while output tokens cost $7.50 per 1 million. The model is available through both a chat interface and an API, with free account creation offered to new users. This pricing structure ensures that businesses and individuals can use its advanced capabilities without incurring prohibitive costs. By combining competitive pricing with robust performance, Qwen 3.7 Max appeals to cost-conscious users seeking high-quality AI solutions. The strengths of Qwen 3.7 Max lie in its ability to execute long-horizon tasks with precision and efficiency. Its accuracy in following detailed prompts and instructions makes it a dependable choice for complex projects requiring sustained focus and coherence. Additionally, its cost-effectiveness and versatility position it as a formidable competitor to offerings from industry leaders such as OpenAI and Google. These attributes solidify its reputation as a reliable and innovative tool for developers, researchers and businesses aiming to stay ahead in the rapidly evolving AI landscape. Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.
Share
Copy Link
Alibaba launched Qwen3.7-Max, a proprietary AI model capable of 35 hours of continuous autonomous execution on complex engineering tasks. The model scored 60.6 on Swaybench for long-term coding tasks, surpassing competitors like Claude Opus-4.6 and DeepSeek V4-Pro. It supports external frameworks including Anthropic's Claude Code and offers a 1-million-token context window.
Alibaba has released Qwen3.7-Max, a proprietary AI model designed to execute autonomous AI tasks over extended periods, marking a significant shift from the company's previous open-source approach. The AI model achieved approximately 35 hours of continuous autonomous execution during testing, demonstrating capabilities that position it as a formidable competitor in what industry observers call the "agent era in AI"
1
. This departure from open-source releases reflects Alibaba's alignment with American AI giants like OpenAI and Google, who offer their most advanced models exclusively through paid APIs and subscription plans.
Source: Geeky Gadgets
The decision to release Qwen3.7-Max as a proprietary AI model rather than open-source represents a strategic pivot for Alibaba's Qwen Team. Training powerful AI models requires substantial investment, and the company appears focused on recouping costs through commercial offerings. However, this shift occurred following the departure of several key Qwen Team leaders earlier this year, a development that industry analysts had anticipated would influence the team's release strategy
1
.Qwen3.7-Max has set new benchmarks in handling long-term coding tasks, achieving a remarkable score of 60.6 on Swaybench, a leading evaluation framework for sustained coding challenges. This performance surpasses competing models including Opus 4.6, Gemini 3.1, Opus 4.7, and GPT 5.5
2
. The model also secured 8th position in the World of AI benchmark suite, demonstrating adaptability across diverse domains.In mathematical reasoning, the AI model scored 44.5 on the Apex Math Reasoning benchmark, eclipsing Claude Opus-4.6 Max's score of 34.5 and DeepSeek V4-Pro Max's 38.3
1
. These results underscore the model's capacity for complex reasoning and problem-solving across technical domains.The most striking demonstration of Qwen3.7-Max's capabilities involved an autonomous engineering task where the model optimized an attention kernel on a T-Head ZW-M890 PPU—hardware it had never encountered during training. Over 35 continuous hours, the AI model executed 1,158 distinct tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively improved code to achieve a 10.0x geometric mean speedup
1
.Chinese competitor models like z.ai's GLM-5.1 and Moonshot's Kimi K2.6 achieved only 7.3x and 5.0x speedups respectively, often terminating sessions when progress stalled. This endurance stems from what Alibaba calls "environment scaling," where the model was trained across a vast array of dynamic agentic environments to maintain coherent reasoning over extended periods
1
.Qwen3.7-Max features cross-harness generalization, allowing it to function as a drop-in intelligence layer for diverse agent frameworks. The model supports the Anthropic API protocol natively, enabling developers to integrate it directly into existing tools like Anthropic's Claude Code or OpenClaw
1
. This flexibility positions it as a versatile cognitive engine for modern software development and enterprise automation.
Source: VentureBeat
The AI model offers a massive 1-million-token context window and a 64K maximum output limit, providing substantial overhead for processing sprawling codebases or lengthy technical documents
1
. These specifications enable the model to maintain context across complex, multi-stage projects without losing coherence.Related Stories
Qwen3.7-Max demonstrates versatility across multiple use cases, from generating functional operating system clones to creating intricate 3D simulations and game environments. The model excels in multi-agent orchestration, scientific reasoning, and multilingual support, making it valuable for developers, researchers, and businesses
2
.Alibaba has structured pricing information to balance power and affordability: input tokens cost $2.50 per 1 million, while output tokens are priced at $7.50 per 1 million. The model is accessible through both a chat interface and an API, with free account creation offered to new users
2
. This competitive pricing positions Qwen3.7-Max as an attractive option for cost-conscious users seeking advanced AI capabilities.Despite its strengths, Qwen3.7-Max has notable limitations. It is not a multimodal model and cannot process audio, image, or video inputs, restricting its application in multimedia processing projects
2
. Front-end design outputs occasionally exhibit inconsistencies in highly creative or abstract tasks.A significant consideration for Western enterprises is that the model is only accessible from Chinese-based endpoints. This limitation may affect its appeal to American and European organizations seeking to maximize compliance and security posturing for government contracts or to meet data sovereignty regulations
1
. As AI competition intensifies globally, enterprises will need to weigh Qwen3.7-Max's technical advantages against regulatory and compliance requirements when evaluating deployment options.Summarized by
Navi
[1]
[2]
23 Jul 2025•Technology

16 Feb 2026•Technology

29 Apr 2025•Technology

1
Technology

2
Science and Research

3
Science and Research
