Alibaba's Qwen3.7-Max AI model runs 35 hours autonomously, outperforms rivals on coding benchmarks

Reviewed byNidhi Govil

2 Sources

Share

Alibaba launched Qwen3.7-Max, a proprietary AI model capable of 35 hours of continuous autonomous execution on complex engineering tasks. The model scored 60.6 on Swaybench for long-term coding tasks, surpassing competitors like Claude Opus-4.6 and DeepSeek V4-Pro. It supports external frameworks including Anthropic's Claude Code and offers a 1-million-token context window.

Alibaba Enters the Agent Era with Qwen3.7-Max

Alibaba has released Qwen3.7-Max, a proprietary AI model designed to execute autonomous AI tasks over extended periods, marking a significant shift from the company's previous open-source approach. The AI model achieved approximately 35 hours of continuous autonomous execution during testing, demonstrating capabilities that position it as a formidable competitor in what industry observers call the "agent era in AI"

1

. This departure from open-source releases reflects Alibaba's alignment with American AI giants like OpenAI and Google, who offer their most advanced models exclusively through paid APIs and subscription plans.

Source: Geeky Gadgets

Source: Geeky Gadgets

The decision to release Qwen3.7-Max as a proprietary AI model rather than open-source represents a strategic pivot for Alibaba's Qwen Team. Training powerful AI models requires substantial investment, and the company appears focused on recouping costs through commercial offerings. However, this shift occurred following the departure of several key Qwen Team leaders earlier this year, a development that industry analysts had anticipated would influence the team's release strategy

1

.

Exceptional Performance on Long-Duration Tasks

Qwen3.7-Max has set new benchmarks in handling long-term coding tasks, achieving a remarkable score of 60.6 on Swaybench, a leading evaluation framework for sustained coding challenges. This performance surpasses competing models including Opus 4.6, Gemini 3.1, Opus 4.7, and GPT 5.5

2

. The model also secured 8th position in the World of AI benchmark suite, demonstrating adaptability across diverse domains.

In mathematical reasoning, the AI model scored 44.5 on the Apex Math Reasoning benchmark, eclipsing Claude Opus-4.6 Max's score of 34.5 and DeepSeek V4-Pro Max's 38.3

1

. These results underscore the model's capacity for complex reasoning and problem-solving across technical domains.

Engineering Task Demonstrates Autonomous Capabilities

The most striking demonstration of Qwen3.7-Max's capabilities involved an autonomous engineering task where the model optimized an attention kernel on a T-Head ZW-M890 PPU—hardware it had never encountered during training. Over 35 continuous hours, the AI model executed 1,158 distinct tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively improved code to achieve a 10.0x geometric mean speedup

1

.

Chinese competitor models like z.ai's GLM-5.1 and Moonshot's Kimi K2.6 achieved only 7.3x and 5.0x speedups respectively, often terminating sessions when progress stalled. This endurance stems from what Alibaba calls "environment scaling," where the model was trained across a vast array of dynamic agentic environments to maintain coherent reasoning over extended periods

1

.

Cross-Harness Generalization and Technical Specifications

Qwen3.7-Max features cross-harness generalization, allowing it to function as a drop-in intelligence layer for diverse agent frameworks. The model supports the Anthropic API protocol natively, enabling developers to integrate it directly into existing tools like Anthropic's Claude Code or OpenClaw

1

. This flexibility positions it as a versatile cognitive engine for modern software development and enterprise automation.

Source: VentureBeat

Source: VentureBeat

The AI model offers a massive 1-million-token context window and a 64K maximum output limit, providing substantial overhead for processing sprawling codebases or lengthy technical documents

1

. These specifications enable the model to maintain context across complex, multi-stage projects without losing coherence.

Practical Applications and Pricing Structure

Qwen3.7-Max demonstrates versatility across multiple use cases, from generating functional operating system clones to creating intricate 3D simulations and game environments. The model excels in multi-agent orchestration, scientific reasoning, and multilingual support, making it valuable for developers, researchers, and businesses

2

.

Alibaba has structured pricing information to balance power and affordability: input tokens cost $2.50 per 1 million, while output tokens are priced at $7.50 per 1 million. The model is accessible through both a chat interface and an API, with free account creation offered to new users

2

. This competitive pricing positions Qwen3.7-Max as an attractive option for cost-conscious users seeking advanced AI capabilities.

Limitations and Deployment Considerations

Despite its strengths, Qwen3.7-Max has notable limitations. It is not a multimodal model and cannot process audio, image, or video inputs, restricting its application in multimedia processing projects

2

. Front-end design outputs occasionally exhibit inconsistencies in highly creative or abstract tasks.

A significant consideration for Western enterprises is that the model is only accessible from Chinese-based endpoints. This limitation may affect its appeal to American and European organizations seeking to maximize compliance and security posturing for government contracts or to meet data sovereignty regulations

1

. As AI competition intensifies globally, enterprises will need to weigh Qwen3.7-Max's technical advantages against regulatory and compliance requirements when evaluating deployment options.

Today's Top Stories

TheOutpost.ai

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

Instagram logo
LinkedIn logo
Youtube logo
© 2026 TheOutpost.AI All rights reserved