2 Sources
[1]
MiniMax M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost
Big news in enterprise AI broke over the weekend as Chinese AI startup MiniMax released its highly anticipated M3 large language model on Sunday evening Eastern time. The model pairs frontier-tier coding and agentic performance with a 1-million-token context window and native multimodality for a fraction of the cost of leading proprietary models, with pricing starting at just $20 per month under its new subscription token plans. The company's leadership also announced plans to release the model under an open source license with "open weights" sometime in the next 10 days, allowing enterprises to download and customize it free of charge. For now, it is available via the MiniMax API at a special discounted price of $0.3 per 1 million input tokens and $1.20 per million output tokens (on fresh cache) for the next week. That handily beats proprietary U.S. giants like Google, OpenAI and Anthropic on cost, and M3 also eclipses the latest models from the former two on selected benchmarks. Even at its full price of $0.6/$2.40 per million input/output tokens, MiniMax-M3 costs just 8-20% as much as the leading proprietary U.S. models.
The traditional matrix governing large language model development has long dictated a rigid choice: software developers can either access top-tier closed-source intelligence behind restrictive APIs, or deploy nimble, cost-effective open models that falter on multi-step reasoning, dense coding tasks, and massive data sequences. MiniMax-M3 fundamentally upends this paradigm. By unifying these three historically separated frontier capabilities, M3 brings a level of comprehensive utility previously restricted to expensive, closed-source ecosystems, raising the baseline for open-weights systems while drastically reducing the compute footprint required to run complex development loops.
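To make the quoted API prices concrete, here is a minimal Python sketch that estimates the bill for a workload at the introductory ($0.3/$1.20) and full ($0.6/$2.40) per-million-token rates. The workload size is a hypothetical example, and the sketch does not model any separate cache-hit pricing, since only the "fresh cache" rates are quoted.

```python
# Per-million-token API prices quoted in the article (USD, "fresh cache" rates).
PROMO = {"input": 0.30, "output": 1.20}  # introductory week
FULL = {"input": 0.60, "output": 2.40}   # regular price


def api_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Return the USD cost of a workload at the given per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50M input tokens and 5M output tokens.
    for label, prices in (("promo", PROMO), ("full", FULL)):
        print(f"{label}: ${api_cost(50_000_000, 5_000_000, prices):.2f}")
```

For this made-up workload the sketch reports $21.00 at the promotional rate and $42.00 at full price. Because agent loops tend to re-read large contexts, the input rate usually dominates the total.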
Chart: VentureBeat Frontier AI Model API Pricing Snapshot
New MiniMax Sparse Attention (MSA) technique helps keep the model's cost low
At the core of the model's efficiency lies an architectural departure from classic Transformer networks. Standard attention mechanisms scale quadratically, meaning computational and financial costs explode as inputs grow longer. To combat this "inherent flaw," the engineering team implemented MiniMax Sparse Attention (MSA), a clean, extensible sparse attention blueprint. To visualize this innovation, think of traditional full attention as an editor rereading an entire library from scratch every time they need to verify a single sentence. MSA acts as an intelligent indexing clerk, using a pre-filtering phase that partitions the Key-Value (KV) matrices into blocks so that each query only touches the blocks it needs. At the operator level, MSA uses a "KV outer gather Q" approach: the system treats KV blocks as an outer loop, dynamically aggregating only the specific queries that hit them. Because each data block is read exactly once and memory access remains strictly contiguous, hardware utilization rises sharply. In internal trials, MSA runs more than 4x faster than alternative open-source solutions like Flash-Sparse-Attention or flash-moba. At the maximum context length of 1 million tokens, M3's per-token compute demand drops to just 1/20th of the previous-generation model's, translating into a 9x acceleration in the prefilling stage and a 15x boost during decoding.
Rather than taking a pretrained text network and fusing it with a separate vision model, MiniMax engineered M3 as a natively multimodal system from "Step Zero." The company overhauled its data ingestion pipeline to blend naturally interleaved sequences of text, images, and visual components, scaling the total pretraining corpus beyond 100 trillion tokens.
This deep data alignment enables the model to translate complex visual geometries, such as programming charts or coordinate maps, into structural code without losing contextual fidelity. On standardized assessments, M3 validates this engineering path. The model scores 59.0% on SWE-Bench Pro, an autonomous agent benchmark, placing it ahead of closed models like GPT-5.5 and Gemini 3.1 Pro. It achieves 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 83.5% on BrowseComp, outstripping Claude Opus 4.7's 79.3% in autonomous browsing and information retrieval. However, against Claude Opus 4.8, the premium frontier model Anthropic released last week, the limits of M3's efficient sparse-attention design become evident on directly comparable, tool-intensive agent benchmarks. On SWE-Bench Pro, which measures code modification, M3's 59.0% trails Opus 4.8's leading 69.2%. A similar gap appears on Terminal-Bench 2.1: M3's 66.0% runs neck-and-neck with the previous-generation Opus 4.7 (66.1%) but trails Opus 4.8 (74.6%). On OSWorld-Verified, which tracks continuous GUI interaction in a sandbox, M3's automated computer use scores 70.0% against Opus 4.8's 83.4%. These results illustrate the trade-off currently defining the ecosystem: closed-source systems like Opus 4.8 keep clear leads on the most complex agentic tasks, while M3 delivers a highly capable baseline for automated work that can run locally, without the compounding premium of closed-source API fees.
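The gaps against Opus 4.8 are easier to read side by side. This short Python sketch only does arithmetic on the percentage scores quoted above; the dictionary names are illustrative, not part of any benchmark tooling.

```python
# Percentage scores as reported in the article; this is plain arithmetic, no evaluation.
M3 = {"SWE-Bench Pro": 59.0, "Terminal-Bench 2.1": 66.0, "OSWorld-Verified": 70.0}
OPUS_48 = {"SWE-Bench Pro": 69.2, "Terminal-Bench 2.1": 74.6, "OSWorld-Verified": 83.4}


def gaps(a: dict, b: dict) -> dict:
    """Percentage-point lead of b over a on every benchmark both report."""
    return {name: round(b[name] - a[name], 1) for name in a if name in b}


if __name__ == "__main__":
    for name, gap in gaps(M3, OPUS_48).items():
        print(f"{name}: Opus 4.8 leads by {gap} points")
```

The leads work out to 10.2 points on SWE-Bench Pro, 8.6 on Terminal-Bench 2.1, and 13.4 on OSWorld-Verified. Computer use is the widest gap.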
Against DeepSeek-V4 Pro Max, a newly released fellow open-weights model, M3 holds its ground across core agentic categories and claims narrow advantages in specialized code synthesis. On SWE-Bench Pro, M3's 59.0% resolution rate edges past DeepSeek-V4 Pro Max's 55.4%. In command-line environments the contest tightens: on Terminal-Bench, DeepSeek-V4 Pro Max pulls slightly ahead with 67.9% to M3's 66.0%. In web orchestration and open-world browsing, the two are effectively tied, with M3 registering 83.5% on BrowseComp against DeepSeek's 83.4%. Similarly, on the MCP Atlas tool-use benchmark, M3 secures a narrow lead at 74.2% against DeepSeek's 73.6%. The close results suggest that, while DeepSeek relies on a massive 1.6-trillion-parameter model with specialized high-effort reasoning modes, MiniMax's block-filtered sparse attention delivers competitive results without extensive scaling of activated parameters.
MiniMax Code AI agent offers Agentic Team capabilities
MiniMax translates these architectural gains into immediate utility through an updated product suite spanning standalone applications, customizable subscription tiers, and raw developer infrastructure. The flagship end-user product is MiniMax Code, an AI agent designed to make the most of M3's multi-step capabilities. Available as a web app or native desktop app, MiniMax Code runs an "Agent Team" that breaks large engineering tasks into multi-stage, concurrent workflows. The system relies on a "Producer + Verifier" adversarial harness loop.
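A producer + verifier loop of this kind can be illustrated with a toy Python sketch. It is not MiniMax's harness and is not adversarial in any deep sense: the hypothetical `producer` simply cycles through hard-coded drafts where a real system would call a model, and the hypothetical `verifier` runs a few made-up test cases and hands back the failures as feedback.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int], int]
Test = Tuple[int, int]  # (input, expected output)


def producer(round_no: int, feedback: List[Test]) -> Candidate:
    """Hypothetical stand-in for the code-generating agent.

    A real producer would condition on `feedback`; this toy ignores it and
    walks through fixed drafts that converge on squaring the input.
    """
    drafts: List[Candidate] = [
        lambda n: n * 2,       # draft 1: doubles instead of squaring
        lambda n: n ** 2 + 1,  # draft 2: off by one
        lambda n: n ** 2,      # draft 3: correct
    ]
    return drafts[min(round_no, len(drafts) - 1)]


def verifier(candidate: Candidate, tests: List[Test]) -> List[Test]:
    """Hypothetical stand-in for the verifying agent: run tests, return failures."""
    return [(x, want) for x, want in tests if candidate(x) != want]


def produce_and_verify(tests: List[Test], max_rounds: int = 5) -> Optional[int]:
    """Alternate producer and verifier until no test fails; return the passing round."""
    feedback: List[Test] = []
    for round_no in range(max_rounds):
        candidate = producer(round_no, feedback)
        feedback = verifier(candidate, tests)
        if not feedback:
            return round_no
    return None  # budget exhausted without a passing candidate


if __name__ == "__main__":
    print(produce_and_verify([(2, 4), (3, 9)]))
```

The design point is the separation of roles. The producer never judges its own output, and the loop stops only when an independent check passes or a round budget runs out.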
As one agent instance generates code, a secondary verifier instance aggressively tests and reflects on the execution output, allowing the system to self-correct and operate autonomously for days without human oversight. Because of its native visual grounding, MiniMax Code also supports direct computer use: a developer can issue a cross-application voice prompt from their phone to have the model open a local enterprise ERP client and batch-populate its data tables from an open Excel spreadsheet.
For custom setups, developers can pipe M3 directly into existing workflows using an API key that works with popular coding agents and IDEs such as Claude Code, Cursor, Roo Code, and Cline. The API introduces a toggleable "thinking mode": when enabled, M3 devotes its processing to deep reasoning and long-horizon planning; when disabled, the model runs at minimal latency for quick text completion.
The companion Token Plan follows an aggressive pricing strategy built around shared multimodal quotas. Billed annually, three options are available:
* Plus ($20/month): Supplies ~1.7B tokens per month and handles 3-4 concurrent agents.
* Max ($50/month): Supplies ~5.1B tokens per month, manages 4-5 concurrent agents, and adds 3 automated video clips per day via Hailuo 2.3.
* Ultra ($120/month): Supplies ~9.8B tokens per month, supports 6-7 concurrent agents, and extends video capacity to 5 daily clips.
Open weights makes M3 much more attractive for enterprise use
MiniMax's pledge to release M3 under an open-weights license, with weights and technical documentation launching on Hugging Face and GitHub within 10 days, carries significant strategic weight for enterprise infrastructure managers. However, it remains to be determined which license the weights will be released under (for example, MIT, Apache 2.0, or the new OpenMDW license) and whether it will permit commercial use.
If so, the calculus looks like this: by shipping the underlying model weights directly to the community, MiniMax departs from the closed-door approach favored by major American AI labs. For enterprise users bound by strict compliance and privacy rules, open weights mean they can run M3 locally on internal hardware, removing the data-leakage risk that comes with sending data to public APIs. Open weights also let engineering teams run bespoke fine-tuning passes, modify internal architectures, or embed specialized system prompts deep within the model layers, turning an off-the-shelf system into a highly targeted proprietary asset.
Initial community reactions are resoundingly positive
The developer ecosystem reacted immediately to M3's operational benchmarks, singling out its long-horizon autonomous behavior and cost-to-performance profile. A major focal point of discussion is a 12-hour automated verification test in which M3 was tasked with reproducing an ICLR 2025 Outstanding Paper Award winner, "Learning Dynamics of LLM Finetuning." As MiniMax researcher @MikaStars39 highlighted on X:
"M3 ran autonomously for nearly 12 hours, producing 18 commits and 23 experimental figures on its own, and got the core experiments working:
* it matched the predicted probability trends in the SFT stage
* clearly observed the squeezing effect central to the DPO experiments
* validated the Extend mitigation method proposed in the original paper."
Meanwhile, creators of developer tools highlighted the practical economic advantages of the model's new attention mechanism. The official team behind the agentic AI coding harness Cline posted an alert confirming day-one compatibility, stating: "The new MiniMax-M3 is their first model to have 1m context, multimodal, and agentic coding capability. Congratulations to @MiniMax_AI for the breakthrough in sparse-attention architecture cutting compute & cost to 1/20th their previous generation."
This sharp drop in execution costs shifts how developers view the relationship between financial investment and capability. Tech commentator @jumperz mapped out this disruption, noting how M3 breaks a historical pattern in machine learning pricing:
By addressing context-scaling limits through fundamental attention-level optimizations rather than brute-force hardware scaling, MiniMax has established a highly efficient open-source baseline. M3 demonstrates that the next phase of agent development will be driven not just by larger datasets but by efficient architectural choices that make frontier-level performance accessible to the broader open-source community.
For enterprises building autonomous software development or agent infrastructure, MiniMax M3 provides the ultimate "bang for the buck." While DeepSeek-V4 Pro holds a slim price advantage of $0.195 per million tokens, MiniMax M3 justifies its small premium with a higher autonomous software engineering resolution rate (59.0% on SWE-Bench Pro). More importantly, because M3 is an open-weights model, the calculation extends far beyond the API price chart. By deploying M3's weights locally inside private enterprise clouds, organizations bypass cloud data egress tracking, eliminate structural vendor lock-in, and can implement custom prefix-caching schemes on internal hardware. This approach turns a highly efficient runtime budget into a permanent, privately owned corporate asset.
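Using the Token Plan quotas listed earlier (Plus, Max and Ultra), a rough per-token comparison can be computed. This sketch assumes the full monthly quota is consumed and treats the approximate quotas as exact. It ignores the concurrent-agent and video allowances that also differ between tiers. Because the quotas are shared across modalities, the result is not directly comparable to the API's per-token rates.

```python
# Token Plan tiers as quoted in the article:
# (USD per month when billed annually, approximate tokens per month).
PLANS = {
    "Plus": (20.0, 1.7e9),
    "Max": (50.0, 5.1e9),
    "Ultra": (120.0, 9.8e9),
}


def usd_per_million(monthly_usd: float, monthly_tokens: float) -> float:
    """Effective USD per 1M tokens, assuming the whole monthly quota is used."""
    return monthly_usd / (monthly_tokens / 1_000_000)


if __name__ == "__main__":
    for name, (price, tokens) in PLANS.items():
        print(f"{name}: ${usd_per_million(price, tokens):.4f} per 1M tokens")
```

By this arithmetic, Max has the lowest effective rate at roughly $0.0098 per million tokens. Ultra works out slightly more per token than Plus, so its higher price buys extra concurrency and video capacity, which this calculation does not value.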
[2]
Open Source MiniMax M3 Outperforms Opus 4.7 for a Fraction of the Cost
MiniMax M3 is making waves in the AI community as an open-source model that combines advanced capabilities with affordability. As highlighted by World of AI, the model stands out for handling both text and visual data through multimodal reasoning, enabling applications such as image captioning and multimedia generation. Its 1-million-token context window further enhances its utility in tasks requiring deep contextual understanding, such as document analysis and extended conversations. Its MiniMax Sparse Attention (MSA) architecture balances computational efficiency and scalability, making it suitable for both high-performance and cost-conscious environments. Explore how MiniMax M3 outperforms proprietary models like Opus 4.7 on key benchmarks, including SWE-Bench Pro and SVG Bench, at a fraction of their cost. Gain insight into its real-world applications, from front-end development and 3D simulations to CUDA kernel optimization. This guide also covers its token-based pricing model and open-source ecosystem, which make it accessible to a wide range of users. Whether you're a developer, researcher, or creative professional, this breakdown will help you understand how MiniMax M3 can elevate your AI-driven workflows.
MiniMax M3
MiniMax M3 integrates innovative technologies to deliver strong results across diverse tasks. Its standout features include:
* Multimodal Reasoning: Processes and analyzes both text and visual data, enabling precise handling of complex inputs. Seamless integration of textual and visual information makes it well suited to applications like image captioning, visual question answering and multimedia content generation.
* Long-Context Processing: Supports a context window of up to 1 million tokens, making it well suited to document analysis, extended conversations, large-scale data summarization, and other intricate workflows that require deep contextual understanding.
* Sparse Attention (MSA) Architecture: Improves computational efficiency and scalability, helping the model perform well in large-scale deployments. By optimizing resource usage, it suits both high-performance computing environments and cost-conscious applications.
These features make MiniMax M3 a versatile tool for tackling intricate challenges in AI-driven workflows, offering both flexibility and precision.
Performance Benchmarks
MiniMax M3 outperforms leading proprietary models such as Opus 4.7 on several benchmarks, including:
* SWE-Bench Pro: Highlights advanced reasoning and problem-solving capabilities, showcasing the model's ability to handle complex tasks.
* SVG Bench: The model excels at creating high-quality vector graphics and animations, demonstrating its utility in design and creative industries.
* Kernel Bench Hard: The model delivers strong results on CUDA kernel optimization tasks, making it a valuable asset for developers working on high-performance computing projects.
Beyond benchmarks, the model exhibits strong coding skills, autonomous task decomposition and multi-step reasoning. These attributes make it a dependable choice for complex workflows, ranging from software development to data analysis.
Applications Across Industries
MiniMax M3's versatility enables its use across a variety of industries and projects. Key applications include:
* Front-End Development: Automates the creation of dynamic user interfaces and visually appealing designs, reducing development time while maintaining high-quality output.
* 3D Development: Powers interactive simulations and immersive web experiences, making it a valuable tool for gaming, virtual reality and architectural visualization.
* SVG Generation: Produces intricate vector graphics and animations for diverse use cases, from marketing materials to technical illustrations.
These capabilities make MiniMax M3 an indispensable tool for developers and designers working on innovative, high-impact projects, allowing them to push the boundaries of creativity and functionality.
Affordability and Accessibility
One of MiniMax M3's most compelling advantages is its affordability. Compared to proprietary models, it offers:
* Token-Based Pricing: Delivers extensive usage at a fraction of the cost, making it accessible to developers, researchers and small businesses, so that high-quality AI tools are no longer limited to organizations with large budgets.
* Open-Source Community: Encourages collaboration and continuous improvement, helping the model evolve and remain robust over time as developers worldwide contribute to its growth and refinement.
The model is available through multiple platforms, including the API, the MiniMax Code platform and OpenRouter, allowing seamless integration into existing workflows. This accessibility means users can adopt MiniMax M3 without significant infrastructure changes, making it a practical choice for a wide range of applications.
Real-World Impact
MiniMax M3 has proven its reliability in real-world scenarios, particularly in long workflows where precision is critical. By reducing hallucinations and improving task accuracy, it has become a dependable choice for applications requiring high levels of trust and performance. Industries such as healthcare, finance and education have already begun using its capabilities to enhance decision-making, streamline operations and improve outcomes.
Its open-source nature underscores the growing potential of community-driven AI models to rival, and even surpass, proprietary solutions. The collaborative ecosystem surrounding MiniMax M3 helps it stay at the forefront of AI innovation, adapting to emerging challenges and opportunities.
Shaping the Future of AI
MiniMax M3 represents a significant step forward in the evolution of open-source AI. By combining advanced features, strong performance and cost efficiency, it sets a new standard for AI development. Whether you're optimizing code, designing 3D simulations, or generating complex SVG graphics, MiniMax M3 offers a reliable, accessible and affordable solution. Its open-source foundation ensures continuous evolution, driven by a global community of developers and researchers, making it a cornerstone of the future of AI innovation.
Media Credit: WorldofAI
Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission.
Chinese AI startup MiniMax launched its M3 large language model, delivering frontier-tier performance that eclipses GPT-5.5 and Gemini 3.1 Pro on key benchmarks while costing just 5-10% as much. The model features a 1-million-token context window and native multimodality, and its full weights will be released as open source within 10 days, fundamentally challenging the cost-performance trade-off in enterprise AI.
Chinese AI startup MiniMax released its highly anticipated M3 large language model over the weekend, combining frontier-tier coding and agentic capabilities with aggressive pricing that undercuts leading U.S. competitors by 80-95%. The model pairs a 1-million-token context window with native multimodality and is available now via API at $0.3 per 1 million input tokens and $1.20 per million output tokens during its introductory week [1]. Even at full pricing of $0.6/$2.40 per million input/output tokens, MiniMax M3 costs just 8-20% as much as leading proprietary U.S. models from Google, OpenAI, and Anthropic. The company also announced plans to release the model as open source with full weights within the next 10 days, allowing enterprises to download and customize it free of charge [1].
Source: Geeky Gadgets
The benchmark performance of MiniMax M3 positions it ahead of several established competitors on critical metrics. The model achieved 59.0% on SWE-Bench Pro, an autonomous agent benchmark that tests advanced reasoning and problem-solving, surpassing both GPT-5.5 and Gemini 3.1 Pro [1][2]. On Terminal-Bench 2.1, it recorded 66.0%, running neck-and-neck with Opus 4.7's 66.1%. The model also scored 74.2% on MCP Atlas and 83.5% on BrowseComp, outstripping Claude Opus 4.7's 79.3% in autonomous browsing and information retrieval [1]. On SVG Bench, the model excels at creating high-quality vector graphics and animations, demonstrating utility across creative industries [2]. However, compared with Anthropic's premium Opus 4.8, released last week, MiniMax M3's 59.0% on SWE-Bench Pro trails the newer model's 69.2% [1].
At the core of the model's cost advantage lies MiniMax Sparse Attention (MSA), an architectural innovation that addresses the quadratic scaling problem of traditional Transformer networks. MSA uses a "KV outer gather Q" approach, treating Key-Value blocks as an outer loop and dynamically aggregating only the specific queries that hit them. Because each data block is read exactly once with strictly contiguous memory access, hardware utilization increases dramatically [1]. In internal trials, MSA runs more than 4x faster than alternative open-source solutions like Flash-Sparse-Attention. At the full 1-million-token context window, MiniMax M3's per-token compute demand drops to just 1/20th of the previous-generation model's, translating into a 9x acceleration in prefilling and a 15x boost during decoding [1]. This sparse attention mechanism helps the model perform well in large-scale deployments while optimizing resource usage for cost-conscious applications [2].
Source: VentureBeat
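The "KV outer gather Q" idea can be sketched in plain Python. This is a toy under stated assumptions, not MiniMax's kernel. The block-selection rule is assumed: each query is scored against the mean key of every KV block, and the `top_k` best blocks are kept, in the spirit of MoBA-style block-sparse attention. There is no causal mask, and Python lists cannot show the contiguous-memory benefits. What the sketch does show is the loop structure: KV blocks form the outer loop and each block is sliced once. The queries that selected a block are gathered into the inner loop, and an online softmax merges each query's partial results across blocks. The returned score count can be compared with the n × n of full attention.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_blocks(Q: Matrix, K: Matrix, block: int, top_k: int) -> List[List[int]]:
    """Pre-filter (assumed rule): rank KV blocks per query by q . mean(keys in block)."""
    n_blocks = math.ceil(len(K) / block)
    means = []
    for b in range(n_blocks):
        rows = K[b * block:(b + 1) * block]
        means.append([sum(col) / len(rows) for col in zip(*rows)])
    chosen = []
    for q in Q:
        scores = [dot(q, m) for m in means]
        ranked = sorted(range(n_blocks), key=scores.__getitem__, reverse=True)
        chosen.append(ranked[:top_k])
    return chosen


def kv_outer_gather_q(Q: Matrix, K: Matrix, V: Matrix, block: int = 2,
                      top_k: int = 1) -> Tuple[Matrix, int]:
    """Block-sparse attention with KV blocks as the outer loop; returns (output, #scores)."""
    n_blocks = math.ceil(len(K) / block)
    # Invert the per-query selection: for each KV block, gather the queries that hit it.
    hits: List[List[int]] = [[] for _ in range(n_blocks)]
    for qi, blocks in enumerate(select_blocks(Q, K, block, top_k)):
        for b in blocks:
            hits[b].append(qi)
    scale = 1.0 / math.sqrt(len(Q[0]))
    # Online-softmax state per query: running max, running denominator, running numerator.
    run_max = [-math.inf] * len(Q)
    run_sum = [0.0] * len(Q)
    acc = [[0.0] * len(V[0]) for _ in Q]
    n_scores = 0
    for b in range(n_blocks):  # outer loop: each KV block is sliced exactly once
        keys = K[b * block:(b + 1) * block]
        vals = V[b * block:(b + 1) * block]
        for qi in hits[b]:  # inner loop: only the gathered queries
            scores = [dot(Q[qi], k) * scale for k in keys]
            n_scores += len(scores)
            new_max = max(run_max[qi], max(scores))
            fix = math.exp(run_max[qi] - new_max)  # rescale earlier partial sums
            run_sum[qi] *= fix
            acc[qi] = [a * fix for a in acc[qi]]
            for s, v in zip(scores, vals):
                w = math.exp(s - new_max)
                run_sum[qi] += w
                acc[qi] = [a + w * x for a, x in zip(acc[qi], v)]
            run_max[qi] = new_max
    out = [[a / run_sum[qi] for a in acc[qi]] for qi in range(len(Q))]
    return out, n_scores


def dense_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Reference full attention: every query scores every key (n * n scores)."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        s = [dot(q, k) * scale for k in K]
        m = max(s)
        w = [math.exp(x - m) for x in s]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) / z for j in range(len(V[0]))])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    K = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.2]]
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    _, sparse_scores = kv_outer_gather_q(Q, K, V, block=2, top_k=1)
    print(f"sparse: {sparse_scores} scores, dense: {len(Q) * len(K)} scores")
```

When `top_k` equals the number of blocks, the sketch reproduces dense attention up to floating-point rounding. With two blocks and `top_k=1`, it computes half the query-key scores. At a 1-million-token context, full attention needs on the order of 10^12 scores per layer and head, which is why skipping unselected blocks pays off.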
Unlike models that retrofit vision capabilities onto pretrained text networks, MiniMax engineered M3 as a natively multimodal system from "Step Zero." The company overhauled its data ingestion pipeline to blend naturally interleaved sequences of text, images, and visual components, scaling the total pretraining corpus beyond 100 trillion tokens [1]. This deep data alignment enables advanced multimodal reasoning, allowing the model to translate complex visual geometries, such as programming charts or coordinate maps, into structural code without losing contextual fidelity. The model processes and analyzes both text and visual data with precision, making it well suited to applications like image captioning, visual question answering, and multimedia content generation [2].
The long-context capability, with a context window of up to 1 million tokens, positions MiniMax M3 for enterprise-scale deployments that require deep contextual understanding. It suits tasks like document analysis, extended conversations, and large-scale data summarization, and lets the model handle intricate workflows [2]. Real-world applications span multiple industries: front-end development teams can automate the creation of dynamic user interfaces, 3D development projects benefit from interactive simulations and immersive web experiences, and SVG generation produces intricate vector graphics for marketing materials and technical illustrations [2]. The model also delivers strong results on CUDA kernel optimization tasks in Kernel Bench Hard testing, making it valuable for high-performance computing projects [2].
The token-based pricing model fundamentally shifts the economics of enterprise AI deployment. At the special introductory rate, developers pay just $0.3 per 1 million input tokens, compared with significantly higher rates from established providers. A subscription option starting at $20 per month makes the technology accessible to developers, researchers, and small businesses previously priced out of frontier AI capabilities [1]. The open-source community encourages collaboration and continuous improvement, helping the model evolve through contributions from developers worldwide [2]. This combination of aggressive pricing and open weights challenges the traditional matrix that forced developers to choose between top-tier closed-source intelligence behind restrictive APIs and nimble, cost-effective open models that falter on multi-step reasoning and dense coding tasks. For organizations watching competitive dynamics in AI, the short-term implication is immediate cost savings for production deployments, while the long-term impact may reshape expectations about the price-performance relationship that has defined frontier model access. With the open-weights release expected within 10 days, enterprise teams should evaluate how downloadable weights might accelerate customization for domain-specific applications without ongoing API costs.
Summarized by Navi
