MiniMax M3 beats GPT-5.5 and Gemini 3.1 Pro on benchmarks for 5-10% of the cost



Chinese AI startup MiniMax launched its M3 large language model, delivering frontier-tier performance that eclipses GPT-5.5 and Gemini 3.1 Pro on key benchmarks while costing just 5-10% as much. The model features a 1-million-token context window, native multimodality, and will be released as open-source with full weights within 10 days, fundamentally challenging the cost-performance trade-off in enterprise AI.

MiniMax M3 Launches with Frontier Performance at Fraction of Cost

Chinese AI startup MiniMax released its highly anticipated M3 large language model over the weekend, combining frontier-tier coding and agentic capabilities with aggressive pricing that undercuts leading U.S. competitors by 80-95%. The model pairs a 1-million-token context window with native multimodality and is available now via API at $0.30 per million input tokens and $1.20 per million output tokens during its introductory week [1]. Even at full pricing of $0.60 per million input tokens and $2.40 per million output tokens, MiniMax M3 costs just 8-20% as much as leading proprietary U.S. models from Google, OpenAI, and Anthropic. The company also announced plans to release the model as open source with full weights within the next 10 days, letting enterprises download and customize it free of charge [1].
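To make the per-token rates concrete, the short Python sketch below computes the API bill for a workload at the introductory and full prices quoted above. The rates come from the article; the workload size (50 million input tokens and 10 million output tokens) and all names in the code are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """USD price per one million tokens."""
    input_per_million: float
    output_per_million: float


# Rates quoted in the article.
INTRO = Rate(input_per_million=0.30, output_per_million=1.20)
FULL = Rate(input_per_million=0.60, output_per_million=2.40)


def cost_usd(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for the given token counts."""
    total = (input_tokens * rate.input_per_million
             + output_tokens * rate.output_per_million)
    return total / 1_000_000


# Hypothetical workload, not taken from the article.
input_tokens, output_tokens = 50_000_000, 10_000_000

intro_cost = cost_usd(INTRO, input_tokens, output_tokens)
full_cost = cost_usd(FULL, input_tokens, output_tokens)

print(f"Introductory pricing: ${intro_cost:.2f}")
print(f"Full pricing:         ${full_cost:.2f}")
```

Because both full rates are exactly double the introductory ones, the full-price bill is twice the introductory bill for any mix of input and output tokens.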

Source: Geeky Gadgets


MiniMax M3 Outperforms Proprietary Models on Key Benchmarks

MiniMax M3's benchmark results place it ahead of several established competitors. The model scored 59.0% on SWE-Bench Pro, an autonomous agent benchmark that measures advanced reasoning and problem-solving capabilities, surpassing both GPT-5.5 and Gemini 3.1 Pro [1][2]. On Terminal Bench 2.1, it recorded 66.0%, essentially level with Claude Opus 4.7's 66.1%. It also scored 74.2% on MCP Atlas and 83.5 on BrowseComp, ahead of Claude Opus 4.7's 79.3 in autonomous browsing and information retrieval [1]. On SVG Bench, the model excels at creating high-quality vector graphics and animations, demonstrating utility across creative industries [2]. However, against Anthropic's premium Opus 4.8, released last week, MiniMax M3's 59.0% on SWE-Bench Pro trails the newer model's 69.2% [1].

MiniMax Sparse Attention Powers Efficient Architecture

At the core of the model's cost advantage lies MiniMax Sparse Attention (MSA), an architectural innovation that addresses the quadratic scaling of attention in traditional Transformer networks. The architecture uses a "KV outer gather Q" approach: it treats Key-Value blocks as the outer loop and dynamically gathers only the queries that hit each block. Because each block is read exactly once with strictly contiguous memory access, hardware utilization increases dramatically [1]. In internal trials, MSA runs more than 4x faster than alternative open-source solutions such as Flash-Sparse-Attention. At the full 1-million-token context window, MiniMax M3's per-token compute drops to just 1/20th that of the previous-generation model, translating into a 9x speedup in prefilling and a 15x speedup in decoding [1]. The sparse attention mechanism keeps the model performant in large-scale deployments while reducing resource usage for cost-conscious applications [2].
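The "KV outer, gather Q" loop order can be illustrated with a small NumPy sketch. This is not MiniMax's kernel: the function names, the random block-selection mask, and the use of an online softmax to merge per-block results are all assumptions made for illustration, since the article does not describe how MSA selects blocks or combines partial results. The sketch only shows the general idea of iterating over Key-Value blocks, gathering the queries that attend to each one, and checking the result against a dense masked reference.

```python
import numpy as np


def kv_outer_sparse_attention(q, k, v, block_mask, block_size):
    """Block-sparse attention with KV blocks as the outer loop.

    q: (n, d) queries; k: (m, d) keys; v: (m, dv) values.
    block_mask[i, b] is True when query i attends to KV block b.
    """
    n, d = q.shape
    num_blocks = k.shape[0] // block_size
    scale = 1.0 / np.sqrt(d)

    # Online-softmax state per query: running max, normalizer, weighted sum.
    running_max = np.full(n, -np.inf)
    denom = np.zeros(n)
    acc = np.zeros((n, v.shape[1]))

    for b in range(num_blocks):
        rows = np.flatnonzero(block_mask[:, b])  # queries that hit this block
        if rows.size == 0:
            continue
        # Each KV block is sliced once, as one contiguous chunk.
        k_blk = k[b * block_size:(b + 1) * block_size]
        v_blk = v[b * block_size:(b + 1) * block_size]

        scores = (q[rows] @ k_blk.T) * scale
        new_max = np.maximum(running_max[rows], scores.max(axis=1))
        rescale = np.exp(running_max[rows] - new_max)  # 0 on a query's first block
        weights = np.exp(scores - new_max[:, None])

        denom[rows] = denom[rows] * rescale + weights.sum(axis=1)
        acc[rows] = acc[rows] * rescale[:, None] + weights @ v_blk
        running_max[rows] = new_max

    out = np.zeros_like(acc)
    hit = denom > 0  # queries with no selected block stay zero
    out[hit] = acc[hit] / denom[hit, None]
    return out


def dense_masked_attention(q, k, v, block_mask, block_size):
    """Reference: full score matrix with unselected blocks masked out."""
    token_mask = np.repeat(block_mask, block_size, axis=1)
    scores = np.where(token_mask, (q @ k.T) / np.sqrt(q.shape[1]), -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return (weights / weights.sum(axis=1, keepdims=True)) @ v


# Hypothetical toy sizes and a random block selection.
rng = np.random.default_rng(0)
n, m, d, block_size = 16, 32, 8, 4
q = rng.normal(size=(n, d))
k = rng.normal(size=(m, d))
v = rng.normal(size=(m, d))
block_mask = rng.random((n, m // block_size)) < 0.3
# Ensure every query selects at least one block.
block_mask[np.arange(n), rng.integers(0, m // block_size, size=n)] = True

sparse_out = kv_outer_sparse_attention(q, k, v, block_mask, block_size)
dense_out = dense_masked_attention(q, k, v, block_mask, block_size)
print("max abs difference:", np.abs(sparse_out - dense_out).max())
```

In this toy version the work per query scales with the number of selected blocks rather than with the full sequence length, which is the property the article credits for the drop in per-token compute at long context.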

Source: VentureBeat


Advanced Multimodal Reasoning Built from Ground Up

Unlike models that retrofit vision capabilities onto pretrained text networks, MiniMax engineered M3 as a natively multimodal system from "Step Zero." The company overhauled its data ingestion pipeline to blend naturally interleaved sequences of text, images, and visual components, scaling the total pretraining corpus beyond 100 trillion tokens [1]. This deep data alignment enables advanced multimodal reasoning, allowing the model to translate complex visual geometries such as programming charts or coordinate maps into structured code without losing contextual fidelity. Because the model processes both text and visual data, it suits applications such as image captioning, visual question answering, and multimedia content generation [2].

Long-Context Processing Enables Enterprise Applications

With a context window of up to 1 million tokens, MiniMax M3 is positioned for enterprise-scale deployments that require deep contextual understanding. The long context is essential for tasks such as document analysis, extended conversations, and large-scale data summarization, letting the model handle intricate workflows [2]. Real-world applications span multiple industries: front-end development teams can automate the creation of dynamic user interfaces, 3D development projects benefit from interactive simulations and immersive web experiences, and SVG generation produces intricate vector graphics for marketing materials and technical illustrations [2]. The model also performs strongly on CUDA kernel optimization tasks in Kernel Bench Hard testing, making it valuable for high-performance computing projects [2].

Token-Based Pricing Disrupts Enterprise AI Economics

The token-based pricing model fundamentally shifts the economics of enterprise AI deployment. At the introductory rate, developers pay just $0.30 per million input tokens, well below the rates of established providers. A subscription option starting at $20 per month makes the technology accessible to developers, researchers, and small businesses previously priced out of frontier AI capabilities [1]. The open-source release also encourages collaboration and continuous improvement, allowing the model to evolve through contributions from developers worldwide [2].

This combination of aggressive pricing and open weights challenges the traditional trade-off that forced developers to choose between top-tier closed-source intelligence behind restrictive APIs and nimble, cost-effective open models that falter on multi-step reasoning and dense coding tasks. For organizations watching competitive dynamics in AI, the short-term implication is immediate cost savings for production deployments, while the long-term impact may reshape expectations around the price-performance relationship that has defined frontier model access. With the open-source release due within 10 days, enterprise teams should evaluate how downloadable weights might accelerate customization for domain-specific applications without ongoing API costs.

© 2026 TheOutpost.AI All rights reserved