MiniMax M3 Beats GPT-5.5 at 5-10% of the Cost

MiniMax M3 Launches with Frontier Performance at Fraction of Cost

Chinese AI startup MiniMax released its highly anticipated M3 large language model over the weekend. The model combines frontier-tier coding and agentic capabilities with aggressive pricing that undercuts leading U.S. competitors by 80-95%. It pairs a 1-million-token context window with native multimodality and is available now via API at $0.30 per million input tokens and $1.20 per million output tokens during its introductory week [1]. Even at full pricing of $0.60/$2.40 per million input/output tokens, MiniMax M3 costs just 8-20% as much as leading proprietary U.S. models from Google, OpenAI, and Anthropic. The company also plans to release the model as open source, with full weights, within the next 10 days, allowing enterprises to download and customize it free of charge [1].
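The per-token rates are easier to compare on a concrete workload. The sketch below prices a hypothetical workload at the introductory rate ($0.30/$1.20 per million input/output tokens) and at the full rate ($0.60/$2.40) quoted above. The workload size and the `api_cost` helper are illustrative assumptions. They are not MiniMax figures or API names.

```python
# Quoted MiniMax M3 API rates in USD per 1 million tokens: (input, output).
INTRO_RATE = (0.30, 1.20)
FULL_RATE = (0.60, 2.40)


def api_cost(input_tokens: int, output_tokens: int, rate: tuple[float, float]) -> float:
    """Return the USD cost of a workload at the given per-million-token rates."""
    in_price, out_price = rate
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical monthly workload (assumption): 50M input tokens, 10M output tokens.
workload = (50_000_000, 10_000_000)
intro = api_cost(*workload, INTRO_RATE)
full = api_cost(*workload, FULL_RATE)
print(f"introductory: ${intro:.2f}, full: ${full:.2f}")
```

The full rate is exactly double the introductory rate for both input and output tokens. Any workload therefore costs twice as much once the introductory week ends.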


MiniMax M3 Outperforms Proprietary Models on Key Benchmarks

MiniMax M3's benchmark results place it ahead of several established competitors on key metrics. The model scored 59.0% on SWE-Bench Pro, an agentic software-engineering benchmark, surpassing both GPT-5.5 and Gemini 3.1 Pro [1]. On Terminal Bench 2.1, it recorded 66.0%, essentially tied with Claude Opus 4.7's 66.1%. The model also scored 74.2% on MCP Atlas and 83.5% on BrowseComp, outstripping Claude Opus 4.7's 79.3% on that autonomous browsing and information-retrieval benchmark [1]. On SVG Bench, the model performs well at creating high-quality vector graphics and animations, which suggests utility across creative industries [2].

However, MiniMax M3 trails Anthropic's premium Opus 4.8, released last week: its 59.0% on SWE-Bench Pro falls short of the newer model's 69.2% [1].

MiniMax Sparse Attention Powers Efficient Architecture

At the core of the model's cost advantage is MiniMax Sparse Attention (MSA). This architectural change addresses the quadratic scaling of standard Transformer attention, in which compute grows with the square of the sequence length. MSA uses a "KV outer gather Q" approach: it treats Key-Value blocks as the outer loop and, for each block, gathers only the queries that hit it. Because each data block is read exactly once with strictly contiguous memory access, hardware utilization rises substantially [1].

In internal trials, MSA runs more than 4x faster than alternative open-source implementations such as Flash-Sparse-Attention. Across the full 1-million-token context window, MiniMax M3's per-token compute drops to 1/20th of the previous-generation model's. That translates into a 9x speedup in prefilling and a 15x speedup in decoding [1]. This sparse attention mechanism helps the model perform well in large-scale deployments while keeping resource usage low for cost-conscious applications [2].
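The loop ordering behind "KV outer gather Q" can be illustrated with a small pure-Python sketch. It is not MiniMax's kernel. The block router and every name below are hypothetical stand-ins: `select_blocks` scores each KV block by the query's dot product with the block's mean key and keeps the `top_k` best. Pure Python also shows only the control flow, not the memory-access benefits. The outer loop visits each KV block once and gathers the queries routed to it. It then merges partial results per query with an online softmax, so the output matches a conventional query-by-query pass over the same selected blocks.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(q, k_blocks, top_k):
    """Hypothetical router: rank KV blocks by q . mean(keys) and keep the top_k."""
    ranked = []
    for b, keys in enumerate(k_blocks):
        mean_k = [sum(col) / len(keys) for col in zip(*keys)]
        ranked.append((dot(q, mean_k), b))
    ranked.sort(reverse=True)
    return {b for _, b in ranked[:top_k]}


def kv_outer_sparse_attention(Q, K, V, block_size, top_k):
    """Block-sparse attention with KV blocks in the outer loop (top_k must be >= 1)."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    k_blocks = [K[i:i + block_size] for i in range(0, len(K), block_size)]
    v_blocks = [V[i:i + block_size] for i in range(0, len(V), block_size)]
    routes = [select_blocks(q, k_blocks, top_k) for q in Q]
    # Invert the routing: for each KV block, the list of queries that "hit" it.
    hits = [[qi for qi, r in enumerate(routes) if b in r] for b in range(len(k_blocks))]
    # Online-softmax state per query: running max, denominator, weighted value sum.
    m = [-math.inf] * len(Q)
    l = [0.0] * len(Q)
    acc = [[0.0] * len(V[0]) for _ in Q]
    for b, (keys, vals) in enumerate(zip(k_blocks, v_blocks)):  # each KV block read once
        for qi in hits[b]:  # gather only the queries routed to this block
            s = [dot(Q[qi], k) * scale for k in keys]
            m_new = max(m[qi], max(s))
            rescale = math.exp(m[qi] - m_new)  # exp(-inf) == 0.0 on the first visit
            w = [math.exp(x - m_new) for x in s]
            l[qi] = l[qi] * rescale + sum(w)
            acc[qi] = [a * rescale + sum(wj * v[c] for wj, v in zip(w, vals))
                       for c, a in enumerate(acc[qi])]
            m[qi] = m_new
    return [[a / l[qi] for a in acc[qi]] for qi in range(len(Q))], routes


def query_outer_reference(Q, K, V, block_size, routes):
    """Plain softmax attention per query, restricted to the same selected blocks."""
    out = []
    for q, r in zip(Q, routes):
        idx = [i for i in range(len(K)) if i // block_size in r]
        s = [dot(q, K[i]) / math.sqrt(len(q)) for i in idx]
        mx = max(s)
        w = [math.exp(x - mx) for x in s]
        z = sum(w)
        out.append([sum(wj * V[i][c] for wj, i in zip(w, idx)) / z for c in range(len(V[0]))])
    return out


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
Q, K, V = rand_matrix(6, 8), rand_matrix(16, 8), rand_matrix(16, 4)
out, routes = kv_outer_sparse_attention(Q, K, V, block_size=4, top_k=2)
ref = query_outer_reference(Q, K, V, 4, routes)
err = max(abs(a - b) for row_o, row_r in zip(out, ref) for a, b in zip(row_o, row_r))
print(f"max abs difference vs. query-outer reference: {err:.2e}")
```

Both functions compute the same result. The only difference is which operand sits in the outer loop. On real hardware, that choice determines whether each KV block is fetched once or once per query that touches it.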


Advanced Multimodal Reasoning Built from Ground Up

Unlike models that retrofit vision capabilities onto pretrained text networks, MiniMax engineered M3 as a natively multimodal system from "Step Zero." The company overhauled its data-ingestion pipeline to blend naturally interleaved sequences of text, images, and other visual components, scaling the total pretraining corpus beyond 100 trillion tokens [1]. This deep data alignment enables advanced multimodal reasoning. The model can translate complex visual geometries, such as programming charts or coordinate maps, into structural code without losing contextual fidelity. Because it processes and analyzes both text and visual data, it suits applications like image captioning, visual question answering, and multimedia content generation [2].
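A common way to feed interleaved text and images to a single model is to flatten each document into one token sequence. In that sequence, each image occupies a run of placeholder positions between begin and end markers. The sketch below illustrates this with hypothetical choices: `<img>`/`</img>` markers, a fixed `PATCHES_PER_IMAGE`, and whitespace tokenization. MiniMax has not described its ingestion format at this level, so treat this as an illustration of interleaving in general, not M3's pipeline.

```python
from dataclasses import dataclass
from typing import Union

PATCHES_PER_IMAGE = 4  # hypothetical; real vision encoders emit many more patch tokens


@dataclass
class Text:
    content: str


@dataclass
class Image:
    source: str  # hypothetical image identifier, e.g. a file name


Segment = Union[Text, Image]


def flatten(document: list[Segment]) -> list[str]:
    """Flatten interleaved segments into one sequence, preserving their original order."""
    tokens: list[str] = []
    for seg in document:
        if isinstance(seg, Text):
            tokens.extend(seg.content.split())  # stand-in for a real subword tokenizer
        else:
            tokens.append("<img>")
            tokens.extend(f"<patch:{seg.source}:{i}>" for i in range(PATCHES_PER_IMAGE))
            tokens.append("</img>")
    return tokens


doc = [Text("The chart below plots"), Image("chart.png"), Text("as a bar graph.")]
seq = flatten(doc)
print(len(seq), seq[:6])
```

Keeping text and image positions in one ordered sequence lets the model attend across both modalities in their original reading order. Captioning an image in isolation discards that ordering.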

Long-Context Processing Enables Enterprise Applications

Support for a context window of up to 1 million tokens positions MiniMax M3 for enterprise-scale deployments that require deep contextual understanding. This capacity matters for tasks like document analysis, extended conversations, and large-scale data summarization, where the model must keep track of intricate workflows [2].

Real-world applications span multiple industries:

- Front-end development teams can automate the creation of dynamic user interfaces.
- 3D development projects can benefit from interactive simulations and immersive web experiences.
- SVG generation can produce intricate vector graphics for marketing materials and technical illustrations [2].

The model also performs strongly on CUDA kernel optimization tasks in Kernel Bench Hard testing, making it relevant to high-performance computing projects [2].
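Before sending a request, you can check whether a document set fits in the 1-million-token window. The sketch below admits each document, in order, only if it still fits in the window after room is reserved for the output. Several parts are rough, hypothetical assumptions: the four-characters-per-token estimate, the `reserve_for_output` default, and the helper names. A real deployment would count tokens with the model's own tokenizer.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, as quoted for MiniMax M3


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def pack_documents(docs: list[str], reserve_for_output: int = 32_000) -> tuple[list[int], list[int]]:
    """Admit each document, in order, if it still fits in the remaining input budget.

    Returns (indices that fit, indices deferred to another request or a summarization pass).
    """
    budget = CONTEXT_WINDOW - reserve_for_output
    used = 0
    fits: list[int] = []
    deferred: list[int] = []
    for i, doc in enumerate(docs):
        cost = estimate_tokens(doc)
        if used + cost <= budget:
            fits.append(i)
            used += cost
        else:
            deferred.append(i)
    return fits, deferred


# Hypothetical corpus: documents of roughly 400k, 600k, and 50k estimated tokens.
docs = ["x" * 1_600_000, "y" * 2_400_000, "z" * 200_000]
fits, deferred = pack_documents(docs)
print(fits, deferred)
```

The second document would push the total past the 968,000-token input budget, so it is deferred. The smaller third document still fits.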

Token-Based Pricing Disrupts Enterprise AI Economics

Token-based pricing shifts the economics of enterprise AI deployment. At the introductory rate, developers pay just $0.30 per million input tokens, well below the rates charged by established providers. A subscription option starting at $20 per month opens the model to developers, researchers, and small businesses previously priced out of frontier AI capabilities [1]. Once the weights are released, the open-source community is expected to drive collaboration and continuous improvement, letting the model evolve through contributions from developers worldwide [2].

This combination of aggressive pricing and open weights challenges a long-standing trade-off. Developers have had to choose between top-tier closed-source intelligence behind restrictive APIs and nimble, cost-effective open models that falter on multi-step reasoning and dense coding tasks. For organizations watching competitive dynamics in AI, the short-term implication is immediate cost savings for production deployments. In the long term, the model may reshape expectations about the price-performance relationship that has defined access to frontier models. With the open-source release expected within 10 days, enterprise teams should evaluate how downloadable weights might accelerate customization for domain-specific applications without ongoing API costs.


References

[1] MiniMax M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost

[2] Open Source MiniMax M3 Outperforms Opus 4.7 for a Fraction of the Cost
