Subquadratic claims 1,000x AI efficiency with 12 million token context window, researchers skeptical

Reviewed by Nidhi Govil

Miami-based Subquadratic emerged from stealth with $29 million in seed funding, claiming its SubQ model is the first large language model built on fully subquadratic architecture. The company says it reduces compute by almost 1,000 times at 12 million tokens compared to frontier models, but AI researchers are demanding independent proof of the extraordinary claims before validating what could be a major inflection point in how AI systems scale.

Subquadratic Emerges with Bold Claims About AI Efficiency

A Miami-based startup called Subquadratic emerged from stealth on Tuesday with a claim that has sent ripples through the AI research community: it has built the first large language model to fully escape the mathematical constraint that has defined every major AI system since 2017.[1] The company's SubQ 1M-Preview model is built on what it calls a fully subquadratic architecture, in which compute grows linearly with context length rather than quadratically.[2] At 12 million tokens, the company claims its architecture reduces attention compute by almost 1,000 times compared with other frontier models.

The startup announced it has raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex.[1] The New Stack reported that the raise values the company at $500 million. Co-founders Justin Dangel, who serves as chief executive, and Chief Technology Officer Alexander Whedon are leading the effort to commercialize what they describe as a fundamental shift in how AI processes information.

Source: VentureBeat

Understanding the Quadratic Scaling Problem

Every transformer-based AI model, which includes virtually every frontier system from OpenAI, Anthropic, and Google, relies on an operation called "attention."[1] In traditional transformer architecture, every token is compared against every other token, so as inputs grow, the number of interactions scales quadratically. Double the input size, and the cost doesn't double; it quadruples. This quadratic scaling problem has shaped the economics of the entire AI industry, dictating what gets built and what doesn't.

The industry-standard context window is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro.[2] Even at those sizes, the compute cost of processing long inputs becomes punishing. The industry has built an elaborate stack of workarounds to cope, including RAG systems that use search to pull relevant results before sending them to the model, because sending the full corpus isn't feasible. Developers layer retrieval pipelines, chunking strategies, prompt engineering, and multi-agent workflows on top of models, all to route around the fundamental constraint that the model can't efficiently process everything at once.

Source: SiliconANGLE

How Sparse Attention Changes the Game

Subquadratic's approach, called Subquadratic Sparse Attention or SSA, is built on a straightforward premise: most token-to-token comparisons in standard attention are wasted compute.[1] Instead of comparing every token to every other token with dense attention, SSA learns to identify which comparisons actually matter and computes attention only over those positions. The selection is content-dependent: the model decides where to look based on meaning, not on fixed positional patterns.

"If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice," Whedon explained to SiliconANGLE.[2] According to the company's technical blog, SSA achieves a 7.2x prefill speedup over dense attention at 128,000 tokens, rising to 52.2x at 1 million tokens. The practical payoff grows with context length, precisely where the quadratic problem bites hardest.

Performance Claims and Cost Reductions

According to Subquadratic, SubQ is more than 50 times faster and 50 times less expensive than leading frontier models at 1 million tokens, while maintaining higher accuracy.[2] On the RULER 128K long-context benchmark, the company said SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus, roughly a 300-times reduction in cost. At its full 12 million-token context window, which would be around 9 million words or almost 120 books, the model reduces compute requirements by almost 1,000 times compared with other frontier models.
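The cost figures quoted above can be checked directly; the dollar amounts are the article's reported numbers:

```python
# RULER 128K cost comparison as reported: $2,600 for Claude Opus vs $8 for SubQ.
claude_opus_cost = 2600.0
subq_cost = 8.0
reduction = claude_opus_cost / subq_cost
print(reduction)  # 325.0, in line with "roughly a 300-times reduction"
```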

The company trained the model in three stages: pretraining, supervised fine-tuning, and a reinforcement learning stage specifically targeting long-context retrieval failures.[1] This teaches the model to aggressively use distant context rather than defaulting to nearby information, a subtle failure mode that quietly degrades performance in existing systems.

Three Products Enter Private Beta

Subquadratic is launching three products into private beta. The SubQ API exposes the full 12 million-token context window to developers and enterprise teams.[2] SubQ Code is a command-line coding agent designed to load entire codebases into a single context window, so developers can plan, execute, and review across a repository without coordinating multiple agents. The company is also launching SubQ Search, a search tool that leverages the extended context capabilities.

"I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows," Whedon told SiliconANGLE.[2] "And I think that that is kind of a waste of human intelligence and also limiting to the product quality." Subquadratic's vision is that AI is being constrained by the cost curve of dense-attention transformers, and that moving from quadratic to linear scaling will let developers build products that were previously too slow, too expensive, or too reliant on brittle data curation.

Research Community Demands Independent Validation

The reaction from the AI research community has been mixed, ranging from genuine curiosity to open accusations of vaporware.[1] The numbers Subquadratic is publishing are extraordinary and, if validated independently, would dwarf the efficiency gains of any existing approach. Researchers are calling for independent proof before accepting what could represent a genuine inflection point in how AI systems scale. What remains to be seen is whether the company's benchmarks hold up under scrutiny and whether the approach can maintain quality across the diverse range of tasks that define state-of-the-art large language model performance.
