2 Sources
[1]
Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof.
A little-known Miami-based startup called Subquadratic emerged from stealth on Tuesday with a sweeping claim: that it has built the first large language model to fully escape the mathematical constraint that has defined -- and limited -- every major AI system since 2017. The company claims its first model, SubQ 1M-Preview, is the first LLM built on a fully subquadratic architecture -- one where compute grows linearly with context length. If that claim holds, it would be a genuine inflection point in how AI systems scale. At 12 million tokens, the company says, its architecture reduces attention compute by almost 1,000 times compared to other frontier models -- a figure that, if validated independently, would dwarf the efficiency gains of any existing approach.

The company is also launching three products into private beta: an API exposing the full context window, a command-line coding agent called SubQ Code, and a search tool called SubQ Search. It has raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex. The New Stack reported that the raise values the company at $500 million.

The numbers Subquadratic is publishing are extraordinary. The reaction from the AI research community has been, to put it mildly, mixed -- ranging from genuine curiosity to open accusations of vaporware. Understanding why requires understanding what the company claims to have solved, and why so many prior attempts to solve the same problem have fallen short.

The quadratic scaling problem has shaped the economics of the entire AI industry

Every transformer-based AI model -- which includes virtually every frontier system from OpenAI, Anthropic, Google, and others -- relies on an operation called "attention." Every token is compared against every other token, so as inputs grow, the number of interactions -- and the compute required to process them -- scales quadratically. In plain terms: double the input size, and the cost doesn't double. It quadruples.

This relationship has shaped what gets built and what doesn't. The industry standard is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro. Even at those sizes, the cost of processing long inputs becomes punishing.

The industry built an elaborate stack of workarounds to cope. RAG systems use a search engine to pull a small number of relevant results before sending them to the model, because sending the full corpus isn't feasible. Developers layer retrieval pipelines, chunking strategies, prompt engineering techniques, and multi-agent orchestration systems on top of models -- all to route around the fundamental constraint that the model itself can't efficiently process everything at once.

Subquadratic's argument is that these workarounds are expensive, brittle, and ultimately limiting. As CTO Alexander Whedon told SiliconANGLE in an interview, "I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows. And I think that that is kind of a waste of human intelligence and also limiting to the product quality."

Subquadratic's fix is deceptively simple: stop doing the math that doesn't matter

The company's approach, called Subquadratic Sparse Attention or SSA, is built on a straightforward premise: most of the token-to-token comparisons in standard attention are wasted compute.
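To see what "wasted compute" means, it helps to look at the operation itself. Here is a minimal NumPy sketch of textbook dense attention -- the generic transformer operation, not anything from SubQ's codebase. The n x n score matrix is the quadratic term:

```python
import numpy as np

def dense_attention(Q, K, V):
    """Textbook scaled dot-product attention.

    Q, K, V are (n, d) arrays for a sequence of n tokens. The score
    matrix S is n x n -- every token scored against every other --
    which is exactly where the quadratic cost lives.
    """
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)            # (n, n) scores: the O(n^2) term
    S -= S.max(axis=-1, keepdims=True)  # subtract row max for stability
    P = np.exp(S)
    P /= P.sum(axis=-1, keepdims=True)  # row-wise softmax
    return P @ V                        # (n, d) outputs

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 64))
print(dense_attention(x, x, x).shape)   # (512, 64)

# Doubling the sequence length quadruples the score matrix:
for n in (2_000, 4_000, 8_000):
    print(f"{n:>5} tokens -> {n * n:>11,} pairwise scores")
```

SSA's wager is that most entries of that matrix contribute nothing useful to the output.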
Instead of comparing every token to every other token, SSA learns to identify which comparisons actually matter and computes attention only over those positions. Crucially, the selection is content-dependent -- the model decides where to look based on meaning, not on fixed positional patterns. This allows it to retrieve specific information from arbitrary positions across a very long context without paying the quadratic tax.

The practical payoff scales with context length -- exactly the inverse of the problem it's trying to solve. According to the company's technical blog, SSA achieves a 7.2x prefill speedup over dense attention at 128,000 tokens, rising to 52.2x at 1 million tokens. As Whedon put it: "If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice."

The company says it trained the model in three stages -- pretraining, supervised fine-tuning, and a reinforcement learning stage specifically targeting long-context retrieval failures -- teaching the model to aggressively use distant context rather than defaulting to nearby information, a subtle failure mode that quietly degrades performance in existing systems.

Three benchmarks paint a strong picture, but what they leave out may matter more

On the surface, SubQ's benchmark numbers are competitive with or superior to models built by organizations spending billions of dollars. On SWE-Bench Verified, it scored 81.8% compared to Opus 4.6's 80.8% and DeepSeek 4.0 Pro's 80.0%. On RULER at 128,000 tokens, a standard benchmark for reasoning over extended inputs, SubQ scored 95% -- edging out Claude Opus 4.6 at 94.8%. On MRCR v2, a demanding test of multi-hop retrieval across long contexts, SubQ posted a third-party verified score of 65.9%, compared with Claude Opus 4.7 at 32.2%, GPT-5.5 at 74%, and Gemini 3.1 Pro at 26.3%.

But several details warrant scrutiny. The benchmark selection is narrow -- exactly three tests, all emphasizing long-context retrieval and coding, the precise tasks SubQ is designed for. Broader evaluations across general reasoning, math, multilingual performance, and safety have not been published. The company says a comprehensive model card is "coming soon." According to The New Stack, each benchmark model was run only once due to high inference cost, and the SWE-Bench margin is, as the company's own paper acknowledges, "harness as much as model." Single runs without confidence intervals leave real room for variance.

There is also a significant gap between SubQ's research results and its production model. On MRCR v2, the company reported a research score of 83 -- but the third-party verified production model scored 65.9. That 17-point gap between the lab result and the shipping product is notable and largely unexplained.

Subquadratic also told SiliconANGLE that on the RULER 128K benchmark, SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus -- a remarkable cost claim. But the company has not publicly disclosed specific API pricing, making it impossible to independently verify the cost-per-task comparisons.

The AI research community's verdict ranges from 'genuine breakthrough' to 'AI Theranos'

Within hours of the announcement, the AI research community erupted into a debate that crystallized around a single question: Is this real? AI commentator Dan McAteer captured the binary mood in a widely shared post: "SubQ is either the biggest breakthrough since the Transformer... or it's AI Theranos." The comparison to the infamous blood-testing fraud company may be unfair, but it reflects the scale of the claims being made.

Skeptics zeroed in on several pressure points. Prominent AI engineer Will Depue initially noted that SubQ is "almost surely a sparse attention finetune of Kimi or DeepSeek," referring to existing open-source models. Whedon confirmed this on X, writing that the company is "using weights from open-source models as a starting point, as a function of our funding and maturity as a company." Depue later escalated his criticism, writing that the company's O(n) scaling claims and the speedup numbers "don't seem to line up," and described the claims as "either incredibly poorly communicated or just not real."

Others raised structural questions. One developer noted that if SubQ truly reduces compute by 1,000x and costs less than 5% of Opus, the company should have no trouble serving it at scale -- so why gate access through an early-access program? Developer Stepan Goncharov called the benchmarks "very interesting cherry-picked benchmarks," while another commenter described them as "suspiciously perfect."

But not everyone was dismissive. AI researcher John Rysana pushed back on the Theranos framing, writing that the work is "just subquadratic attention done well which is very meaningful for long context workloads," and that "odds of it being BS are extremely low." Linus Ekenstam, a tech commentator, said he was "extremely intrigued to see the real-world implications," particularly for complex AI-powered software.

Magic.dev made strikingly similar claims two years ago -- and then went quiet

Perhaps the most pointed critique of SubQ's launch comes not from its specific claims but from recent history. Magic.dev announced a 100-million-token context-window model, LTM-2-mini, in August 2024, with a claimed 1,000x efficiency advantage, and raised roughly $500 million on the strength of those claims. As of early 2026, there is no public evidence of LTM-2-mini being used outside Magic.

The parallels are uncomfortable. Both companies claimed massive context windows. Both touted roughly 1,000x efficiency gains. Both targeted software engineering as their primary use case. And both launched with limited external access.

The broader research landscape reinforces the caution. Kimi Linear, DeepSeek Sparse Attention, Mamba, and RWKV all promised subquadratic scaling, and all faced the same problem: architectures that achieve linear complexity in theory often underperform quadratic attention on downstream benchmarks at frontier scale, or they end up hybrid -- mixing subquadratic layers with standard attention and losing the pure scaling benefits. A widely cited LessWrong analysis argued that these approaches "are all better thought of as 'incremental improvement number 93595 to the transformer architecture'" because practical implementations remain quadratic and "only improve attention by a constant factor."

Subquadratic is directly aware of this history. Its own technical blog specifically addresses each prior approach -- fixed-pattern sparse attention, state space models, hybrid architectures, and DeepSeek Sparse Attention -- and argues that SSA avoids their tradeoffs. Whether it actually does remains an empirical question that only independent evaluation can settle.

A five-time founder, a former Meta engineer, and $29 million to prove the doubters wrong

The team behind the claims matters in evaluating them.
CEO Justin Dangel is a five-time founder and CEO with a track record across health tech, insurance tech, and consumer goods; his companies have scaled to hundreds of employees, attracted institutional backing, and reached liquidity. CTO Alexander Whedon previously worked as a software engineer at Meta and served as Head of Generative AI at TribeAI, where he led over 40 enterprise AI implementations. The team includes 11 PhD researchers with backgrounds from Meta, Google, Oxford, Cambridge, ByteDance, and Adobe.

That is a credible collection of talent for an architecture-level research effort. But neither co-founder has published foundational AI research, and the company has not yet released a peer-reviewed paper. The technical report is listed as "coming soon."

The funding profile is unusual for a company making frontier AI claims. Subquadratic raised $29 million at a reported $500 million valuation -- a steep price for a seed-stage company with no publicly available model, no peer-reviewed research, and no disclosed revenue. The investor base, led by Tinder co-founder Mateen and former SoftBank partner Villamizar, skews toward consumer tech and growth investing rather than deep technical AI research.

The company is not open-sourcing its weights but plans to offer training tools for enterprises to do their own post-training, and it has set a 50-million-token context window target for Q4.

The real test for SubQ isn't benchmarks -- it's whether the math survives independent scrutiny

Strip away the marketing language and the social media drama, and the underlying question Subquadratic is asking is genuinely important: Can AI systems break free of quadratic scaling without sacrificing the quality that makes them useful?

The stakes are enormous. If attention can be made truly linear without degrading retrieval and reasoning, the economics of AI shift fundamentally. Enterprise applications that today require elaborate retrieval pipelines -- processing entire codebases, contracts, regulatory filings, medical records -- become single-pass operations. The billions of dollars currently spent on RAG infrastructure, context management, and agentic orchestration become partially redundant.

Whedon's willingness to engage publicly with technical criticism -- posting a technical blog within hours of pushback -- suggests a team that understands it needs to show its work, not just describe it. And to its credit, the company has openly acknowledged that it builds on open-source foundations and that its model is smaller than those at the major labs.

Every frontier model in 2026 advertises a context window of at least a million tokens, but almost none of them are actually great at making use of all that information. The gap between a nominal context window and a functional one -- between what a model accepts and what it reliably reasons over -- remains one of the most important unsolved problems in AI. Subquadratic says it has closed that gap.

If independent evaluation confirms that claim, the implications would ripple far beyond a single startup's valuation. If it doesn't, the company joins a growing list of long-context promises that sounded revolutionary on launch day and unremarkable six months later. In computing, every fundamental constraint eventually falls. When it does, the breakthrough rarely comes from the direction the industry expected.
The question hanging over Subquadratic is whether a team of 11 PhDs and a $29 million seed round actually found the answer that has eluded organizations spending thousands of times more -- or whether they just found a better way to describe the problem.
[2]
Subquadratic launches with $29M to bring 12M-token context windows to AI - SiliconANGLE
Subquadratic, a company developing a novel generative artificial intelligence model, launched today with $29 million in seed funding. The new large language model, dubbed SubQ, uses what the company calls a subquadratic architecture that greatly increases the context window -- how much information the AI can read at once -- without significantly increasing the amount of compute it requires. The company also says it outperforms other state-of-the-art models on speed and accuracy.

Many AI systems today are built around the limits of context windows. For example, the industry standard is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro. SubQ can manage a context window of up to 12 million tokens while maintaining accuracy, increasing speed and reducing compute cost. At that length, it would be around 9 million words, or almost 120 books.

To reach this context size, Subquadratic needed to create a model that could handle that much data without breaking the "compute bank." To do that, co-founders Justin Dangel, who is chief executive, and Chief Technology Officer Alexander Whedon told SiliconANGLE in an interview, the company settled on a proprietary transformer architecture that implements sparse attention.

"We are very focused on the problem of how we transition from a dense attention, quadratic scaling architecture to a sparse attention linear architecture," Dangel said. "Sparse attention is an effort to say, hey, let's try to figure out how to not compare every token to every token to every token."

The "T" in ChatGPT stands for "transformer," which is the type of generative AI model architecture under the hood. It isn't necessary to understand exactly what that is -- just that it's essentially the engine of the LLM, the part that gives it the power to contextualize language. Traditional transformer models use dense attention, meaning the model compares every token in a prompt with every other token. That becomes expensive very quickly: Doubling the input does not just double the work; it roughly quadruples the number of token-to-token relationships the model has to consider. That is the "quadratic" scaling problem Subquadratic is targeting.

Attention is what transformers use to "understand" how a prompt operates by comparing words (broken up into tokens) to one another. The same way humans know that, in the sentence "The cat is in the room," the words "cat" and "room" relate to one another, an LLM compares words in a sentence to understand their relationships. With dense attention, every word is compared to every other word. The more words there are, the more comparisons the model needs to make.

"If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice," Whedon explained.

According to Subquadratic, SubQ is more than 50 times faster and 50 times less expensive than leading frontier models at 1 million tokens, while maintaining higher accuracy. At its full 12 million-token context window, the company says, the model reduces compute requirements by almost 1,000 times compared with other frontier models. On the RULER 128K long-context benchmark, Subquadratic said SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus, representing roughly a 300-times reduction in cost.
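The doubling rule in Whedon's quote is easy to check with toy arithmetic. In the sketch below, the per-token budget for the linear regime is an invented constant; only the two growth rates come from the article:

```python
# Relative attention cost under the two scaling regimes the article
# contrasts. The per-token constant k is invented for illustration;
# only the growth rates (n^2 vs. n) come from the reporting.
def quadratic_cost(n):          # dense attention: ~n^2 token pairs
    return n * n

def linear_cost(n, k=1_024):    # idealized sparse attention: ~k pairs per token
    return k * n

base = 128_000
for n in (128_000, 256_000, 1_000_000, 12_000_000):
    q = quadratic_cost(n) / quadratic_cost(base)
    l = linear_cost(n) / linear_cost(base)
    print(f"{n:>10,} tokens: dense cost x{q:,.1f}, linear cost x{l:,.1f}")

# Doubling the input: dense cost x4.0, linear cost x2.0. The gap between
# the two curves itself grows linearly with n, which is why a fixed
# architectural advantage can look like 7.2x at 128K and 52.2x at 1M.
```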
Currently, the context visible to LLMs is limited to a maximum of 1 million tokens for most state-of-the-art models, and even that can be difficult to use because of compute constraints. To handle this, developers carefully curate the data that goes into the context window using systems such as retrieval-augmented generation, or RAG, and agentic retrieval systems to manage data flow. These systems necessarily add latency and computational overhead, and can bias the information fed to the LLM.

"I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows," Whedon said. "And I think that that is kind of a waste of human intelligence and also limiting to the product quality."

Subquadratic's vision is that AI is being constrained by the cost curve of dense-attention transformers. The company argues that once the architecture moves from quadratic to linear scaling, developers can build products that were previously too slow, too expensive or too reliant on brittle data curation.

To tackle this, the company is launching the SubQ application programming interface, making it available to developers and enterprise teams that need access to the full 12 million-token context window. It is also launching SubQ Code, a command-line interface coding agent designed to load entire codebases into a single context window, so developers can plan, execute and review across a repository without coordinating multiple agents. Dangel also described a search product that will initially be free, suggesting a land-and-expand strategy around long-context research, coding and enterprise workloads. He added that the model will not be open-weight or open-source in the near term, but will be trainable for customer-specific use cases.

The funding round was joined by investors including Javier Villamizar, former partner at SoftBank Vision Fund, and Justin Mateen, co-founder of Tinder and founder of JAM Fund, alongside early investors in Anthropic PBC, OpenAI Group PBC, Stripe Inc. and Brex Inc.

"The fundamental scaling laws imposed by the transformer architecture and dense attention have been broken through," Dangel concluded.
Miami-based Subquadratic emerged from stealth with $29 million in seed funding, claiming its SubQ model is the first large language model built on fully subquadratic architecture. The company says it reduces compute by almost 1,000 times at 12 million tokens compared to frontier models, but AI researchers are demanding independent proof of the extraordinary claims before validating what could be a major inflection point in how AI systems scale.
A Miami-based startup called Subquadratic emerged from stealth on Tuesday with a claim that has sent ripples through the AI research community: it has built the first large language model to fully escape the mathematical constraint that has defined every major AI system since 2017 [1]. The company's SubQ 1M-Preview model is built on what it calls a fully subquadratic architecture, where compute grows linearly with context length rather than quadratically [2]. At 12 million tokens, the company claims its architecture reduces attention compute by almost 1,000 times compared to other frontier models.

The startup announced it has raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex [1]. The New Stack reported the raise values the company at $500 million. Co-founders Justin Dangel, who serves as chief executive, and Chief Technology Officer Alexander Whedon are leading the effort to commercialize what they describe as a fundamental shift in how AI processes information.
Source: VentureBeat
Every transformer-based AI model, which includes virtually every frontier system from OpenAI, Anthropic, and Google, relies on an operation called "attention" [1]. In traditional transformer architecture, every token is compared against every other token, so as inputs grow, the number of interactions scales quadratically. Double the input size, and the cost doesn't double -- it quadruples. This quadratic scaling problem has shaped the economics of the entire AI industry, dictating what gets built and what doesn't.

The industry standard context window is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro [2]. Even at those sizes, the compute cost of processing long inputs becomes punishing. The industry has built an elaborate stack of workarounds to cope, including RAG systems that use search to pull relevant results before sending them to the model, because sending the full corpus isn't feasible. Developers layer retrieval pipelines, chunking strategies, prompt engineering, and multi-agent workflows on top of models -- all to route around the fundamental constraint that the model can't efficiently process everything at once.
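For readers who haven't built one, that workaround pattern can be sketched in a few lines. Everything below is a generic illustration -- the embed and generate functions are hypothetical stand-ins, not any particular vendor's API -- but it shows the property critics of RAG point to: whatever retrieval drops, the model never sees.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: a deterministic pseudo-random
    vector per string (so the vectors carry no real meaning -- the point
    is the shape of the pipeline). Real systems call an embedding API."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).normal(size=64)

def generate(prompt: str) -> str:
    """Toy stand-in for the LLM call."""
    return f"<answer based on {len(prompt)} chars of curated context>"

def rag_answer(question: str, corpus: list[str], top_k: int = 2) -> str:
    """The classic RAG loop: embed the query, rank chunks by similarity,
    keep only the top_k, and generate. Everything outside the top_k is
    simply never seen by the model -- the curation step Whedon describes."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda chunk: -float(embed(chunk) @ q))
    context = "\n\n".join(ranked[:top_k])
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

corpus = ["The cat is in the room.", "Quarterly revenue rose 4%.",
          "The dog sleeps outside."]
print(rag_answer("Where is the cat?", corpus))
```

A 12-million-token window, if it works as claimed, would make this entire retrieval layer optional for many workloads.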
Source: SiliconANGLE
Subquadratic's approach, called Subquadratic Sparse Attention or SSA, is built on a straightforward premise: most token-to-token comparisons in standard attention are wasted compute [1]. Instead of comparing every token to every other token with dense attention, SSA learns to identify which comparisons actually matter and computes attention only over those positions. The selection is content-dependent -- the model decides where to look based on meaning, not on fixed positional patterns.

"If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice," Whedon explained to SiliconANGLE [2].
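Subquadratic has not published SSA's internals, so the sketch below is only a generic illustration of content-dependent sparse attention: each query keeps its k highest-scoring keys, wherever they sit in the sequence, and attends over those alone. Note the catch that has dogged this research area -- picking top-k from a full score matrix, as this demo does for clarity, is itself O(n^2); a genuinely subquadratic system must choose its candidates cheaply (routing, clustering, or a learned indexer).

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=8):
    """Generic content-dependent sparse attention (illustration only).

    Each query attends to just its k best-matching keys -- selection
    driven by content, not fixed positions. Computing the full score
    matrix to find top-k, as done here for clarity, is still O(n^2);
    subquadratic schemes must select candidates without it.
    """
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)                       # full scores (demo only)
    idx = np.argpartition(S, -k, axis=-1)[:, -k:]  # top-k key indices per query
    rows = np.arange(n)[:, None]
    S_k = S[rows, idx]                             # (n, k) kept scores
    P = np.exp(S_k - S_k.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)             # softmax over k, not n
    return np.einsum("nk,nkd->nd", P, V[idx])      # mix only k value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(1_000, 32))
print(topk_sparse_attention(x, x, x).shape)  # (1000, 32): each token mixes 8 of 1,000 values
```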
According to the company's technical blog, SSA achieves a 7.2x prefill speedup over dense attention at 128,000 tokens, rising to 52.2x at 1 million tokens. The practical payoff scales with context length -- exactly the inverse of the problem it's trying to solve. The company also says SubQ is more than 50 times faster and 50 times less expensive than leading frontier models at 1 million tokens, while maintaining higher accuracy [2].
On the RULER 128K long-context benchmark, the company said SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus -- roughly a 300-times reduction in cost. At its full 12 million-token context window, which would be around 9 million words or almost 120 books, the model is said to reduce compute requirements by almost 1,000 times compared with other frontier models.

The company trained the model in three stages: pretraining, supervised fine-tuning, and a reinforcement learning stage specifically targeting long-context retrieval failures [1]. This teaches the model to aggressively use distant context rather than defaulting to nearby information, a subtle failure mode that quietly degrades performance in existing systems.
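Subquadratic has not published its training recipe, so the following is purely a hypothetical illustration of what a reward for such an RL stage could look like: reward answers grounded in far-away context, penalize answers that grab the nearby distractor.

```python
def long_context_reward(answer: str, distant_fact: str, nearby_distractor: str) -> float:
    """Hypothetical reward for an RL stage targeting long-context retrieval.

    Episodes would place `distant_fact` far before the question and
    `nearby_distractor` right next to it. Rewarding the former and
    penalizing the latter pushes the model away from the recency bias
    the article describes. (Illustration only -- not Subquadratic's
    published method.)
    """
    if distant_fact in answer:
        return 1.0    # retrieved from far away: desired behavior
    if nearby_distractor in answer:
        return -0.5   # defaulted to nearby context: the failure mode
    return 0.0        # neither: no signal

print(long_context_reward("The key is under the mat.", "under the mat", "in the drawer"))  # 1.0
```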
Subquadratic is launching three products into private beta. The SubQ API exposes the full 12 million-token context window to developers and enterprise teams [2]. SubQ Code is a command-line coding agent designed to load entire codebases into a single context window, so developers can plan, execute, and review across a repository without coordinating multiple agents. The company is also launching SubQ Search, a search tool that leverages the model's extended context capabilities.

"I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows," Whedon told SiliconANGLE [2]. "And I think that that is kind of a waste of human intelligence and also limiting to the product quality." Subquadratic's vision is that AI is being constrained by the cost curve of dense-attention transformers, and that moving from quadratic to linear scaling will enable developers to build products that were previously too slow, too expensive, or too reliant on brittle data curation.

The reaction from the AI research community has been mixed, ranging from genuine curiosity to open accusations of vaporware [1]. The numbers Subquadratic is publishing are extraordinary and, if validated independently, would dwarf the efficiency gains of any existing approach. Researchers are calling for independent proof before accepting what could represent a genuine inflection point in how AI systems scale. What remains to be seen is whether the company's benchmarks hold up under scrutiny, and whether the approach can maintain quality across the diverse range of tasks that define state-of-the-art large language model performance.