Miami startup Subquadratic claims breakthrough on LLM bottleneck with sparse attention model

Reviewed by Nidhi Govil

2 Sources

Subquadratic emerged from stealth with bold claims about solving a mathematical bottleneck that has dogged large language models for nearly a decade. The Miami-based AI startup says its SubQ model uses sparse attention instead of dense attention, making it 56 times faster than FlashAttention, a leading existing method, while slashing costs dramatically. Independent tests by the AI evaluation firm Appen back many of the claims, though skeptics say more proof is needed before declaring the quadratic attention problem solved.

Miami-Based Startup Tackles Decade-Old Challenge in AI

Subquadratic, a Miami-based startup, emerged from stealth mode last month with $29 million in seed funding and an audacious claim: it has solved the attention bottleneck that has plagued large language models for nearly a decade [2]. The company announced its SubQ language model, which it says runs faster and cheaper, and consumes far less energy, than competing models from Google DeepMind, OpenAI, and Anthropic [1]. The announcement drew immediate skepticism, with one AI engineer comparing it to Theranos on X: "SubQ is either the biggest breakthrough since the Transformer ... or it's AI Theranos" [1].

Understanding the Quadratic Attention Bottleneck

To grasp why Subquadratic's claims matter, it helps to understand how the transformer architecture works. Most LLMs rely on a dense attention mechanism introduced by Google researchers in 2017. This mechanism represents each word or token as a list of numbers (a vector), then compares every token with every other token to capture the full meaning of the text [1]. For a 10,000-word text, that means almost 50 million pairwise comparisons, each of them a series of multiplications, creating massive computational overhead [1]. The problem intensifies as text length increases: double the words and you roughly quadruple the computations, a pattern known as quadratic scaling [2]. This quadratic scaling is a major reason LLMs consume enormous amounts of power and compute resources, especially on long inputs.

Source: MIT Tech Review
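To make the quadratic growth concrete, here is a minimal pure-Python sketch of textbook dense (scaled dot-product) attention, plus two counting helpers. It assumes one word equals one token, leaves out the learned query/key/value projections a real transformer applies, and uses illustrative function names. Counting each unordered pair of tokens once reproduces the "almost 50 million" figure for 10,000 tokens; an actual attention layer scores every (query, key) pair, so it computes about 100 million scores, but both counts roughly quadruple when the input doubles.

```python
import math


def distinct_pairs(n_tokens: int) -> int:
    """Unordered token pairs, n*(n-1)/2: the count behind 'almost 50 million'."""
    return n_tokens * (n_tokens - 1) // 2


def full_score_count(n_tokens: int) -> int:
    """Scores a dense layer actually computes: one per (query, key) pair, n*n."""
    return n_tokens * n_tokens


def dense_attention(queries, keys, values):
    """Naive dense attention: every query is scored against every key.

    Inputs are lists of equal-length float lists, one row per token.
    Learned projections are omitted (a simplifying assumption).
    """
    d = len(queries[0])
    outputs = []
    for q in queries:  # n queries ...
        # ... each scored against all n keys: this is the quadratic part.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Output row = softmax-weighted average of the value rows.
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    print(distinct_pairs(10_000))  # 49995000
    print(distinct_pairs(20_000) / distinct_pairs(10_000))  # roughly 4
    print(full_score_count(10_000))  # 100000000
```

The nested loop over queries and keys is what sparse approaches try to avoid: the layer's work is tied to the square of the input length no matter how little most token pairs matter.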

Sparse Attention as the Solution

Subquadratic's approach ditches the dense attention mechanism in favor of sparse attention, which dramatically slashes the number of computations required [1]. Instead of comparing every word with every other word, sparse attention keeps only the pairs that matter most [2]. While the concept isn't new, previous attempts failed to match the quality of dense attention. Subquadratic's version picks which words to focus on dynamically, based on content rather than fixed patterns. "That's kind of where the secret sauce is," says cofounder and chief technology officer Alex Whedon [2]. The company claims SubQ can process up to 12 times as much text at once as most other models, enabling long-context processing of entire codebases or documents hundreds of pages long [1].
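The idea of keeping only the pairs that matter most, chosen by content, can be sketched with top-k sparse attention. This is a generic technique swapped in purely for illustration: Subquadratic has not published how SubQ selects its pairs, so this is not the company's method, and the function name and the parameter `k` are hypothetical. For simplicity the sketch still scores every key before choosing, so it remains quadratic; a practical sparse method needs a cheaper way to find candidate keys, which is the hard part.

```python
import math


def topk_sparse_attention(queries, keys, values, k):
    """Top-k sparse attention: each query mixes only its k best-matching keys.

    Generic content-based sparse pattern, not SubQ's unpublished method.
    Selection here still scores every key, so it shows which pairs are
    kept, not how to find them cheaply.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in keys]
        # Content-based selection: keep the k keys this particular query matches best.
        kept = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        top = max(scores[j] for j in kept)
        weights = {j: math.exp(scores[j] - top) for j in kept}
        total = sum(weights.values())
        # Softmax over the kept keys only; all other pairs contribute nothing.
        outputs.append([
            sum(weights[j] * values[j][i] for j in kept) / total
            for i in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    values = [[10.0], [20.0], [30.0]]
    # With k=1, each query copies the value of its single best-matching key.
    print(topk_sparse_attention(queries, keys, values, k=1))  # [[10.0], [20.0]]
```

The payoff comes only if candidates can be found cheaply: with a fixed k keys per token, the attention work grows linearly with input length, so 12 times as much text would mean roughly 12 times the attention work, versus roughly 144 times (12 squared) for dense attention.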

Independent Benchmarks Validate Performance Claims

Facing widespread skepticism after initially publishing only its own test scores, Subquadratic brought in Appen, a third-party firm that evaluates AI models, to run independent tests [1]. The results were striking. On raw speed tests, SubQ ran 56 times faster than FlashAttention, a widely used, optimized implementation of standard dense attention [2]. On a challenging coding benchmark, it scored 89.7 percent, approaching the performance of the best models available [2]. The cost difference appears even more dramatic: running one long-context test on Anthropic's top model costs approximately $2,600, while the same test on SubQ costs just eight dollars [2]. "That was really exciting to me, it validated their architecture," says Jeanine Sinanan-Singh, Appen's director of generative AI research [1].
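Expressed as a ratio, the reported cost gap for that single long-context test works out to about 325 to 1. The quick check below uses only the two figures quoted above; the helper name is illustrative. Note that this compares full test runs against Anthropic's top model, a different baseline from the 56-fold speed figure, which was measured against FlashAttention, so the two numbers are not directly comparable.

```python
def ratio(baseline: float, candidate: float) -> float:
    """How many times larger the baseline figure is than the candidate's."""
    return baseline / candidate


# Figures as reported in the article; both are approximate single-test costs in USD.
anthropic_cost_usd = 2_600.0
subq_cost_usd = 8.0

if __name__ == "__main__":
    print(ratio(anthropic_cost_usd, subq_cost_usd))  # 325.0
```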

Skepticism Remains Despite Promising Results

Despite the independent validation, AI researchers remain cautious. Will Depue, an independent researcher who previously worked at OpenAI, notes that "the public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck" [2]. A key concern centers on how SubQ was developed: rather than training a model from scratch, Subquadratic started with an existing open-weight model and swapped in its new attention method [2]. While this practice is common in the industry, it sits awkwardly alongside claims of fundamentally reinventing how LLMs work. Additionally, SubQ isn't widely available yet: tens of thousands of people have joined the waitlist, but only a handful have access [2]. Benchmarks don't always translate to real-world performance, making broader testing critical.

What This Means for Energy-Efficient AI

If Subquadratic's results hold up under scrutiny, the implications for energy-efficient AI could be substantial. Cheaper, faster models with stronger long-context capabilities could analyze entire codebases, contract sets, or document collections in a single pass [2]. That would address a pressing need, as AI already strains against spiraling costs, particularly for AI agents and other compute-intensive applications. Other startups, such as Thomas Reardon's Flourish, are attacking efficiency from different angles, but Subquadratic believes its approach will reshape the entire field [2]. "We hope we're kicking off a new age of efficiency," says Justin Dangel, the firm's cofounder and CEO. "We don't think anybody will be building on transformers in a few years" [1]. Whether the startup can deliver on its ambitious vision remains to be seen, but the independent benchmarks suggest the technology deserves serious attention as the industry seeks alternatives to power-hungry transformer models.

© 2026 TheOutpost.AI All rights reserved