Subquadratic Claims to Solve LLM Bottleneck Issue

Miami-Based Startup Tackles Decade-Old Challenge in AI

Subquadratic, a Miami-based startup, emerged from stealth mode last month with $29 million in seed funding and an audacious claim: it has solved the bottleneck that has plagued large language models (LLMs) for nearly a decade [2]. The company announced its SubQ language model, which it says is faster, cheaper, and far less energy-hungry than competing models from Google DeepMind, OpenAI, and Anthropic [1]. The announcement drew immediate skepticism, with one AI engineer writing on X: "SubQ is either the biggest breakthrough since the Transformer ... or it's AI Theranos" [1].

Understanding the Quadratic Attention Bottleneck

To grasp why Subquadratic's claims matter, it helps to understand how the transformer architecture works. Most LLMs rely on a dense attention mechanism introduced by Google researchers in 2017. This process encodes each word or token as numbers, then multiplies each token's numbers with those of every other token to capture the full meaning of the text [1]. For a 10,000-word text, this kicks off almost 50 million individual multiplications, one for each pair of words, creating massive computational overhead [1]. The problem intensifies as text length increases: double the words and you roughly quadruple the computations, a growth rate known as quadratic scaling [2]. This scaling is the primary reason LLMs consume enormous amounts of power and compute resources.

Source: MIT Tech Review
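The arithmetic behind the quadratic bottleneck can be checked with a short sketch. It assumes, as a simplification, that one word is one token and that each unordered pair of tokens is counted once; the function name `pairwise_comparisons` is illustrative, and real attention implementations do considerably more work per pair than a single multiplication.

```python
def pairwise_comparisons(n_tokens: int) -> int:
    """Count the distinct token pairs that dense attention relates.

    Simplifying assumption: one operation per unordered pair of tokens.
    """
    return n_tokens * (n_tokens - 1) // 2


if __name__ == "__main__":
    # Each doubling of the text length roughly quadruples the pair count.
    for n in (10_000, 20_000, 40_000):
        print(f"{n:,} tokens -> {pairwise_comparisons(n):,} pairs")
```

For 10,000 tokens this gives 49,995,000 pairs, the "almost 50 million" figure cited above, and for 20,000 tokens it gives 199,990,000, about four times as many.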

Sparse Attention as the Solution

Subquadratic's approach ditches the dense attention mechanism in favor of sparse attention, which dramatically slashes the number of computations required [1]. Instead of comparing every word with every other word, sparse attention keeps only the pairs that matter most [2]. The concept isn't new, but previous attempts failed to match the quality of dense attention. Subquadratic's version picks which words to focus on dynamically, based on content rather than fixed patterns. "That's kind of where the secret sauce is," says co-founder and chief technology officer Alex Whedon [2]. The company claims SubQ can process up to 12 times as much text at once as most other models, enabling long-context processing of entire code bases or documents hundreds of pages long [1].
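The general idea of content-based sparse attention can be illustrated with a minimal sketch. This is not Subquadratic's method, which the reporting does not describe in detail; it is a generic top-k variant in which each query attends only to the k keys it scores highest. The function names and the tiny example vectors are assumptions made for illustration. Note also that this naive version still scores every pair before discarding most of them, so it shows the selection step only; a genuinely subquadratic method has to pick the important pairs without scoring all of them.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k highest-scoring keys.

    Illustrative only: scoring every key first keeps this version quadratic.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [dot(q, key) * scale for key in keys]
        # Content-based selection: indices of the k largest scores.
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in keep])
        # Weighted sum over the selected values only.
        outputs.append(
            [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.9, 0.0]]
    for row in topk_sparse_attention(tokens, tokens, tokens, k=2):
        print(row)
```

Setting k equal to the number of keys recovers ordinary dense attention, which makes the trade-off visible: a smaller k means fewer pairs contribute to each output, at the risk of dropping a pair that mattered.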

Independent Benchmarks Validate Performance Claims

Facing widespread skepticism after initially providing only self-published test scores, Subquadratic brought in Appen, a third-party firm that evaluates AI models, to run independent tests [1]. The results proved striking. On raw speed tests, SubQ ran 56 times faster than FlashAttention, a leading existing method [2]. On a challenging coding benchmark, it scored 89.7 percent, approaching the performance of the best models available [2]. The cost differential appears even more dramatic: running one long-context test on Anthropic's top model costs approximately $2,600, while the same test on SubQ costs just $8 [2]. "That was really exciting to me, it validated their architecture," says Jeanine Sinanan-Singh, Appen's director of generative AI research [1].
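To put the reported cost gap in proportion, the two figures above can be turned into a ratio. This is a back-of-the-envelope calculation that takes the approximate $2,600 and $8 figures at face value; the helper name `cost_ratio` is illustrative.

```python
def cost_ratio(baseline_usd: float, candidate_usd: float) -> float:
    """How many times more the baseline costs than the candidate."""
    return baseline_usd / candidate_usd


if __name__ == "__main__":
    baseline = 2600.0  # approximate cost reported for Anthropic's top model
    candidate = 8.0    # cost reported for the same test on SubQ
    ratio = cost_ratio(baseline, candidate)
    saving = 1 - candidate / baseline
    print(f"about {ratio:.0f} times cheaper, a {saving:.1%} reduction")
```

On the reported numbers, the SubQ run costs roughly 1/325 of the Anthropic run for that single test.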

Skepticism Remains Despite Promising Results

Despite the independent validation, caution persists among AI researchers. Will Depue, an independent researcher who previously worked at OpenAI, notes that "the public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck" [2]. A key concern centers on SubQ's development approach: rather than training from scratch, Subquadratic started with an existing open-weight model and swapped in its new attention method [2]. While this practice is common in the industry, it sits awkwardly alongside claims of fundamentally reinventing how LLMs work. Additionally, SubQ isn't widely available yet: tens of thousands have joined the waitlist, but only a handful have access [2]. Benchmarks don't always translate to real-world performance, making broader testing critical.

What This Means for Energy-Efficient AI

If Subquadratic's results hold up under scrutiny, the implications for energy-efficient AI could be substantial. Cheaper, faster models with superior long-context processing capabilities could analyze entire codebases, contract sets, or document collections in a single pass [2]. This addresses a pressing need as the AI industry strains under spiraling costs, particularly for AI agents and other compute-intensive applications. Other startups, such as Thomas Reardon's Flourish, are attacking efficiency from different angles, but Subquadratic believes its approach will reshape the entire field [2]. "We hope we're kicking off a new age of efficiency," says Justin Dangel, the firm's co-founder and CEO. "We don't think anybody will be building on transformers in a few years" [1]. Whether the startup can deliver on its ambitious vision remains to be seen, but the independent benchmarks suggest the technology deserves serious attention as the industry seeks alternatives to power-hungry transformer models.

References

[1] A startup claims it broke through a bottleneck that's holding back LLMs

[2] A startup says it cracked the bottleneck holding back AI
