2 Sources
[1]
Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof.
A little-known Miami-based startup called Subquadratic emerged from stealth on Tuesday with a sweeping claim: that it has built the first large language model to fully escape the mathematical constraint that has defined -- and limited -- every major AI system since 2017. The company claims its first model, SubQ 1M-Preview, is the first LLM built on a fully subquadratic architecture -- one where compute grows linearly with context length. If that claim holds, it would be a genuine inflection point in how AI systems scale. At 12 million tokens, the company says, its architecture reduces attention compute by almost 1,000 times compared to other frontier models -- a figure that, if validated independently, would dwarf the efficiency gains of any existing approach.

The company is also launching three products into private beta: an API exposing the full context window, a command-line coding agent called SubQ Code, and a search tool called SubQ Search. It has raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex. The New Stack reported that the raise values the company at $500 million.

The numbers Subquadratic is publishing are extraordinary. The reaction from the AI research community has been, to put it mildly, mixed -- ranging from genuine curiosity to open accusations of vaporware. Understanding why requires understanding what the company claims to have solved, and why so many prior attempts to solve the same problem have fallen short.

The quadratic scaling problem has shaped the economics of the entire AI industry

Every transformer-based AI model -- which includes virtually every frontier system from OpenAI, Anthropic, Google, and others -- relies on an operation called "attention." Every token is compared against every other token, so as inputs grow, the number of interactions -- and the compute required to process them -- scales quadratically. In plain terms: double the input size, and the cost doesn't double. It quadruples.

This relationship has shaped what gets built and what doesn't. The industry standard is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro. Even at those sizes, the cost of processing long inputs becomes punishing.

The industry built an elaborate stack of workarounds to cope. RAG systems use a search engine to pull a small number of relevant results before sending them to the model, because sending the full corpus isn't feasible. Developers layer retrieval pipelines, chunking strategies, prompt engineering techniques, and multi-agent orchestration systems on top of models -- all to route around the fundamental constraint that the model itself can't efficiently process everything at once.

Subquadratic's argument is that these workarounds are expensive, brittle, and ultimately limiting. As CTO Alexander Whedon told SiliconANGLE in an interview, "I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows. And I think that that is kind of a waste of human intelligence and also limiting to the product quality."

Subquadratic's fix is deceptively simple: stop doing the math that doesn't matter

The company's approach, called Subquadratic Sparse Attention or SSA, is built on a straightforward premise: most of the token-to-token comparisons in standard attention are wasted compute.
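To see what "wasted compute" means, it helps to look at the operation itself. Here is a minimal NumPy sketch of textbook dense attention -- the generic transformer operation, not anything from SubQ's codebase. The n x n score matrix is the quadratic term:

```python
import numpy as np

def dense_attention(Q, K, V):
    """Textbook scaled dot-product attention.

    Q, K, V are (n, d) arrays for a sequence of n tokens. The score
    matrix S is n x n -- every token scored against every other --
    which is exactly where the quadratic cost lives.
    """
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)            # (n, n) scores: the O(n^2) term
    S -= S.max(axis=-1, keepdims=True)  # subtract row max for stability
    P = np.exp(S)
    P /= P.sum(axis=-1, keepdims=True)  # row-wise softmax
    return P @ V                        # (n, d) outputs

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 64))
print(dense_attention(x, x, x).shape)   # (512, 64)

# Doubling the sequence length quadruples the score matrix:
for n in (2_000, 4_000, 8_000):
    print(f"{n:>5} tokens -> {n * n:>11,} pairwise scores")
```

SSA's wager is that most entries of that matrix contribute nothing useful to the output.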
Instead of comparing every token to every other token, SSA learns to identify which comparisons actually matter and computes attention only over those positions. Crucially, the selection is content-dependent -- the model decides where to look based on meaning, not on fixed positional patterns. This allows it to retrieve specific information from arbitrary positions across a very long context without paying the quadratic tax.

The practical payoff scales with context length -- exactly the inverse of the problem it's trying to solve. According to the company's technical blog, SSA achieves a 7.2x prefill speedup over dense attention at 128,000 tokens, rising to 52.2x at 1 million tokens. As Whedon put it: "If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice."

The company says it trained the model in three stages -- pretraining, supervised fine-tuning, and a reinforcement learning stage specifically targeting long-context retrieval failures -- teaching the model to aggressively use distant context rather than defaulting to nearby information, a subtle failure mode that quietly degrades performance in existing systems.

Three benchmarks paint a strong picture, but what they leave out may matter more

On the surface, SubQ's benchmark numbers are competitive with or superior to models built by organizations spending billions of dollars. On SWE-Bench Verified, it scored 81.8% compared to Opus 4.6's 80.8% and DeepSeek 4.0 Pro's 80.0%. On RULER at 128,000 tokens, a standard benchmark for reasoning over extended inputs, SubQ scored 95% -- edging out Claude Opus 4.6 at 94.8%. On MRCR v2, a demanding test of multi-hop retrieval across long contexts, SubQ posted a third-party verified score of 65.9%, compared with Claude Opus 4.7 at 32.2%, GPT-5.5 at 74%, and Gemini 3.1 Pro at 26.3%.

But several details warrant scrutiny. The benchmark selection is narrow -- exactly three tests, all emphasizing long-context retrieval and coding, the precise tasks SubQ is designed for. Broader evaluations across general reasoning, math, multilingual performance, and safety have not been published. The company says a comprehensive model card is "coming soon." According to The New Stack, each benchmark model was run only once due to high inference cost, and the SWE-Bench margin is, as the company's own paper acknowledges, "harness as much as model." Single runs without confidence intervals leave real room for variance.

There is also a significant gap between SubQ's research results and its production model. On MRCR v2, the company reported a research score of 83 -- but the third-party verified production model scored 65.9. That 17-point gap between the lab result and the shipping product is notable and largely unexplained.

Subquadratic also told SiliconANGLE that on the RULER 128K benchmark, SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus -- a remarkable cost claim. But the company has not publicly disclosed specific API pricing, making it impossible to independently verify the cost-per-task comparisons.

The AI research community's verdict ranges from 'genuine breakthrough' to 'AI Theranos'

Within hours of the announcement, the AI research community erupted into a debate that crystallized around a single question: Is this real? AI commentator Dan McAteer captured the binary mood in a widely shared post: "SubQ is either the biggest breakthrough since the Transformer... or it's AI Theranos." The comparison to the infamous blood-testing fraud company may be unfair, but it reflects the scale of the claims being made.

Skeptics zeroed in on several pressure points. Prominent AI engineer Will Depue initially noted that SubQ is "almost surely a sparse attention finetune of Kimi or DeepSeek," referring to existing open-source models. Whedon confirmed this on X, writing that the company is "using weights from open-source models as a starting point, as a function of our funding and maturity as a company." Depue later escalated his criticism, writing that the company's O(n) scaling claims and the speedup numbers "don't seem to line up," and described the claims as "either incredibly poorly communicated or just not real."

Others raised structural questions. One developer noted that if SubQ truly reduces compute by 1,000x and costs less than 5% of Opus, the company should have no trouble serving it at scale -- so why gate access through an early-access program? Developer Stepan Goncharov called the benchmarks "very interesting cherry-picked benchmarks," while another commenter described them as "suspiciously perfect."

But not everyone was dismissive. AI researcher John Rysana pushed back on the Theranos framing, writing that the work is "just subquadratic attention done well which is very meaningful for long context workloads," and that "odds of it being BS are extremely low." Linus Ekenstam, a tech commentator, said he was "extremely intrigued to see the real-world implications," particularly for complex AI-powered software.

Magic.dev made strikingly similar claims two years ago -- and then went quiet

Perhaps the most pointed critique of SubQ's launch comes not from its specific claims but from recent history. Magic.dev announced a 100-million-token context-window model, LTM-2-mini, in August 2024, with a claimed 1,000x efficiency advantage, and raised roughly $500 million on the strength of those claims. As of early 2026, there is no public evidence of LTM-2-mini being used outside Magic.

The parallels are uncomfortable. Both companies claimed massive context windows. Both touted roughly 1,000x efficiency gains. Both targeted software engineering as their primary use case. And both launched with limited external access.

The broader research landscape reinforces the caution. Kimi Linear, DeepSeek Sparse Attention, Mamba, and RWKV all promised subquadratic scaling, and all faced the same problem: architectures that achieve linear complexity in theory often underperform quadratic attention on downstream benchmarks at frontier scale, or they end up hybrid -- mixing subquadratic layers with standard attention and losing the pure scaling benefits. A widely cited LessWrong analysis argued that these approaches "are all better thought of as 'incremental improvement number 93595 to the transformer architecture'" because practical implementations remain quadratic and "only improve attention by a constant factor."

Subquadratic is directly aware of this history. Its own technical blog specifically addresses each prior approach -- fixed-pattern sparse attention, state space models, hybrid architectures, and DeepSeek Sparse Attention -- and argues that SSA avoids their tradeoffs. Whether it actually does remains an empirical question that only independent evaluation can settle.

A five-time founder, a former Meta engineer, and $29 million to prove the doubters wrong

The team behind the claims matters in evaluating them.
CEO Justin Dangel is a five-time founder and CEO with a track record across health tech, insurance tech, and consumer goods; his companies have scaled to hundreds of employees, attracted institutional backing, and reached liquidity. CTO Alexander Whedon previously worked as a software engineer at Meta and served as Head of Generative AI at TribeAI, where he led over 40 enterprise AI implementations. The team includes 11 PhD researchers with backgrounds from Meta, Google, Oxford, Cambridge, ByteDance, and Adobe.

That is a credible collection of talent for an architecture-level research effort. But neither co-founder has published foundational AI research, and the company has not yet released a peer-reviewed paper. The technical report is listed as "coming soon."

The funding profile is unusual for a company making frontier AI claims. Subquadratic raised $29 million at a reported $500 million valuation -- a steep price for a seed-stage company with no publicly available model, no peer-reviewed research, and no disclosed revenue. The investor base, led by Tinder co-founder Mateen and former SoftBank partner Villamizar, skews toward consumer tech and growth investing rather than deep technical AI research.

The company is not open-sourcing its weights but plans to offer training tools for enterprises to do their own post-training, and it has set a 50-million-token context window target for Q4.

The real test for SubQ isn't benchmarks -- it's whether the math survives independent scrutiny

Strip away the marketing language and the social media drama, and the underlying question Subquadratic is asking is genuinely important: Can AI systems break free of quadratic scaling without sacrificing the quality that makes them useful?

The stakes are enormous. If attention can be made truly linear without degrading retrieval and reasoning, the economics of AI shift fundamentally. Enterprise applications that today require elaborate retrieval pipelines -- processing entire codebases, contracts, regulatory filings, medical records -- become single-pass operations. The billions of dollars currently spent on RAG infrastructure, context management, and agentic orchestration become partially redundant.

Whedon's willingness to engage publicly with technical criticism -- posting a technical blog within hours of pushback -- suggests a team that understands it needs to show its work, not just describe it. And to its credit, the company has openly acknowledged that it builds on open-source foundations and that its model is smaller than those at the major labs.

Every frontier model in 2026 advertises a context window of at least a million tokens, but almost none of them are actually great at making use of all that information. The gap between a nominal context window and a functional one -- between what a model accepts and what it reliably reasons over -- remains one of the most important unsolved problems in AI. Subquadratic says it has closed that gap.

If independent evaluation confirms that claim, the implications would ripple far beyond a single startup's valuation. If it doesn't, the company joins a growing list of long-context promises that sounded revolutionary on launch day and unremarkable six months later. In computing, every fundamental constraint eventually falls. When it does, the breakthrough rarely comes from the direction the industry expected.
The question hanging over Subquadratic is whether a team of 11 PhDs and a $29 million seed round actually found the answer that has eluded organizations spending thousands of times more -- or whether they just found a better way to describe the problem.
[2]
Subquadratic launches with $29M to bring 12M-token context windows to AI - SiliconANGLE
Subquadratic, a company developing a novel generative artificial intelligence model, launched today with $29 million in seed funding. The new large language model, dubbed SubQ, uses what the company calls a subquadratic architecture that greatly increases the context window -- how much information the AI can read at once -- without significantly increasing the amount of compute it requires. The company also says it outperforms other state-of-the-art models on speed and accuracy.

Many AI systems today are built around the limits of context windows. For example, the industry standard is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro. SubQ can manage a context window of up to 12 million tokens while maintaining accuracy, increasing speed and reducing compute cost. At that length, it would be around 9 million words, or almost 120 books.

To reach this context size, Subquadratic needed to create a model that could handle that much data without breaking the "compute bank." To do that, co-founders Justin Dangel, who is chief executive, and Chief Technology Officer Alexander Whedon told SiliconANGLE in an interview, the company settled on a proprietary transformer architecture that implements sparse attention.

"We are very focused on the problem of how we transition from a dense attention, quadratic scaling architecture to a sparse attention linear architecture," Dangel said. "Sparse attention is an effort to say, hey, let's try to figure out how to not compare every token to every token to every token."

The "T" in ChatGPT stands for "transformer," which is the type of generative AI model architecture under the hood. It isn't necessary to understand exactly what that is -- just that it's essentially the engine of the LLM, the part that gives it the power to contextualize language. Traditional transformer models use dense attention, meaning the model compares every token in a prompt with every other token. That becomes expensive very quickly: Doubling the input does not just double the work; it roughly quadruples the number of token-to-token relationships the model has to consider. That is the "quadratic" scaling problem Subquadratic is targeting.

Attention is what transformers use to "understand" how a prompt operates by comparing words (broken up into tokens) to one another. The same way humans know that, in the sentence "The cat is in the room," the words "cat" and "room" relate to one another, an LLM compares words in a sentence to understand their relationships. With dense attention, every word is compared to every other word. The more words there are, the more comparisons the model needs to make.

"If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice," Whedon explained.

According to Subquadratic, SubQ is more than 50 times faster and 50 times less expensive than leading frontier models at 1 million tokens, while maintaining higher accuracy. At its full 12 million-token context window, the company says, the model reduces compute requirements by almost 1,000 times compared with other frontier models. On the RULER 128K long-context benchmark, Subquadratic said SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus, representing roughly a 300-times reduction in cost.
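The doubling rule in Whedon's quote is easy to check with toy arithmetic. In the sketch below, the per-token budget for the linear regime is an invented constant; only the two growth rates come from the article:

```python
# Relative attention cost under the two scaling regimes the article
# contrasts. The per-token constant k is invented for illustration;
# only the growth rates (n^2 vs. n) come from the reporting.
def quadratic_cost(n):          # dense attention: ~n^2 token pairs
    return n * n

def linear_cost(n, k=1_024):    # idealized sparse attention: ~k pairs per token
    return k * n

base = 128_000
for n in (128_000, 256_000, 1_000_000, 12_000_000):
    q = quadratic_cost(n) / quadratic_cost(base)
    l = linear_cost(n) / linear_cost(base)
    print(f"{n:>10,} tokens: dense cost x{q:,.1f}, linear cost x{l:,.1f}")

# Doubling the input: dense cost x4.0, linear cost x2.0. The gap between
# the two curves itself grows linearly with n, which is why a fixed
# architectural advantage can look like 7.2x at 128K and 52.2x at 1M.
```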
Currently, the context visible to LLMs is limited to a maximum of 1 million tokens for most state-of-the-art models, and even that can be difficult to use because of compute constraints. To handle this, developers carefully curate the data that goes into the context window using systems such as retrieval-augmented generation, or RAG, and agentic retrieval systems to manage data flow. These systems necessarily add latency and computational overhead, and can bias the information fed to the LLM.

"I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows," Whedon said. "And I think that that is kind of a waste of human intelligence and also limiting to the product quality."

Subquadratic's vision is that AI is being constrained by the cost curve of dense-attention transformers. The company argues that once the architecture moves from quadratic to linear scaling, developers can build products that were previously too slow, too expensive or too reliant on brittle data curation.

To tackle this, the company is launching the SubQ application programming interface, making it available to developers and enterprise teams that need access to the full 12 million-token context window. It is also launching SubQ Code, a command-line interface coding agent designed to load entire codebases into a single context window, so developers can plan, execute and review across a repository without coordinating multiple agents. Dangel also described a search product that will initially be free, suggesting a land-and-expand strategy around long-context research, coding and enterprise workloads. He added that the model will not be open-weight or open-source in the near term, but will be trainable for customer-specific use cases.

The funding round was joined by investors including Javier Villamizar, former partner at SoftBank Vision Fund, and Justin Mateen, co-founder of Tinder and founder of JAM Fund, alongside early investors in Anthropic PBC, OpenAI Group PBC, Stripe Inc. and Brex Inc.

"The fundamental scaling laws imposed by the transformer architecture and dense attention have been broken through," Dangel concluded.
Miami-based Subquadratic emerged from stealth with $29 million in seed funding, claiming its SubQ model is the first large language model built on fully subquadratic architecture. The company says it reduces compute by almost 1,000 times at 12 million tokens compared to frontier models, but AI researchers are demanding independent proof of the extraordinary claims before validating what could be a major inflection point in how AI systems scale.
A Miami-based startup called Subquadratic emerged from stealth on Tuesday with a claim that has sent ripples through the AI research community: it has built the first large language model to fully escape the mathematical constraint that has defined every major AI system since 2017 [1]. The company's SubQ 1M-Preview model is built on what it calls a fully subquadratic architecture, where compute grows linearly with context length rather than quadratically [2]. At 12 million tokens, the company claims its architecture reduces attention compute by almost 1,000 times compared to other frontier models.

The startup announced it has raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex [1]. The New Stack reported the raise values the company at $500 million. Co-founders Justin Dangel, who serves as chief executive, and Chief Technology Officer Alexander Whedon are leading the effort to commercialize what they describe as a fundamental shift in how AI processes information.
Source: VentureBeat
Every transformer-based AI model, which includes virtually every frontier system from OpenAI, Anthropic, and Google, relies on an operation called "attention" [1]. In traditional transformer architecture, every token is compared against every other token, so as inputs grow, the number of interactions scales quadratically. Double the input size, and the cost doesn't double -- it quadruples. This quadratic scaling problem has shaped the economics of the entire AI industry, dictating what gets built and what doesn't.

The industry standard context window is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro [2]. Even at those sizes, the compute cost of processing long inputs becomes punishing. The industry has built an elaborate stack of workarounds to cope, including RAG systems that use search to pull relevant results before sending them to the model, because sending the full corpus isn't feasible. Developers layer retrieval pipelines, chunking strategies, prompt engineering, and multi-agent workflows on top of models -- all to route around the fundamental constraint that the model can't efficiently process everything at once.
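For readers who haven't built one, that workaround pattern can be sketched in a few lines. Everything below is a generic illustration -- the embed and generate functions are hypothetical stand-ins, not any particular vendor's API -- but it shows the property critics of RAG point to: whatever retrieval drops, the model never sees.

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: a deterministic pseudo-random
    vector per string (so the vectors carry no real meaning -- the point
    is the shape of the pipeline). Real systems call an embedding API."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).normal(size=64)

def generate(prompt: str) -> str:
    """Toy stand-in for the LLM call."""
    return f"<answer based on {len(prompt)} chars of curated context>"

def rag_answer(question: str, corpus: list[str], top_k: int = 2) -> str:
    """The classic RAG loop: embed the query, rank chunks by similarity,
    keep only the top_k, and generate. Everything outside the top_k is
    simply never seen by the model -- the curation step Whedon describes."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda chunk: -float(embed(chunk) @ q))
    context = "\n\n".join(ranked[:top_k])
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

corpus = ["The cat is in the room.", "Quarterly revenue rose 4%.",
          "The dog sleeps outside."]
print(rag_answer("Where is the cat?", corpus))
```

A 12-million-token window, if it works as claimed, would make this entire retrieval layer optional for many workloads.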
Source: SiliconANGLE
Subquadratic's approach, called Subquadratic Sparse Attention or SSA, is built on a straightforward premise: most token-to-token comparisons in standard attention are wasted compute [1]. Instead of comparing every token to every other token with dense attention, SSA learns to identify which comparisons actually matter and computes attention only over those positions. The selection is content-dependent -- the model decides where to look based on meaning, not on fixed positional patterns.

"If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice," Whedon explained to SiliconANGLE [2].
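Subquadratic has not published SSA's internals, so the sketch below is only a generic illustration of content-dependent sparse attention: each query keeps its k highest-scoring keys, wherever they sit in the sequence, and attends over those alone. Note the catch that has dogged this research area -- picking top-k from a full score matrix, as this demo does for clarity, is itself O(n^2); a genuinely subquadratic system must choose its candidates cheaply (routing, clustering, or a learned indexer).

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=8):
    """Generic content-dependent sparse attention (illustration only).

    Each query attends to just its k best-matching keys -- selection
    driven by content, not fixed positions. Computing the full score
    matrix to find top-k, as done here for clarity, is still O(n^2);
    subquadratic schemes must select candidates without it.
    """
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)                       # full scores (demo only)
    idx = np.argpartition(S, -k, axis=-1)[:, -k:]  # top-k key indices per query
    rows = np.arange(n)[:, None]
    S_k = S[rows, idx]                             # (n, k) kept scores
    P = np.exp(S_k - S_k.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)             # softmax over k, not n
    return np.einsum("nk,nkd->nd", P, V[idx])      # mix only k value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(1_000, 32))
print(topk_sparse_attention(x, x, x).shape)  # (1000, 32): each token mixes 8 of 1,000 values
```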
According to the company's technical blog, SSA achieves a 7.2x prefill speedup over dense attention at 128,000 tokens, rising to 52.2x at 1 million tokens. The practical payoff scales with context length -- exactly the inverse of the problem it's trying to solve. The company also says SubQ is more than 50 times faster and 50 times less expensive than leading frontier models at 1 million tokens, while maintaining higher accuracy [2].
On the RULER 128K long-context benchmark, the company said SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus -- roughly a 300-times reduction in cost. At its full 12 million-token context window, which would be around 9 million words or almost 120 books, the model is said to reduce compute requirements by almost 1,000 times compared with other frontier models.

The company trained the model in three stages: pretraining, supervised fine-tuning, and a reinforcement learning stage specifically targeting long-context retrieval failures [1]. This teaches the model to aggressively use distant context rather than defaulting to nearby information, a subtle failure mode that quietly degrades performance in existing systems.
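Subquadratic has not published its training recipe, so the following is purely a hypothetical illustration of what a reward for such an RL stage could look like: reward answers grounded in far-away context, penalize answers that grab the nearby distractor.

```python
def long_context_reward(answer: str, distant_fact: str, nearby_distractor: str) -> float:
    """Hypothetical reward for an RL stage targeting long-context retrieval.

    Episodes would place `distant_fact` far before the question and
    `nearby_distractor` right next to it. Rewarding the former and
    penalizing the latter pushes the model away from the recency bias
    the article describes. (Illustration only -- not Subquadratic's
    published method.)
    """
    if distant_fact in answer:
        return 1.0    # retrieved from far away: desired behavior
    if nearby_distractor in answer:
        return -0.5   # defaulted to nearby context: the failure mode
    return 0.0        # neither: no signal

print(long_context_reward("The key is under the mat.", "under the mat", "in the drawer"))  # 1.0
```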
Subquadratic is launching three products into private beta. The SubQ API exposes the full 12 million-token context window to developers and enterprise teams [2]. SubQ Code is a command-line coding agent designed to load entire codebases into a single context window, so developers can plan, execute, and review across a repository without coordinating multiple agents. The company is also launching SubQ Search, a search tool that leverages the model's extended context capabilities.

"I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows," Whedon told SiliconANGLE [2]. "And I think that that is kind of a waste of human intelligence and also limiting to the product quality." Subquadratic's vision is that AI is being constrained by the cost curve of dense-attention transformers, and that moving from quadratic to linear scaling will enable developers to build products that were previously too slow, too expensive, or too reliant on brittle data curation.

The reaction from the AI research community has been mixed, ranging from genuine curiosity to open accusations of vaporware [1]. The numbers Subquadratic is publishing are extraordinary and, if validated independently, would dwarf the efficiency gains of any existing approach. Researchers are calling for independent proof before accepting what could represent a genuine inflection point in how AI systems scale. What remains to be seen is whether the company's benchmarks hold up under scrutiny, and whether the approach can maintain quality across the diverse range of tasks that define state-of-the-art large language model performance.