2 Sources
[1]
The Math on AI Agents Doesn't Add Up
The big AI companies promised us that 2025 would be "the year of the AI agents." It turned out to be the year of talking about AI agents, and of kicking the can for that transformational moment to 2026 or maybe later. But what if the answer to the question "When will our lives be fully automated by generative AI robots that perform our tasks for us and basically run the world?" is, like that New Yorker cartoon, "How about never?"

That was basically the message of a paper published without much fanfare some months ago, smack in the middle of the overhyped year of "agentic AI." Entitled "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models," it purports to show mathematically that "LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity." Though the science is beyond me, the authors -- a former SAP CTO who studied AI under one of the field's founding intellects, John McCarthy, and his teenage prodigy son -- punctured the vision of agentic paradise with the certainty of mathematics. Even reasoning models that go beyond the pure word-prediction process of LLMs, they say, won't fix the problem.

"There is no way they can be reliable," Vishal Sikka, the dad, tells me. After a career that, in addition to SAP, included a stint as Infosys CEO and a seat on Oracle's board, he currently heads an AI services startup called Vianai. "So we should forget about AI agents running nuclear power plants?" I ask. "Exactly," he says. Maybe you can get an agent to file some papers or something to save time, but you might have to resign yourself to some mistakes.

The AI industry begs to differ. For one thing, a big success in agentic AI has been coding, which took off last year. Just this week at Davos, Google's Nobel-winning head of AI, Demis Hassabis, reported breakthroughs in minimizing hallucinations, and hyperscalers and startups alike are pushing the agent narrative.

Now they have some backup. A startup called Harmonic is reporting a breakthrough in AI coding that also hinges on mathematics -- and tops reliability benchmarks. Harmonic, which was cofounded by Robinhood CEO Vlad Tenev and Tudor Achim, a Stanford-trained mathematician, claims this recent improvement to its product called Aristotle (no hubris there!) is an indication that there are ways to guarantee the trustworthiness of AI systems. "Are we doomed to be in a world where AI just generates slop and humans can't really check it? That would be a crazy world," says Achim.

Harmonic's solution is to use formal methods of mathematical reasoning to verify an LLM's output. Specifically, it encodes outputs in the Lean programming language, which is known for its ability to formally verify code and proofs. To be sure, Harmonic's focus to date has been narrow -- its key mission is the pursuit of "mathematical superintelligence," and coding is a somewhat organic extension. Things like history essays -- which can't be mathematically verified -- are beyond its boundaries. For now. Nonetheless, Achim doesn't seem to think that reliable agentic behavior is as much an issue as some critics believe. "I would say that most models at this point have the level of pure intelligence required to reason through booking a travel itinerary," he says.

Both sides are right -- or maybe even on the same side. On one hand, everyone agrees that hallucinations will continue to be a vexing reality.
In a paper published last September, OpenAI scientists wrote, "Despite significant progress, hallucinations continue to plague the field, and are still present in the latest models." They proved that unhappy claim by asking three models, including ChatGPT, to provide the title of the lead author's dissertation. All three made up fake titles and all misreported the year of publication. In a blog about the paper, OpenAI glumly stated that in AI models, "accuracy will never reach 100 percent."
[2]
AI Agents Are Poised to Hit a Mathematical Wall, Study Finds
The underlying technology behind most of the widely available artificial intelligence models is the large language model, a form of machine learning and language processing. The bet that most AI companies are making is that LLMs, if fed enough data, will achieve something like full autonomy to think and function in ways similar to humans -- but with even more collective knowledge. It turns out, betting on infinite growth might not have great odds of paying off.

A new study claims to show mathematical proof that "LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity." The paper, published by father-and-son researchers Vishal Sikka and Varin Sikka and surfaced recently by Wired after its initial publication flew under the radar, has a pretty simple conclusion, though there's quite a bit of complicated math to reach it. Distilled as simply as possible, it reasons that certain prompts or tasks given to an LLM will require more complex computation than the model is capable of performing, and when that happens, the model will either fail to complete the requested action or will carry out the task incorrectly.

The basic premise of the research pours some cold water on the idea that agentic AI -- models that can be handed multi-step tasks and complete them autonomously, without human supervision -- will be the vehicle for achieving artificial general intelligence. That's not to say the technology doesn't have a function or won't improve, but it does place a much lower ceiling on what is possible than AI companies would like to acknowledge when giving a "sky is the limit" pitch.

The researchers aren't the first to suggest LLMs may not be all they're cracked up to be, though their work does put real math behind the sense that many AI skeptics have expressed. Last year, researchers at Apple published a paper that concluded that LLMs are not capable of actual reasoning or thinking, despite creating the appearance of doing so. Benjamin Riley, founder of the company Cognitive Resonance, wrote last year that because of how LLMs work, they will never truly achieve what we consider to be "intelligence." Other studies have tested the limits of LLM-powered AI models to see if they are capable of producing novel creative outputs, with pretty uninspiring results. But if none of that is convincing and elaborate mathematical equations are more your thing, then the study from the Sikkas may be the proof you need.

All of it is part of a mounting body of evidence suggesting that whatever AI may be capable of in its current form, it almost certainly won't be the technology that surpasses human intelligence by the end of this year, as Elon Musk recently claimed.
A new study by researchers Vishal Sikka and Varin Sikka argues, on mathematical grounds, that AI agents cannot reliably perform computational tasks beyond a certain complexity threshold. The research challenges industry promises that 2025 would be the year of agentic AI, suggesting fundamental LLM limitations may prevent the autonomous future tech companies envision.
The ambitious promises from major AI companies that 2025 would mark "the year of AI agents" have collided with a sobering mathematical reality [1]. A research paper titled "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models" presents mathematical evidence that AI agents face fundamental constraints, demonstrating that LLM limitations prevent these systems from reliably executing computational tasks beyond a specific complexity threshold [2]. The study, authored by Vishal Sikka, a former SAP CTO who studied under AI pioneer John McCarthy, and his son Varin Sikka, argues that large language models are "incapable of carrying out computational and agentic tasks beyond a certain complexity" [1].
The research reveals that certain prompts or tasks provided to an LLM will require more complex computation than what the model can process [2]. When this happens, the model either fails to complete the requested action or incorrectly carries out the task. "There is no way they can be reliable," Sikka told Wired, effectively ruling out scenarios where AI agents would run critical infrastructure like nuclear power plants [1]. This mathematical limitation places a significantly lower ceiling on what's possible with current AI technology than what companies acknowledge when pitching limitless potential. The findings cast doubt on whether agentic AI systems, models designed to complete multi-step tasks autonomously without human supervision, can serve as the pathway to Artificial General Intelligence (AGI) [2].
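A rough back-of-the-envelope sketch can make the flavor of that mismatch concrete. The sketch below is an illustration only, not the Sikkas' actual proof, and the model sizes and the multiplication task are assumptions chosen for the example: a transformer spends a roughly fixed amount of compute per generated token, while the exact cost of some requested tasks grows without bound with the request itself.

```python
# Illustrative sketch (assumed, simplified numbers -- not the paper's argument):
# a transformer does roughly fixed work per forward pass, while the exact cost
# of some requested tasks grows with the size of the request itself.

def per_token_flops(context_len: int, d_model: int) -> int:
    """Very rough per-token cost of one forward pass:
    attention ~ n^2 * d, feed-forward ~ n * d^2 (constants omitted)."""
    return context_len ** 2 * d_model + context_len * d_model ** 2

def schoolbook_multiply_ops(digits: int) -> int:
    """Exactly multiplying two k-digit numbers takes ~k^2 digit operations."""
    return digits ** 2

# A fixed architecture means a fixed per-token budget (hypothetical sizes).
budget = per_token_flops(context_len=8_192, d_model=4_096)

# The task's cost keeps growing with the prompt's contents, so beyond some
# size it exceeds what any fixed number of forward passes can cover.
for k in (1_000, 1_000_000, 1_000_000_000):
    exceeds = schoolbook_multiply_ops(k) > budget
    print(f"{k}-digit multiplication exceeds one-pass budget: {exceeds}")
```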
The trustworthiness concerns extend beyond theoretical mathematics. AI hallucinations continue to plague even the most advanced models, undermining AI reliability in practical applications. OpenAI scientists acknowledged in a September paper that "despite significant progress, hallucinations continue to plague the field, and are still present in the latest models" [1]. To demonstrate this persistent problem, the researchers asked three models, including ChatGPT, to provide the title of the lead author's dissertation. All three fabricated titles and misreported the publication year. OpenAI's own assessment concluded that AI accuracy "will never reach 100 percent" [1]. Even reasoning models that extend beyond pure word-prediction processes won't resolve these fundamental issues, according to the Sikka research.
The AI industry hasn't accepted these mathematical limitations without resistance. Demis Hassabis, Google's Nobel Prize-winning head of AI, reported breakthroughs in minimizing hallucinations at Davos, while hyperscalers and startups continue advancing the agent narrative [1]. The startup Harmonic, cofounded by Robinhood CEO Vlad Tenev and Stanford-trained mathematician Tudor Achim, claims to have achieved a breakthrough using formal mathematical verification. Its product Aristotle employs formal methods of mathematical reasoning to verify LLM outputs, specifically by encoding results in the Lean programming language, known for its verification capabilities [1].
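The core idea behind that kind of checking can be shown in a few lines of ordinary Lean 4. This is a minimal sketch of machine-checked verification in general, not Harmonic's Aristotle pipeline or its actual output format: a claim is accepted only if an accompanying proof actually checks, so a hallucinated "proof" of a false statement simply fails to compile.

```lean
-- Minimal sketch of machine-checked verification in Lean 4
-- (illustrative only; not Harmonic's Aristotle or its output format).

-- A claim plus a proof: Lean accepts this only because the proof term checks.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false claim cannot be smuggled through: uncommenting the line below
-- makes the file fail to compile, since no valid proof of it exists.
-- theorem bogus (a : Nat) : a + 1 = a := Nat.add_comm a 1
```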
However, this approach currently works only in narrow domains like coding and mathematical tasks, leaving broader applications like history essays beyond its boundaries.

The mounting evidence suggests a more modest trajectory for AI development than industry hype indicates. Researchers at Apple previously concluded that LLMs aren't capable of actual reasoning, despite creating that appearance [2]. Benjamin Riley, founder of Cognitive Resonance, argued that because of how LLMs fundamentally work, they will never truly achieve what we consider intelligence [2]. The Sikka study adds rigorous mathematical backing to what many AI skeptics have sensed. This body of research makes claims like Elon Musk's prediction that AI will surpass human intelligence by year's end seem increasingly improbable [2]. While AI agents may still handle specific tasks like filing papers or booking travel itineraries, users should expect mistakes and understand that fully autonomous systems managing complex, critical operations remain out of reach with current transformer-based architectures.