5 Sources
[1]
The Math on AI Agents Doesn't Add Up
The big AI companies promised us that 2025 would be "the year of the AI agents." It turned out to be the year of talking about AI agents, and kicking the can for that transformational moment to 2026 or maybe later. But what if the answer to the question "When will our lives be fully automated by generative AI robots that perform our tasks for us and basically run the world?" is, like that New Yorker cartoon, "How about never?"

That was basically the message of a paper published without much fanfare some months ago, smack in the middle of the overhyped year of "agentic AI." Entitled "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models," it purports to mathematically show that "LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity." Though the science is beyond me, the authors -- a former SAP CTO who studied AI under one of the field's founding intellects, John McCarthy, and his teenage prodigy son -- punctured the vision of agentic paradise with the certainty of mathematics. Even reasoning models that go beyond the pure word-prediction process of LLMs, they say, won't fix the problem.

"There is no way they can be reliable," Vishal Sikka, the dad, tells me. After a career that, in addition to SAP, included a stint as Infosys CEO and a seat on Oracle's board, he currently heads an AI services startup called Vianai. "So we should forget about AI agents running nuclear power plants?" I ask. "Exactly," he says. Maybe you can get it to file some papers or something to save time, but you might have to resign yourself to some mistakes.

The AI industry begs to differ. For one thing, a big success in agent AI has been coding, which took off last year. Just this week at Davos, Google's Nobel-winning head of AI, Demis Hassabis, reported breakthroughs in minimizing hallucinations, and hyperscalers and startups alike are pushing the agent narrative. Now they have some backup.

A startup called Harmonic is reporting a breakthrough in AI coding that also hinges on mathematics -- and tops benchmarks on reliability. Harmonic, which was cofounded by Robinhood CEO Vlad Tenev and Tudor Achim, a Stanford-trained mathematician, claims this recent improvement to its product called Aristotle (no hubris there!) is an indication that there are ways to guarantee the trustworthiness of AI systems. "Are we doomed to be in a world where AI just generates slop and humans can't really check it? That would be a crazy world," says Achim.

Harmonic's solution is to use formal methods of mathematical reasoning to verify an LLM's output. Specifically, it encodes outputs in the Lean programming language, which is known for its ability to formally verify code. To be sure, Harmonic's focus to date has been narrow -- its key mission is the pursuit of "mathematical superintelligence," and coding is a somewhat organic extension. Things like history essays -- which can't be mathematically verified -- are beyond its boundaries. For now.

Nonetheless, Achim doesn't seem to think that reliable agentic behavior is as much an issue as some critics believe. "I would say that most models at this point have the level of pure intelligence required to reason through booking a travel itinerary," he says.

Both sides are right -- or maybe even on the same side. On one hand, everyone agrees that hallucinations will continue to be a vexing reality.
In a paper published last September, OpenAI scientists wrote, "Despite significant progress, hallucinations continue to plague the field, and are still present in the latest models." They proved that unhappy claim by asking three models, including ChatGPT, to provide the title of the lead author's dissertation. All three made up fake titles, and all misreported the year of publication. In a blog post about the paper, OpenAI glumly stated that in AI models, "accuracy will never reach 100 percent."
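To give a sense of what Lean-based checking looks like in practice, here is a minimal, hypothetical illustration; it is not Harmonic's actual output or tooling, and the statements and names below are invented for this sketch. Lean either certifies each claim mechanically or refuses to accept it:

    -- Hypothetical illustration, not Harmonic's code: claims an LLM might make,
    -- restated as Lean 4 statements the kernel must either certify or reject.

    -- A concrete arithmetic claim, checked by actual computation:
    example : 127 * 129 = 16383 := by decide

    -- A general claim about program behavior: reversing a list twice
    -- returns the original list, for every list of natural numbers.
    theorem reverse_twice (xs : List Nat) : xs.reverse.reverse = xs := by
      simp

The asymmetry is the point: a hallucinated statement or a broken proof simply fails to compile, so nothing unverified can be passed off as checked.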
[2]
Keep it simple, stupid: Agentic AI tools choke on complexity
Agents may be the next big thing in AI, but they have limits beyond which they will make mistakes, so exercise extreme caution, a recent research paper says.

According to a definition by IBM, agentic AI consists of software agents that mimic human decision-making to solve problems in real time; it builds on generative AI techniques by using large language models (LLMs) to function in dynamic environments. But while the industry hype machine pushes agentic AI as the next big thing, potential adopters should be wary, as the paper, "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models" [PDF], argues that LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity level, above which they will deliver incorrect responses.

The paper uses mathematical reasoning to show that if a prompt to an LLM specifies a computational task whose complexity is higher than that of the LLM's own core operations, then the LLM will in general respond incorrectly. Essentially, the authors' argument is that it is possible to present an LLM with an input specifying a task that requires more calculations than it is capable of performing.

This has relevance to agentic AI because there is considerable interest of late in the technology's potential role in automating various tasks across a range of applications, from those that simply involve providing information, to others that have real-world effects, such as making financial transactions or controlling and managing industrial equipment.

Furthermore, the paper claims to show that deploying agents to verify the correctness of another agent's solution for a given task will also fail for the same reasons, because verification of a task is often more complex than the task itself. "We believe this case to be especially pertinent since one of the most prevalent applications of LLMs is to write and verify software," the authors state.

The paper, which was published last year but seems to have gone largely unnoticed until flagged by tech publication Wired, was written by Varin Sikka and Vishal Sikka. The latter was formerly CTO of SAP and CEO of Infosys, and is currently the founder of AI company Vianai Systems.

The conclusion the paper reaches is that "despite their obvious power and applicability in various domains, extreme care must be used before applying LLMs to problems or use cases that require accuracy, or solving problems of non-trivial complexity." In other words, this doesn't mean that AI agents will necessarily be a disaster, but anyone developing and deploying such solutions needs to be mindful of whether assigned tasks exceed the underlying model's effective complexity limits.

As The Register reported recently, for example, scientists at the US Department of Energy's Sandia National Labs made use of AI assistants to develop a novel approach for steering LED light, showing that the technology has promise. However, the risks posed by such agents are top of mind for many executives, even featuring in a panel discussion on cyber threats at the WEF in Davos recently. Last year, research firm Gartner even forecast that more than 40 percent of agentic AI projects are set to be cancelled by the end of 2027, citing reasons including escalating costs, unclear business value, and insufficient risk controls.

Work on mitigating these unfortunate limitations of LLMs is ongoing, the paper notes, with approaches including composite systems and constraining the models. ®
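To make the shape of that argument concrete, here is a rough back-of-the-envelope sketch, not the Sikkas' own formalization: the per-pass cost below is just the standard estimate for a fixed-depth, fixed-width transformer, and the symbols n, m, L, d and C_T are introduced purely for illustration.

    % Illustrative sketch only; the paper's actual definitions and bounds may differ.
    \begin{align*}
      C_{\mathrm{pass}}(n) &= O\!\big(L\,(n^{2}d + nd^{2})\big)
          && \text{one forward pass over an $n$-token context (depth $L$, width $d$)}\\
      C_{\mathrm{gen}}(n,m) &\le m \cdot C_{\mathrm{pass}}(n+m)
          && \text{total compute available while emitting $m$ output tokens}\\
      C_{T}(n) &> C_{\mathrm{gen}}(n,m)
          && \text{a prompted task $T$ whose minimum cost exceeds that budget}
    \end{align*}

When the last inequality holds, the model cannot have carried out the computation the task requires by the time it must answer, so whatever it emits is not grounded in that computation; the same accounting applies to a second LLM asked to check the first one's answer, since verification can itself exceed the budget.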
[3]
AI Agents Are Poised to Hit a Mathematical Wall, Study Finds
The underlying technology behind most of the widely available artificial intelligence models is large language models, a form of machine learning and language processing. The bet that most AI companies are making is that LLMs, if fed enough data, will achieve something like full autonomy to think and function in ways similar to humans -- but with even more collective knowledge. It turns out, betting on infinite growth might not have great odds of paying off.

A new study claims to show mathematical proof that "LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity." The paper, published by father and son researchers Vishal Sikka and Varin Sikka and surfaced recently by Wired after its initial publication flew under the radar, has a pretty simple conclusion, though there's quite a bit of complicated math to reach it. Distilled as simply as possible, it reasons that certain prompts or tasks provided to an LLM will require a more complex computation than what the model is capable of processing, and when that happens, the model will either fail to complete the requested action or will carry out the task incorrectly.

The basic premise of the research really pours some cold water on the idea that agentic AI, meaning models that can be given multi-step tasks and complete them autonomously without human supervision, will be the vehicle for achieving artificial general intelligence. That's not to say that the technology doesn't have a function or won't improve, but it does place a much lower ceiling on what is possible than what AI companies would like to acknowledge when giving a "sky is the limit" pitch.

The researchers aren't the first to suggest LLMs may not be all they're cracked up to be, though their research does put real math behind the sense that many AI skeptics have expressed. Last year, researchers at Apple published a paper that concluded that LLMs are not capable of actual reasoning or thinking, despite creating the appearance of doing so. Benjamin Riley, founder of the company Cognitive Resonance, wrote last year that because of how LLMs work, they will never truly achieve what we consider to be "intelligence." Other studies have tested the limits of LLM-powered AI models to see if they are capable of producing novel creative outputs, with pretty uninspiring results.

But if none of that is convincing and elaborate mathematical equations are more your thing, then the study from the Sikkas may be the proof you need. All of it is part of a mounting body of evidence that suggests that whatever AI may be capable of in its current form, it almost certainly won't be the technology that will surpass human intelligence by the end of this year, as Elon Musk recently claimed.
[4]
AI Agents Are Mathematically Incapable of Doing Functional Work, Paper Finds
A months-old but until now overlooked study recently featured in Wired claims to mathematically prove that large language models "are incapable of carrying out computational and agentic tasks beyond a certain complexity" -- that level of complexity being, crucially, pretty low.

The paper, which has not been peer reviewed, was written by Vishal Sikka, a former CTO at the German software giant SAP, and his son Varin Sikka. Sikka senior knows a thing or two about AI: he studied under John McCarthy, the Turing Award-winning computer scientist who literally founded the entire field of artificial intelligence, and in fact helped coin the very term.

"There is no way they can be reliable," Vishal Sikka told Wired. When asked by the interviewer, Sikka also agreed that we should forget about AI agents running nuclear power plants and other strident promises thrown around by AI boosters.

Ignore the rhetoric that tech CEOs spew onstage and pay attention to what the researchers who work for them are finding, and you'll find that even the AI industry agrees that the tech has some fundamental limitations baked into its architecture. In September, for example, OpenAI scientists admitted that AI hallucinations, in which LLMs confidently make up facts, were still a pervasive problem even in increasingly advanced systems, and that model accuracy would "never" reach 100 percent.

That would seemingly put a big dent in the feasibility of so-called AI agents, which are models designed to autonomously carry out tasks without human intervention, and which the industry universally decided last year would be its next big thing. Some companies that embraced AI agents to downsize their workforces quickly realized that the agents weren't anywhere near good enough to replace the outgoing humans, perhaps because they hallucinated so much and could barely complete any of the tasks given to them.

AI leaders insist that stronger guardrails external to the AI models can filter out the hallucinations. The models may always be prone to hallucinating, but if these slip-ups are rare enough, then eventually companies will trust them to start doing tasks that they once entrusted to flesh and blood grunts. In the same paper in which OpenAI researchers conceded that the models would never reach perfect accuracy, they also dismissed the idea that hallucinations are "inevitable," because LLMs "can abstain when uncertain." (Despite that, you'd be hard-pressed to find a single popular chatbot that actually does that, almost certainly because it would make the chatbots seem less impressive and less engaging to use.)

Even though he's adamant LLMs have a hard ceiling, Sikka agrees with figures in the AI industry who insist that hallucinations can be reined in. "Our paper is saying that a pure LLM has this inherent limitation -- but at the same time it is true that you can build components around LLMs that overcome those limitations," he told Wired.
[5]
AI agents can't do complex tasks, claims ex-Infosys CEO
Sikka claims LLMs need external guardrails to do serious work

As we get dazzled by Claude Cowork's ability to orchestrate and automate aspects of our boring and mundane work, in a year when agentic AI is supposed to go beyond what mere AI chatbots can accomplish, one paper by a familiar Indian-American techie is asking us to pause and reassess our enthusiasm. A quietly circulated but mathematically grounded paper - recently spotlighted in Wired - is threatening to derail the AI agents hype train we all seem to be boarding, coldly reminding us that marketing buzzwords don't necessarily translate into actual capability.

The core argument in the paper, which is titled Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models, rests entirely on computational theory and the limits of mathematics. The authors claim, with actual math rather than Silicon Valley marketing talk, that large language models (LLMs) - the very nuts and bolts powering ChatGPT, Gemini, Claude, and today's growing list of AI agents - are mathematically incapable of reliably executing computational or agentic tasks once those tasks cross a modest complexity threshold. From what I can gather, the study claims there are hard limits to what AI models can actually do, despite all their fancy tricks and the biggest GPUs.

And who's behind this paper? Ex-CEO of Infosys, Vishal Sikka. Vishal Sikka is also the former Chief Technology Officer of SAP and a bona fide computer science heavyweight who once studied under John McCarthy, the godfather of artificial intelligence (the guy who coined the term "AI"). Sikka isn't some fringe commentator on the topic either; he's walked the halls of enterprise tech leadership, serving on Oracle's and BMW Group's boards of directors, with an ongoing advisory role at the Stanford Institute of Human-Centered AI. Basically, Sikka has the credibility to back his words.

In their paper, Vishal Sikka and his teenage co-author son (who's studying at Stanford) aren't just lamenting AI hype; they're offering a structural critique of the transformer-based architectures at the heart of LLMs. The crux of their argument is that current transformer-based agentic AI simply can't carry out computations or verify results reliably beyond a limited degree of complexity. That's not a bug, they argue; it's baked into the very definition of how these models compute.

Of course, this study is at odds with an industry that has spent much of last year heralding "the year of the AI agent" and rolling that promise from 2025 into 2026, painting visions of autonomous digital workers replacing humans in workflows, kitchens, and possibly therapist chairs. But real-world deployments have shown what many engineers already knew: hallucinations aren't just occasional hiccups, they're a persistent part of the AI fabric. Even internal research from major labs admits that 100% accuracy is a pipe dream - OpenAI is on record saying as much.

So what does this mean for AI agents? Well, if the foundational model can't mathematically guarantee correctness on tasks above basic complexity, then an agent built on it can't either -- not without bolting on external systems to catch errors, enforce rules, and provide oversight. And that's precisely the nuance Sikka's paper emphasizes: a pure LLM alone is capped, but supporting systems with external verification layers might still do useful work.
In other words, the raw engine isn't going to drive your autonomous factory anytime soon. But with enough engineering around it - think of it as scaffolding, additional verification - you might still get practical value.
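As a minimal sketch of what that scaffolding can look like, assuming nothing about any particular model or vendor: the generate callable below stands in for whatever LLM call you use, and verified_factorization is a name invented for this example.

    # Minimal sketch of an external verification layer around an LLM.
    # `generate` is a hypothetical stand-in for any model call; the guarantee
    # comes from the deterministic check, not from trusting the model's text.
    from typing import Callable, Optional, Tuple

    def verified_factorization(generate: Callable[[str], str],
                               n: int,
                               max_attempts: int = 3) -> Optional[Tuple[int, int]]:
        """Ask the model for two factors of n (both > 1) and accept the answer
        only if multiplying them back actually yields n."""
        prompt = f"Give two integer factors of {n}, both greater than 1, as 'a,b'."
        for _ in range(max_attempts):
            reply = generate(prompt)
            try:
                a_str, b_str = reply.strip().split(",")
                a, b = int(a_str), int(b_str)
            except ValueError:
                continue                      # malformed output: try again
            if a > 1 and b > 1 and a * b == n:
                return a, b                   # accepted: the external check passed
        return None                           # refuse to emit an unverified answer

The model remains free to hallucinate; the wrapper just never lets an unchecked claim through, which is roughly the division of labor Sikka is pointing at.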
A paper by former SAP CTO Vishal Sikka and his son argues on mathematical grounds that the LLMs powering AI agents cannot reliably execute tasks beyond a certain complexity threshold. The research challenges industry promises about autonomous AI systems, contending that transformer-based language models have fundamental limitations that even reasoning models can't overcome.
A quietly published research paper is forcing the AI industry to confront an uncomfortable reality about AI agents and their fundamental limitations. The study, titled "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models," mathematically demonstrates that LLMs are incapable of reliably executing complex computational tasks beyond a specific threshold [1]. The paper, authored by Vishal Sikka (former CTO of SAP, ex-CEO of Infosys, and current founder of Vianai) alongside his teenage son Varin Sikka, emerged during what the industry declared would be "the year of the AI agents" [2].

Source: Digit
The research uses mathematical reasoning to show that when a prompt to an LLM specifies a computational task whose complexity exceeds that of the model's core operations, the LLM will generally respond incorrectly [2]. Vishal Sikka, who studied AI under John McCarthy, the founding intellect who coined the term "artificial intelligence," brings considerable credibility to these claims [4]. "There is no way they can be reliable," Sikka stated bluntly when discussing the reliability of AI in critical applications like nuclear power plants [1].

The mathematical limitations identified in the paper present a significant challenge for agentic AI development. The research argues that it's possible to present an LLM with an input specifying a task that requires more calculations than the model is capable of performing [2]. This has direct implications for AI agents designed to autonomously carry out tasks without human supervision, from providing information to making financial transactions or controlling industrial equipment. The paper goes further, claiming that deploying agents to verify another agent's solution will also fail for the same reasons, since verification of a task is often more complex than the task itself [2].

Source: The Register
This finding places a much lower ceiling on what's possible than what AI companies acknowledge when pitching unlimited potential [3]. The research pours cold water on the idea that agentic AI will be the vehicle for achieving Artificial General Intelligence (AGI), though it doesn't suggest the technology lacks function or won't improve [3]. The computational complexity threshold identified by the Sikkas represents a structural limitation baked into transformer architectures, not merely a bug to be fixed [5].

The paper's findings align with mounting evidence about the persistence of AI hallucinations. OpenAI scientists published research in September admitting that "despite significant progress, hallucinations continue to plague the field, and are still present in the latest models" [1]. The researchers tested this by asking three models, including ChatGPT, to provide the title of the lead author's dissertation; all three fabricated fake titles and misreported the publication year. OpenAI glumly stated that in AI models, "accuracy will never reach 100 percent" [1].

Source: Gizmodo
The trustworthiness concerns extend beyond academic exercises. Research firm Gartner forecast that more than 40 percent of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and insufficient risk controls [2]. These projections suggest the industry is beginning to recognize the gap between marketing promises and actual capability.
Not everyone accepts the doom-and-gloom narrative. Google's Nobel-winning head of AI, Demis Hassabis, reported breakthroughs in minimizing hallucinations at Davos, while hyperscalers and startups continue pushing the agent narrative [1]. Startup Harmonic, cofounded by Robinhood CEO Vlad Tenev and Stanford-trained mathematician Tudor Achim, claims a breakthrough in AI coding that uses formal methods of mathematical reasoning to verify LLM outputs. Their solution encodes outputs in the Lean programming language, known for its verification ability [1].

Achim argues that "most models at this point have the level of pure intelligence required to reason through booking a travel itinerary," suggesting that reliability concerns may be overstated for certain use cases [1]. Even Sikka acknowledges nuance in his conclusions, stating that "a pure LLM has this inherent limitation -- but at the same time it is true that you can build components around LLMs that overcome those limitations" [4]. External guardrails, verification layers, and supporting systems might enable practical work, even if the raw engine can't guarantee correctness on agentic tasks above basic complexity [5].

The paper concludes that "extreme care must be used before applying LLMs to problems or use cases that require accuracy, or solving problems of non-trivial complexity" [2]. This doesn't mean AI agents will necessarily fail, but developers and deployers need to understand whether assigned tasks exceed the underlying model's effective complexity limits. Work on mitigating these limitations continues, with approaches including composite systems and constraining the models [2].

The research from the Sikkas joins a mounting body of evidence suggesting that whatever AI may be capable of in its current form, it almost certainly won't surpass human intelligence in the near term, despite claims from figures like Elon Musk [3]. Scientists at the US Department of Energy's Sandia National Labs have shown that AI assistants can develop novel approaches for specific tasks, demonstrating promise within bounded domains [2]. The key question facing the industry is whether scaffolding and verification systems can compensate for the mathematical limitations of transformer-based language models, or whether fundamentally different architectures will be required to achieve reliable autonomous operation.

Summarized by Navi