AI Agents Hit Mathematical Wall: Vishal Sikka Paper Shows LLMs Can't Handle Complex Tasks

Reviewed by Nidhi Govil


A paper by former SAP CTO Vishal Sikka and his son mathematically proves that LLMs powering AI agents cannot reliably execute tasks beyond a certain complexity threshold. The research challenges industry promises about autonomous AI systems, showing that transformer-based language models have fundamental limitations that even reasoning models can't overcome.

Mathematical Proof Challenges AI Agents' Capabilities

A quietly published research paper is forcing the AI industry to confront an uncomfortable reality about AI agents and their fundamental limitations. The study, titled "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models," mathematically demonstrates that LLMs are incapable of reliably executing complex computational tasks beyond a specific threshold [1]. The paper, authored by Vishal Sikka (former CTO of SAP, ex-CEO of Infosys, and current founder of Vianai) alongside his teenage son Varin Sikka, emerged during what the industry declared would be "the year of AI agents" [2].

Source: Digit

The research uses mathematical reasoning to show that when a prompt to an LLM specifies a computational task whose complexity exceeds that of the model's core operations, the LLM will generally respond incorrectly [2]. Vishal Sikka, who studied AI under John McCarthy, the computer scientist who coined the term "artificial intelligence," brings considerable credibility to these claims [4]. "There is no way they can be reliable," Sikka stated bluntly when discussing the use of AI in safety-critical settings such as nuclear power plants [1].
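
The shape of the argument can be sketched informally: a transformer spends a bounded, polynomial amount of compute per generated token, so any task whose intrinsic complexity outgrows that budget cannot be solved exactly for all inputs. The notation below is a simplification for illustration, not the paper's own formulation.

```latex
% Informal sketch of the complexity-ceiling argument (simplified
% notation, not the paper's). Per layer, a decoder-only transformer of
% width d spends about O(n d + d^2) operations to emit one token over
% an n-token context, so an m-token response from an L-layer model
% costs at most a fixed polynomial:
\[
  C_{\text{total}}(n, m) = O\!\left( L \, m \left[ (n+m)\,d + d^2 \right] \right).
\]
% If a task's intrinsic time complexity T(n) grows faster than this,
\[
  T(n) = \omega\!\left( C_{\text{total}}(n, m) \right),
\]
% then no response of bounded length can carry out the required
% computation, regardless of the model's weights.
```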

The Complexity Ceiling for Transformer-Based Language Models

The mathematical limitations identified in the paper present a significant challenge for agentic AI development. The research argues that it's possible to present an LLM with an input specifying a task that requires more calculations than the model is capable of performing [2]. This has direct implications for AI agents designed to autonomously carry out tasks without human supervision, from providing information to making financial transactions or controlling industrial equipment. The paper goes further, claiming that deploying agents to verify another agent's solution will also fail for the same reasons, since verifying a task's solution is often more complex than the task itself [2].
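
A back-of-the-envelope comparison makes the ceiling concrete. The sketch below is an illustration with assumed model dimensions, not figures from the paper: it contrasts the roughly fixed compute a transformer spends to emit one token against the compute that schoolbook n-digit multiplication actually requires.

```python
# Illustrative orders of magnitude only; the dimensions are assumptions,
# not parameters from the Sikkas' paper.

def per_token_flops(context_len: int, d_model: int = 4096, n_layers: int = 32) -> int:
    """Rough FLOPs a decoder-only transformer spends to emit ONE token:
    attention over the cached context (~context_len * d_model) plus
    projection/MLP work (~d_model**2), summed over layers."""
    return n_layers * (context_len * d_model + 8 * d_model ** 2)

def multiplication_ops(n_digits: int) -> int:
    """Schoolbook multiplication of two n-digit numbers takes ~n**2 digit ops."""
    return n_digits ** 2

for n in (1_000, 1_000_000, 1_000_000_000):
    budget = per_token_flops(context_len=n)  # what the model actually spends
    needed = multiplication_ops(n)           # what the task actually needs
    verdict = "within budget" if budget >= needed else "exceeds per-token budget"
    print(f"n={n:>13,}  budget~{budget:.1e}  needed~{needed:.1e}  {verdict}")
```

Emitting more reasoning tokens raises the budget only polynomially, and, per the paper's further claim, handing the answer to a second agent for checking does not escape the bind, since the checker faces at least comparable complexity.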

Source: The Register

This finding places a much lower ceiling on what's possible than AI companies acknowledge when pitching unlimited potential [3]. The research pours cold water on the idea that agentic AI will be the vehicle for achieving artificial general intelligence (AGI), though it doesn't suggest the technology is useless or won't improve [3]. The computational complexity threshold identified by the Sikkas represents a structural limitation baked into transformer architectures, not merely a bug to be fixed [5].

AI Hallucinations Persist Despite Industry Progress

The paper's findings align with mounting evidence about the persistence of AI hallucinations. OpenAI scientists published research in September admitting that "despite significant progress, hallucinations continue to plague the field, and are still present in the latest models" [1]. The researchers tested this by asking three models, including ChatGPT, to provide the title of a lead author's dissertation; all three fabricated titles and misreported the publication year. OpenAI glumly stated that in AI models, "accuracy will never reach 100 percent" [1].

Source: Gizmodo

The trustworthiness concerns extend beyond academic exercises. Research firm Gartner forecast that more than 40 percent of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and insufficient risk controls [2]. These projections suggest the industry is beginning to recognize the gap between marketing promises and actual capability.

Industry Pushback and Alternative Approaches

Not everyone accepts the doom-and-gloom narrative. Google's Nobel-winning head of AI, Demis Hassabis, reported breakthroughs in minimizing hallucinations at Davos, while hyperscalers and startups continue pushing the agent narrative [1]. Startup Harmonic, cofounded by Robinhood CEO Vlad Tenev and Stanford-trained mathematician Tudor Achim, claims a breakthrough in AI coding that uses formal methods of mathematical reasoning to verify LLM outputs. Their system encodes outputs in the Lean programming language, which is built for machine-checkable proofs [1].
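
The appeal of formal verification is that a proof checker gives a binary verdict rather than a plausibility estimate. As a toy illustration of the general idea (not Harmonic's actual system), a claim rendered as a Lean theorem is either certified by the kernel or rejected outright:

```lean
-- Toy Lean 4 example, not Harmonic's code: if a model emitted this
-- statement and proof, the kernel would either certify the proof term
-- or reject it. There is no "sounds plausible" middle ground.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```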

Achim argues that "most models at this point have the level of pure intelligence required to reason through booking a travel itinerary," suggesting that reliability concerns may be overstated for certain use cases [1]. Even Sikka acknowledges nuance in his conclusions, stating that "a pure LLM has this inherent limitation—but at the same time it is true that you can build components around LLMs that overcome those limitations" [4]. External guardrails, verification layers, and supporting systems might enable practical work, even if the raw engine can't guarantee correctness on agentic tasks above basic complexity [5].
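
In code, such a guardrail reduces to a generate-and-check loop: the model proposes, and an external deterministic checker decides. The sketch below shows the bare pattern under assumed interfaces; `llm_propose` and `check` are hypothetical stand-ins, not any real API.

```python
# Minimal generate-and-verify loop; all interfaces here are hypothetical
# stand-ins for illustration, not a real library.

def llm_propose(task: str, feedback: str | None = None) -> str:
    """Stand-in for a call to some LLM, optionally with checker feedback."""
    raise NotImplementedError

def check(task: str, candidate: str) -> tuple[bool, str]:
    """Stand-in for an external deterministic verifier (unit tests,
    a solver, a proof checker). Returns (passed, diagnostic)."""
    raise NotImplementedError

def solve_with_guardrail(task: str, max_attempts: int = 3) -> str:
    """Accept an answer only if the external checker certifies it;
    otherwise feed the diagnostic back and retry, then escalate."""
    feedback = None
    for _ in range(max_attempts):
        candidate = llm_propose(task, feedback)
        passed, feedback = check(task, candidate)
        if passed:
            return candidate
    raise RuntimeError("No verified answer produced; escalate to a human.")
```

The paper's caveat applies directly here: where verifying a candidate is itself harder than the original task, the check step inherits the same complexity ceiling, so the pattern pays off only where a cheap, deterministic checker exists.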

What This Means for AI Development

The paper concludes that "extreme care must be used before applying LLMs to problems or use cases that require accuracy, or solving problems of non-trivial complexity" [2]. This doesn't mean AI agents will necessarily fail, but developers and deployers need to understand whether assigned tasks exceed the underlying model's effective complexity limits. Work on mitigating these limitations continues, with approaches including composite systems and constraining the models [2].
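
In practice, "constraining the models" often means routing computation the LLM cannot reliably perform to a component that can. Below is a hedged sketch of that composite pattern; `llm_answer` is a hypothetical stand-in, while the arithmetic path is exact by construction.

```python
# Composite-system sketch: delegate exact computation to exact tools.
# `llm_answer` is a hypothetical stand-in, not a real API.
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def try_exact_arithmetic(query: str) -> str | None:
    """If the query is a simple binary integer expression, compute it
    exactly in ordinary code instead of asking the model."""
    m = re.fullmatch(r"\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*", query)
    if m is None:
        return None
    a, op, b = m.groups()
    return str(OPS[op](int(a), int(b)))

def llm_answer(query: str) -> str:
    """Stand-in for a model call; not implemented here."""
    raise NotImplementedError

def answer(query: str) -> str:
    exact = try_exact_arithmetic(query)  # constrained, always-correct path
    return exact if exact is not None else llm_answer(query)

print(answer("123456789 * 987654321"))   # exact: 121932631112635269
```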

The research from the Sikkas joins a mounting body of evidence suggesting that whatever AI may be capable of in its current form, it almost certainly won't surpass human intelligence in the near term, despite claims from figures like Elon Musk [3]. Scientists at the US Department of Energy's Sandia National Labs have shown that AI assistants can develop novel approaches for specific tasks, demonstrating promise within bounded domains [2]. The key question facing the industry is whether scaffolding and verification systems can compensate for the mathematical limitations of transformer-based language models, or whether fundamentally different architectures will be required to achieve reliable autonomous operation.
