Apple Study Challenges AI Reasoning Capabilities, Casting Doubt on AGI Claims

Reviewed by Nidhi Govil

22 Sources

Apple researchers find that advanced AI reasoning models struggle with complex problem-solving, suggesting fundamental limitations in their ability to generalize reasoning like humans do.

Apple Researchers Challenge AI Reasoning Capabilities

A new study from Apple researchers has cast doubt on the capabilities of advanced AI reasoning models, challenging claims about imminent artificial general intelligence (AGI). The research, titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," was conducted by a team led by Parshin Shojaee and Iman Mirzadeh 1.

Source: NDTV Gadgets 360

Study Methodology and Findings

The researchers examined "large reasoning models" (LRMs), including OpenAI's o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking. These models attempt to simulate logical reasoning through a process called "chain-of-thought reasoning" 1. The study used four classic puzzles - Tower of Hanoi, checker jumping, river crossing, and blocks world - each scaled from easy to extremely complex 1.
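To give a sense of how quickly such a puzzle scales, here is a minimal Python sketch of the standard recursive Tower of Hanoi solution. It is an illustration only, not the researchers' test harness: the function names, peg labels, and disk counts are assumptions chosen for the example. The optimal solution for n disks takes 2^n - 1 moves, so each added disk roughly doubles the length of the answer a model must write out.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield the optimal moves for n disks as (disk, from_peg, to_peg).

    Peg labels are illustrative placeholders, not taken from the study.
    """
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, spare, target)
    # Move the largest disk directly to the target peg.
    yield (n, source, target)
    # Move the n-1 smaller disks from the spare peg onto the largest disk.
    yield from hanoi_moves(n - 1, spare, target, source)


def move_count(n):
    """Count the moves in the optimal solution for n disks."""
    return sum(1 for _ in hanoi_moves(n))


# Disk counts below are arbitrary examples, not the study's settings.
for disks in (3, 5, 10, 15):
    print(disks, "disks:", move_count(disks), "moves")
```

With 15 disks the full solution already runs to 32,767 moves, which helps explain why writing out every step becomes impractical at the high end of the difficulty range.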

Source: Mashable

Key findings include:

  1. On simple tasks, standard models outperformed reasoning models.
  2. For moderately difficult tasks, reasoning models had an advantage.
  3. On highly complex tasks, both types of models failed completely 1, 3.

The researchers also observed a "counterintuitive scaling limit" where reasoning models initially generated more thinking tokens as problem complexity increased, but then reduced their reasoning effort beyond a certain threshold 1.

Implications for AI Development

These results align with a recent study that tested the same models on problems from the United States of America Mathematical Olympiad (USAMO) and found that they achieved low scores on novel mathematical proofs 1. Both studies documented severe performance degradation on problems requiring extended systematic reasoning.

AI researcher Gary Marcus, known for his skepticism, called the Apple results "pretty devastating to LLMs" 1. The study provides empirical support for the argument that neural networks struggle with out-of-distribution generalization.

Competing Interpretations

Not all researchers agree with the interpretation that these results demonstrate fundamental reasoning limitations. Some argue that the observed limitations may reflect deliberate training constraints rather than inherent inabilities 1.

University of Toronto economist Kevin A. Bryan suggested that models are specifically trained through reinforcement learning to avoid excessive computation, which could explain the observed behavior 1. Software engineer Sean Goedecke offered a similar critique, noting that when faced with extremely complex tasks, models like DeepSeek-R1 may decide that generating all moves manually is impossible and attempt to find shortcuts 1.

Broader Context and Industry Claims

The study's findings contrast sharply with recent claims by AI industry leaders. Sam Altman of OpenAI and Demis Hassabis of Google DeepMind have made bold predictions about AI capabilities in the 2030s, including solving high-energy physics problems and enabling space colonization 2.

Source: The Register

However, researchers working with today's most advanced AI systems are finding a different reality. Even the best models are failing to solve basic puzzles that most humans find trivial, while the promise of AI that can "reason" seems to be overblown 2, 4.

Limitations and Future Directions

The Apple researchers acknowledge that their study represents only a "narrow slice" of potential reasoning tasks 5. However, their findings suggest that current approaches to AI development may be encountering fundamental barriers to generalizable reasoning 4.

As the AI industry continues to invest heavily in developing more advanced models, with reports of Meta planning a $15 billion investment to achieve "superintelligence" 2, these research findings highlight the need for a critical examination of AI capabilities and limitations. The gap between industry claims and research findings underscores the importance of continued rigorous testing and evaluation of AI systems as they evolve.
