Apple Study Challenges AI Reasoning Capabilities, Casting Doubt on AGI Claims

Reviewed by Nidhi Govil

Apple researchers find that advanced AI reasoning models struggle with complex problem-solving, suggesting fundamental limitations in their ability to generalize reasoning like humans do.

Apple Researchers Challenge AI Reasoning Capabilities

A new study from Apple researchers has cast doubt on the capabilities of advanced AI reasoning models, challenging claims about imminent artificial general intelligence (AGI). The research, titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity," was conducted by a team led by Parshin Shojaee and Iman Mirzadeh 1.

Source: NDTV Gadgets 360

Study Methodology and Findings

The researchers examined "large reasoning models" (LRMs), including OpenAI's o1 and o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking. These models attempt to simulate logical reasoning through a process called "chain-of-thought reasoning" 1. The study used four classic puzzles - Tower of Hanoi, checkers jumping, river crossing, and blocks world - scaled from easy to extremely complex 1.

Source: Mashable

Key findings include:

  1. On simple tasks, standard models outperformed reasoning models.
  2. For moderately difficult tasks, reasoning models had an advantage.
  3. On highly complex tasks, both types of models failed completely 1, 3.

The researchers also observed a "counterintuitive scaling limit" where reasoning models initially generated more thinking tokens as problem complexity increased, but then reduced their reasoning effort beyond a certain threshold 1.

Implications for AI Development

These results align with a recent study that tested models on problems from the United States of America Mathematical Olympiad (USAMO) and found that the same models achieved low scores on novel mathematical proofs 1. Both studies documented severe performance degradation on problems requiring extended systematic reasoning.

AI researcher Gary Marcus, known for his skepticism, called the Apple results "pretty devastating to LLMs" 1. The study provides empirical support for the argument that neural networks struggle with out-of-distribution generalization.

Competing Interpretations

Not all researchers agree with the interpretation that these results demonstrate fundamental reasoning limitations. Some argue that the observed limitations may reflect deliberate training constraints rather than inherent inabilities 1.

University of Toronto economist Kevin A. Bryan suggested that models are specifically trained through reinforcement learning to avoid excessive computation, which could explain the observed behavior 1. Software engineer Sean Goedecke offered a similar critique, noting that when faced with extremely complex tasks, models like DeepSeek-R1 may decide that generating all moves manually is impossible and attempt to find shortcuts 1.
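To give a sense of why writing out every move becomes impractical as these puzzles scale, here is a minimal Python sketch using Tower of Hanoi, one of the four puzzles in the study. It is an illustration only, not the researchers' evaluation code: the function names, the peg labels A, B, and C, and the disk counts in the loop are assumptions chosen for this example. The minimum number of moves for n disks is 2^n - 1, so the length of a complete answer roughly doubles with each added disk.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield an optimal move sequence for n disks as (disk, from_peg, to_peg).

    Peg labels are illustrative; disk 1 is the smallest.
    """
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, move disk n, then restack.
    yield from hanoi_moves(n - 1, source, spare, target)
    yield (n, source, target)
    yield from hanoi_moves(n - 1, spare, target, source)


def is_valid_solution(n, moves):
    """Replay moves on three pegs and check that every disk ends on peg C."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        # The disk being moved must be on top of its source peg.
        if not pegs[src] or pegs[src][-1] != disk:
            return False
        # A larger disk may never be placed on a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


def move_count(n):
    """Count the moves in the optimal solution for n disks."""
    return sum(1 for _ in hanoi_moves(n))


if __name__ == "__main__":
    # Hypothetical disk counts, picked only to show the growth in answer length.
    for n in (3, 5, 10, 15):
        print(f"{n} disks: {move_count(n)} moves (2**n - 1 = {2**n - 1})")
```

The checker replays a proposed move list step by step, which is one straightforward way to tell a correct answer from a plausible-looking but invalid one. Whether a model that stops enumerating moves at large n is failing to reason or simply declining a very long output is the question the critics raise.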

Broader Context and Industry Claims

The study's findings contrast sharply with recent claims by AI industry leaders. Sam Altman of OpenAI and Demis Hassabis of Google DeepMind have made bold predictions about AI capabilities in the 2030s, including solving high-energy physics problems and enabling space colonization 2.

Source: The Register

However, researchers working with today's most advanced AI systems are finding a different reality. Even the best models are failing to solve basic puzzles that most humans find trivial, while the promise of AI that can "reason" seems to be overblown 2, 4.

Limitations and Future Directions

The Apple researchers acknowledge that their study represents only a "narrow slice" of potential reasoning tasks 5. However, their findings suggest that current approaches to AI development may be encountering fundamental barriers to generalizable reasoning 4.

As the AI industry continues to invest heavily in developing more advanced models, with reports of Meta planning a $15 billion investment to achieve "superintelligence" 2, these research findings highlight the need for a critical examination of AI capabilities and limitations. The gap between industry claims and research findings underscores the importance of continued rigorous testing and evaluation of AI systems as they evolve.
