New AGI Benchmark Stumps Leading AI Models, Highlighting Gap in General Intelligence

The Arc Prize Foundation introduces ARC-AGI-2, a challenging new test for artificial general intelligence that current AI models, including those from OpenAI and Google, are struggling to solve. The benchmark emphasizes efficiency and adaptability, revealing limitations in current AI capabilities.

Arc Prize Foundation Introduces Challenging New AGI Benchmark

The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, has unveiled a new benchmark called ARC-AGI-2, designed to measure the general intelligence of leading AI models [1]. The test has proven significantly harder than its predecessor: most current AI models score only in the low single digits.

Performance of Leading AI Models

The results of the ARC-AGI-2 test have been eye-opening:

  • OpenAI's o3-low model, which previously scored 75.7% on ARC-AGI-1, only managed 4% on the new test [2].
  • "Reasoning" AI models like OpenAI's o1-pro and DeepSeek's R1 scored between 1% and 1.3% [1].
  • Powerful non-reasoning models including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash scored around 1% [1][3].
  • Pure language models (LLMs) scored 0% on the benchmark [5].

In stark contrast, a human panel achieved an average score of 60% on the test, with some individuals solving all tasks perfectly [1][5].

Key Features of ARC-AGI-2

The new benchmark introduces several important changes:

  1. Efficiency Metric: Unlike its predecessor, ARC-AGI-2 considers the cost and computational resources required to complete tasks [1][2].
  2. Adaptability: The test focuses on AI models' ability to acquire new skills efficiently and apply them to unfamiliar problems [3].
  3. Visual Pattern Recognition: Tasks involve identifying patterns in colored squares and generating correct "answer" grids [1].
  4. Contextual Rule Application: Models must interpret symbols beyond visual patterns and apply different rules based on context [5].
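
To make the grid format concrete, here is a minimal Python sketch of an ARC-style task. It is an illustration, not the foundation's evaluation code: the toy task, the mirror rule, and the helper names (flip_horizontal, is_correct, solve_rate) are all hypothetical, and it assumes grids are lists of integer color codes and that an answer counts only if every cell matches.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # assumption: each integer stands for one colored square


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical rule for the toy task: mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def is_correct(predicted: Grid, expected: Grid) -> bool:
    """Assumed all-or-nothing scoring: every cell of the answer grid must match."""
    return predicted == expected


# A made-up task: demonstration pairs show the rule, the test pair checks it.
toy_task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0, 0], [0, 4, 0]], "output": [[0, 0, 3], [0, 4, 0]]},
    ],
}


def solve_rate(task: Dict[str, List[Dict[str, Grid]]],
               solver: Callable[[Grid], Grid]) -> float:
    """Fraction of test grids the solver reproduces exactly."""
    tests = task["test"]
    solved = sum(is_correct(solver(pair["input"]), pair["output"]) for pair in tests)
    return solved / len(tests)


if __name__ == "__main__":
    print(solve_rate(toy_task, flip_horizontal))  # solver that found the rule
    print(solve_rate(toy_task, lambda grid: grid))  # solver that copies the input
```

The point of the sketch is that a solver has to infer the rule from a handful of demonstration pairs and then produce the full answer grid; partial matches earn nothing under the all-or-nothing assumption used here.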

Implications for AGI Development

The poor performance of leading AI models on ARC-AGI-2 highlights the significant gap between current AI capabilities and human-level general intelligence. Greg Kamradt, co-founder of the Arc Prize Foundation, emphasized that intelligence is not solely about problem-solving ability but also about the efficiency of acquiring and deploying new skills [1].

This benchmark challenges the notion that brute-force computing power alone can lead to AGI. It suggests that fundamental advancements in AI architecture and learning approaches may be necessary to achieve human-like adaptability and efficiency [2][4].

Debate and Criticism

While many in the tech industry welcome new benchmarks to measure AI progress, some experts question the framing of these tests. Catherine Flick from the University of Staffordshire argues that performing well on such benchmarks should not be seen as a major step towards AGI, as they only assess an AI's ability to complete specific tasks rather than demonstrate true general intelligence [2].

Future of AGI Testing

The introduction of ARC-AGI-2 raises questions about the future of AGI evaluation. Joseph Imperial from the University of Bath suggests that future iterations might incorporate additional metrics, such as the minimum number of humans required to solve tasks, alongside performance and efficiency measures [2].

As the debate over AGI continues, the Arc Prize Foundation has announced a new contest challenging developers to reach 85% accuracy on the ARC-AGI-2 test while spending only $0.42 per task [1]. This competition aims to drive innovation in both AI performance and efficiency, potentially bringing us closer to the elusive goal of artificial general intelligence.
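
The contest's two-part bar (accuracy and cost) can be written as a simple check. The sketch below is only an illustration of how the two thresholds combine: the 85% and $0.42 figures come from the article, while the Run class, its fields, and the example numbers are hypothetical and do not describe any real submission or the foundation's actual scoring code.

```python
from dataclasses import dataclass

TARGET_ACCURACY = 0.85    # from the article: 85% accuracy
MAX_COST_PER_TASK = 0.42  # from the article: $0.42 per task


@dataclass
class Run:
    """Hypothetical summary of one benchmark run."""
    name: str
    tasks_solved: int
    tasks_total: int
    total_cost_usd: float

    @property
    def accuracy(self) -> float:
        return self.tasks_solved / self.tasks_total

    @property
    def cost_per_task(self) -> float:
        return self.total_cost_usd / self.tasks_total


def meets_contest_bar(run: Run) -> bool:
    """A run qualifies only if it clears BOTH thresholds."""
    return run.accuracy >= TARGET_ACCURACY and run.cost_per_task <= MAX_COST_PER_TASK


if __name__ == "__main__":
    # Made-up runs, chosen only to exercise each branch of the check.
    runs = [
        Run("accurate-and-cheap", tasks_solved=90, tasks_total=100, total_cost_usd=40.0),
        Run("accurate-but-costly", tasks_solved=90, tasks_total=100, total_cost_usd=50.0),
        Run("cheap-but-inaccurate", tasks_solved=4, tasks_total=100, total_cost_usd=10.0),
    ]
    for run in runs:
        print(run.name, meets_contest_bar(run))
```

Treating the bar as a conjunction reflects the benchmark's framing: a high score bought with heavy compute does not qualify, and neither does a cheap run that solves few tasks.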

TheOutpost.ai

© 2025 Triveous Technologies Private Limited