OpenAI's Deep Research Dominates Humanity's Last Exam, Setting New Benchmarks in AI Capabilities

OpenAI's Deep Research achieves a record-breaking 26.6% accuracy on Humanity's Last Exam, a new benchmark designed to test the limits of AI reasoning and problem-solving abilities across diverse fields.

OpenAI's Deep Research Shatters Records on Humanity's Last Exam

In a significant leap forward for artificial intelligence, OpenAI's Deep Research has achieved 26.6% accuracy on Humanity's Last Exam (HLE), a newly established benchmark designed to push AI systems to their limits 1. That is a 183% increase over the 9.4% scored by DeepSeek R1 and roughly double the 13% of the previous top performer, OpenAI's o3-mini-high, setting a new standard for AI capabilities in complex reasoning and problem-solving.

Understanding Humanity's Last Exam

HLE, developed by the Center for AI Safety (CAIS) and Scale AI, is considered the world's hardest AI exam. It comprises 3,000 challenging questions spanning over 100 subjects, including mathematics, physics, law, medicine, and philosophy 2. Unlike previous benchmarks, HLE incorporates both text and image-based questions, with 10% of the exam requiring visual processing alongside written context.

The Rapid Progress of AI Models

The AI community has witnessed remarkable progress in a short span of time. Just days before Deep Research's achievement, other models had posted the leading scores:

  1. DeepSeek R1: 9.4% accuracy (text-only evaluation)
  2. OpenAI's o3-mini: 10.5% accuracy (standard setting)
  3. OpenAI's o3-mini-high: 13% accuracy (more intelligent but slower setting) 1
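The relative gains implied by these scores can be checked with simple arithmetic. The following is a minimal sketch, not part of the original reporting: the helper name `relative_increase` is hypothetical, and the only inputs are the accuracies listed above plus Deep Research's reported 26.6%.

```python
def relative_increase(new: float, baseline: float) -> float:
    """Return the percentage increase of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Accuracies (in percent) as reported in the article.
deep_research = 26.6
baselines = {
    "DeepSeek R1": 9.4,
    "o3-mini": 10.5,
    "o3-mini-high": 13.0,
}

# Relative gain of Deep Research over each earlier model.
increases = {
    name: relative_increase(deep_research, score)
    for name, score in baselines.items()
}

for name, pct in increases.items():
    print(f"{name}: +{pct:.0f}%")
# DeepSeek R1: +183%
# o3-mini: +153%
# o3-mini-high: +105%
```

On these figures, the widely quoted 183% increase corresponds to the DeepSeek R1 baseline. Measured against o3-mini-high, the gain is closer to a doubling.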

Deep Research's Distinctive Advantage

It's worth noting that Deep Research's exceptional performance is partly attributed to its web search capabilities, which the other models evaluated did not have. This gives it an advantage on the general knowledge questions included in the exam 1.

The Significance of HLE in AI Development

HLE represents a critical shift in how AI progress is measured and evaluated:

  1. Exposing AI Weaknesses: The exam reveals areas where AI still struggles, such as deep reasoning and multi-modal understanding 2.

  2. Setting New Standards: HLE challenges AI companies to focus on meaningful advancements rather than superficial improvements 2.

  3. Increasing Accountability: The benchmark introduces transparency and forces AI models to perform under pressure, mimicking real-world scenarios 2.

The Road Ahead

While Deep Research's 26.6% accuracy on HLE is impressive, it still falls short of what would be considered a passing grade in human terms. This underscores the significant challenges that remain in developing AI systems capable of human-level reasoning across diverse fields 1.

As AI continues to evolve rapidly, HLE will likely play a crucial role in gauging progress and directing research efforts. The AI community now faces the exciting challenge of pushing beyond current limitations, with many wondering how long it will take for an AI model to surpass the 50% mark on this rigorous exam 1 2.

TheOutpost.ai

© 2025 Triveous Technologies Private Limited