Humanity's Last Exam reveals the gap between AI and human intelligence despite rapid progress
Researchers created Humanity's Last Exam, a PhD-level benchmark of 2,500 questions designed to probe the limits of AI reasoning. Google's Gemini 3 Deep Think achieved the highest score, 48.4%, while human experts score around 90%. Despite this progress, experts stress that the result does not signal the arrival of Artificial General Intelligence.