Curated by THEOUTPOST
On Thu, 19 Sept, 12:03 AM UTC
2 Sources
[1]
Scientists Preparing "Humanity's Last Exam" to Test Powerful AI
There's one thing the organizers won't be quizzing AI on, though.

AI experts are calling for submissions for the "hardest and broadest set of questions ever" to try to stump today's most advanced artificial intelligence systems, as well as those still to come. As Reuters reports, the test, known in the field, memorably, as "Humanity's Last Exam", is being crowdsourced by the Center for AI Safety (CAIS) and the training data labeling firm Scale AI, which over the summer raised a cool billion dollars at a valuation of $14 billion.

Reuters points out that submissions for this "exam" opened just a day after results from OpenAI's new o1 model preview dropped. As CAIS executive director Dan Hendrycks notes, o1 seems to have "destroyed the most popular reasoning benchmarks."

Back in 2021, Hendrycks co-authored two papers proposing AI tests that would evaluate whether models could out-quiz undergraduates. At the time, the AI systems being tested were spouting off answers nearly at random, but as Hendrycks notes, today's models have "crushed" the 2021 tests. While the 2021 tests primarily grilled AI systems on math and social studies, "Humanity's Last Exam" will, as the CAIS executive director said, incorporate abstract reasoning to make it harder.

The two organizing institutions also plan to keep the test criteria confidential rather than opening them up to the public, to make sure the answers don't end up in any AI training data.

With submissions due November 1, experts in fields as far-flung as rocketry and philosophy are being encouraged to submit questions that would be difficult for those outside their areas of expertise to answer. After peer review, winners will be offered co-authorship of a paper associated with the test and prizes of up to $5,000, sponsored by Scale AI.
While the organizers are casting a very wide net for the types of questions they're seeking, they told Reuters that there's one thing that will not be on the exam: anything about weapons, because it's too dangerous for AI to know about.
[2]
Public asked to help create 'humanity's last exam' to spot when AI achieves peak intelligence
Scientists are creating "humanity's last exam" to test AI and see when it has reached expert-level intelligence. People are being asked to submit their questions to create "the world's most difficult artificial intelligence test", organised by the Center for AI Safety (CAIS) and Scale AI.

"Existing tests now have become too easy and we can no longer track AI developments well, or how far they are from becoming expert-level," said the quiz creators in a statement about the test.

A few years ago, AI was giving almost random answers to questions on exams; that is no longer the case. Last week, OpenAI's newest model, known as OpenAI o1, "destroyed the most popular reasoning benchmarks", according to Dan Hendrycks, executive director of CAIS.

However, AI still isn't able to answer difficult research questions and other intellectual questions. It also appears to score poorly on tests involving planning and visual pattern-recognition puzzles, according to Stanford University's AI Index Report from April. Consequently, "humanity's last exam" will require abstract reasoning to test how clever AI really is.

The submissions shouldn't be ordinary quiz questions. "We found questions written by undergraduates tend to be too easy for the models," the creators of the quiz said. Instead, they recommend that question writers have five or more years of experience in a technical industry job, such as at SpaceX, or be PhD students or above. The submissions should be difficult for non-experts to answer and "not easily answerable via a quick online search", and trick questions should be avoided.

"As a rule of thumb, if a randomly selected undergraduate can understand what is being asked, it is likely too easy for the frontier LLMs of today and tomorrow," said the quiz creators.
People who submit successful questions will be invited as co-authors on the paper and have a chance to win money from a $500,000 (£378,400) prize pool, with the writers of the best questions earning $5,000 (£3,780) each.
Researchers are developing a comprehensive test to measure AI capabilities, dubbed "Humanity's Last Exam." This collaborative effort aims to create benchmarks for assessing when AI reaches or surpasses human-level intelligence.
Researchers are embarking on an ambitious project to create what they call "Humanity's Last Exam," a comprehensive test designed to measure the capabilities of artificial intelligence (AI) systems. This initiative aims to establish benchmarks for determining when AI reaches or potentially surpasses human-level intelligence across various domains [1].
The project, spearheaded by the Center for AI Safety (CAIS) and Scale AI, is calling for public participation in developing this crucial assessment tool. Individuals from diverse backgrounds are encouraged to contribute questions and tasks that they believe would effectively gauge AI capabilities [2].
The exam is intended to cover a wide range of human knowledge and skills, including but not limited to:

- Mathematics and abstract reasoning
- Philosophy
- Specialist technical fields such as rocketry

By encompassing these diverse areas, researchers hope to create a comprehensive benchmark for AI capabilities [1].
The development of such a test raises important questions about the future of AI and its potential impact on society. As AI systems continue to advance rapidly, there is growing concern about the possibility of artificial general intelligence (AGI) surpassing human capabilities in numerous domains [2].
Creating an effective benchmark for AI intelligence presents several challenges:

- Keeping the test criteria confidential so the answers do not end up in AI training data
- Writing questions hard enough for frontier models without resorting to trick questions
- Keeping pace with models that have already "crushed" benchmarks considered difficult in 2021

Researchers acknowledge these challenges and emphasize the importance of ongoing refinement and adaptation of the exam [1].
The results of this project could have far-reaching consequences for various fields, including:

- AI safety research and risk assessment
- Policy and regulation of advanced AI systems
- Tracking progress toward expert-level or general intelligence

As AI continues to advance, the ability to accurately assess its capabilities becomes increasingly crucial for informed decision-making and responsible development [2].
A group of AI researchers is developing a comprehensive test called "Humanity's Last Exam" to assess the capabilities and limitations of advanced AI systems. This initiative aims to identify potential risks and ensure responsible AI development.
9 Sources
Scale AI and the Center for AI Safety have introduced a challenging new AI benchmark called 'Humanity's Last Exam', which has proven difficult for even the most advanced AI models, highlighting the current limitations of artificial intelligence.
7 Sources
OpenAI's Deep Research achieves a record-breaking 26.6% accuracy on Humanity's Last Exam, a new benchmark designed to test the limits of AI reasoning and problem-solving abilities across diverse fields.
2 Sources
As artificial intelligence continues to evolve at an unprecedented pace, some experts debate its potential to revolutionize industries while others warn of an approaching technological singularity. Reports of unusual AI behaviors raise concerns about the widespread adoption of a technology that remains poorly understood.
2 Sources
Leading computer scientists and AI experts issue warnings about the potential dangers of advanced AI systems. They call for international cooperation and regulations to ensure human control over AI development.
3 Sources
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2025 TheOutpost.AI All rights reserved