OpenAI's GDPval Benchmark: AI Models Approaching Human-Level Performance in Various Occupations

Reviewed by Nidhi Govil

9 Sources

OpenAI introduces GDPval, a new benchmark to evaluate AI performance across 44 occupations. Results show top AI models, including GPT-5 and Claude Opus 4.1, are nearing human expert-level quality in many tasks.

OpenAI Introduces GDPval Benchmark

OpenAI has unveiled a new benchmark called GDPval, designed to evaluate the performance of AI models on 'economically valuable, real-world tasks' across 44 different occupations [1][2]. This benchmark aims to ground conversations about AI's impact on the workforce in evidence rather than speculation, and to track model improvements over time [4].

Source: Digit

Benchmark Methodology

GDPval focuses on nine industries that contribute significantly to the U.S. Gross Domestic Product (GDP) [1][4]. The benchmark includes around 1,300 specialized tasks crafted by professionals with an average of 14 years of experience [3]. These tasks span deliverables such as legal briefs, engineering blueprints, customer support conversations, and nursing care plans [2][3].

Source: Decrypt

Key Findings

OpenAI's tests revealed that leading AI models are approaching parity with human professionals on many tasks [4]. Notably:

  1. Anthropic's Claude Opus 4.1 performed best, with its outputs rated as good as or better than human experts 47.6% of the time [4].
  2. OpenAI's GPT-5 came in second, excelling in domain-specific knowledge [4].
  3. GPT-5-high was rated as better than or on par with industry experts in 40.6% of tasks [1].

Source: Axios
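
To make the percentages above concrete, here is a minimal Python sketch of how a "good as or better than" rate could be computed from a set of grader verdicts. The label names ("win", "tie", "loss"), the function name, and the sample data are assumptions for illustration only; they are not taken from OpenAI's published methodology or results.

```python
from collections import Counter

# Assumed verdict labels, from the model's point of view:
# "win"  = model output rated better than the expert's
# "tie"  = rated as good as the expert's
# "loss" = expert output rated better
VALID_VERDICTS = {"win", "tie", "loss"}


def win_or_tie_rate(verdicts):
    """Return the share of comparisons the model won or tied."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    return (counts["win"] + counts["tie"]) / len(verdicts)


# Hypothetical verdicts for eight tasks (not real GDPval data).
sample = ["win", "tie", "loss", "loss", "win", "loss", "tie", "loss"]
print(f"{win_or_tie_rate(sample):.1%}")  # prints 50.0%
```

Under this reading, a figure such as 47.6% means the model's output was judged at least as good as the expert's in slightly fewer than half of the comparisons.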

Performance Across Occupations

The AI models showed varying levels of proficiency across different jobs:

  • Strong performance: Counter and rental clerks, shipping and inventory clerks, sales managers, and software developers [2].
  • Weaker performance: Industrial engineers, medical engineers, pharmacists, financial managers, and video editors [2].

Implications for the Workforce

While the results are promising, OpenAI emphasizes that AI is not poised to replace humans entirely [2][5]. Instead, the company suggests that AI could complement human workers, allowing them to focus on more creative and judgment-intensive aspects of their jobs [1][4].

Limitations and Future Developments

OpenAI acknowledges that GDPval is an early step and doesn't capture the full complexity of many economic tasks [3]. Future versions of the benchmark are expected to include more interactive workflows and context-rich tasks to better reflect real-world knowledge work [3].

Industry Response and Concerns

The introduction of GDPval comes at a time when the AI industry is facing scrutiny over the practical value of AI investments. A recent MIT study found that fewer than one in ten AI pilot projects delivered measurable revenue gains [2]. Critics have also raised concerns about 'workslop' – AI-generated content that appears good but lacks substance [2].

As AI continues to evolve, its impact on the job market remains a topic of intense debate. While OpenAI's research suggests significant progress in AI capabilities, the implications for specific industries and occupations are not yet fully understood.
