OpenAI's GDPval Benchmark: AI Models Approaching Human-Level Performance in Various Occupations

Reviewed by Nidhi Govil

9 Sources

OpenAI introduces GDPval, a new benchmark to evaluate AI performance across 44 occupations. Results show top AI models, including GPT-5 and Claude Opus 4.1, are nearing human expert-level quality in many tasks.

OpenAI Introduces GDPval Benchmark

OpenAI has unveiled a new benchmark called GDPval, designed to evaluate the performance of AI models on 'economically valuable, real-world tasks' across 44 different occupations [1][2]. This benchmark aims to ground conversations about AI's impact on the workforce in evidence rather than speculation, and to track model improvements over time [4].

Source: Digit


Benchmark Methodology

GDPval focuses on nine industries that contribute significantly to the U.S. Gross Domestic Product (GDP) [1][4]. The benchmark includes around 1,300 specialized tasks crafted by professionals with an average of 14 years of experience [3]. These tasks span deliverables such as legal briefs, engineering blueprints, customer support conversations, and nursing care plans [2][3].

Source: Decrypt


Key Findings

OpenAI's tests revealed that leading AI models are approaching parity with human professionals on many tasks [4]. Notably:

  1. Anthropic's Claude Opus 4.1 performed best, with its outputs rated as good as or better than human experts 47.6% of the time [4].
  2. OpenAI's GPT-5 came in second, excelling in domain-specific knowledge [4].
  3. GPT-5-high was rated as better than or on par with industry experts in 40.6% of tasks [1].
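
The percentages above read as a "win or tie" rate: the share of comparisons in which a grader judged the model's output at least as good as the expert's. A minimal sketch of that arithmetic follows, assuming each comparison yields one of three verdicts. The function name, the verdict labels, and the sample data are hypothetical illustrations, not GDPval's actual grading code or results.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Return the share of comparisons rated 'win' or 'tie' for the model.

    `verdicts` is a list of strings, one per graded task, each being
    'win', 'tie', or 'loss' from the model's point of view.
    """
    if not verdicts:
        raise ValueError("no verdicts to score")
    counts = Counter(verdicts)
    return (counts["win"] + counts["tie"]) / len(verdicts)


# Hypothetical grader verdicts for eight tasks (not GDPval data).
sample = ["win", "loss", "tie", "loss", "win", "loss", "loss", "tie"]
rate = win_or_tie_rate(sample)
print(f"{rate:.1%}")  # prints 50.0%
```

Under this reading, a figure such as 47.6% means the model matched or beat the expert in a little under half of the graded comparisons.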
Source: Axios


Performance Across Occupations

The AI models showed varying levels of proficiency across different jobs:

  • Strong performance: Counter and rental clerks, shipping and inventory clerks, sales managers, and software developers [2].
  • Weaker performance: Industrial engineers, medical engineers, pharmacists, financial managers, and video editors [2].

Implications for the Workforce

While the results are promising, OpenAI emphasizes that AI is not poised to replace humans entirely [2][5]. Instead, the company suggests that AI could complement human workers, allowing them to focus on more creative and judgment-intensive aspects of their jobs [1][4].

Limitations and Future Developments

OpenAI acknowledges that GDPval is an early step and doesn't capture the full complexity of many economic tasks [3]. Future versions of the benchmark are expected to include more interactive workflows and context-rich tasks to better reflect real-world knowledge work [3].

Industry Response and Concerns

The introduction of GDPval comes at a time when the AI industry is facing scrutiny over the practical value of AI investments. A recent MIT study found that fewer than one in ten AI pilot projects delivered measurable revenue gains [2]. Critics have also raised concerns about 'workslop', AI-generated content that appears good but lacks substance [2].

As AI continues to evolve, its impact on the job market remains a topic of intense debate. While OpenAI's research suggests significant progress in AI capabilities, the implications for various industries and occupations are not yet fully understood.


© 2026 TheOutpost.AI All rights reserved