9 Sources
[1]
OpenAI says GPT-5 stacks up to humans in a wide range of jobs | TechCrunch
OpenAI released a new benchmark on Thursday that tests how its AI models perform compared to human professionals across a wide range of industries and jobs. The test, GDPval, is an early attempt at understanding how close OpenAI's systems are to outperforming humans at economically valuable work -- a key part of the company's founding mission to develop artificial general intelligence, or AGI. OpenAI says it found that its GPT-5 model and Anthropic's Claude Opus 4.1 "are already approaching the quality of work produced by industry experts."

That's not to say that OpenAI's models are going to start replacing humans in their jobs immediately. Despite some CEOs' predictions that AI will take humans' jobs in just a few years, OpenAI admits that GDPval today covers only a limited slice of the tasks people do in their real jobs. It is, however, one of the latest ways the company is measuring AI's progress toward that milestone.

GDPval is based on the nine industries that contribute the most to America's gross domestic product, including domains such as healthcare, finance, manufacturing, and government. The benchmark tests an AI model's performance in 44 occupations across those industries, ranging from software engineers to nurses to journalists.

For OpenAI's first version of the test, GDPval-v0, the company asked experienced professionals to compare AI-generated reports with those produced by other professionals, and then choose the better one. For example, one prompt asked investment bankers to create a competitor landscape for the last-mile delivery industry; their reports were then compared against AI-generated ones. OpenAI then averages an AI model's "win rate" against the human reports across all 44 occupations. For GPT-5-high, a souped-up version of GPT-5 with extra computational power, the company says the model was ranked as better than or on par with industry experts 40.6% of the time.
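The aggregation described above, a per-occupation win-or-tie rate averaged across occupations, can be sketched in a few lines. This is an illustrative reconstruction, not OpenAI's actual scoring code; the data structure and field names are assumptions.

```python
from collections import defaultdict

def gdpval_score(judgments):
    """Average per-occupation win/tie rate for one model.

    `judgments` is a list of (occupation, outcome) pairs, where
    outcome is "win", "tie", or "loss" for the model's deliverable
    versus the human expert's. The rate is computed within each
    occupation first, then averaged across occupations, so every
    occupation counts equally regardless of task count.
    """
    per_occ = defaultdict(lambda: [0, 0])  # occupation -> [wins+ties, total]
    for occ, outcome in judgments:
        per_occ[occ][0] += outcome in ("win", "tie")
        per_occ[occ][1] += 1
    rates = [wins_ties / total for wins_ties, total in per_occ.values()]
    return sum(rates) / len(rates)

# Toy example: two occupations, each with one win/tie out of two tasks.
judgments = [
    ("nurse", "win"), ("nurse", "loss"),
    ("journalist", "tie"), ("journalist", "loss"),
]
print(gdpval_score(judgments))  # 0.5
```

Averaging per-occupation rates (rather than pooling all tasks) keeps a heavily sampled occupation from dominating the headline number.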
OpenAI also tested Anthropic's Claude Opus 4.1 model, which was ranked as better than or on par with industry experts in 49% of tasks. OpenAI says it believes Claude scored so high because of its tendency to produce pleasing graphics, rather than sheer performance.

It's worth noting that most working professionals do a lot more than submit research reports to their boss, which is all that GDPval-v0 tests for. OpenAI acknowledges this, and says it plans to create more robust tests in the future that account for more industries and interactive workflows. Nonetheless, the company sees the progress on GDPval as notable.

In an interview with TechCrunch, OpenAI's chief economist Dr. Aaron Chatterji said GDPval's results suggest that people in these jobs can now use AI models to spend time on more meaningful tasks. "[Because] the model is getting good at some of these things," Chatterji says, "people in those jobs can now use the model, increasingly as capabilities get better, to offload some of their work and do potentially higher value things."

OpenAI's evaluations lead Tejal Patwardhan tells TechCrunch that she's encouraged by the rate of progress on GDPval. GPT-4o, released roughly 15 months ago, scored just 13.7% (wins and ties versus humans); GPT-5 now scores nearly triple that, a trend Patwardhan expects to continue.

Silicon Valley has a wide range of benchmarks it uses to measure the progress of AI models and assess whether a given model is state-of-the-art. Among the most popular are AIME 2025 (a test of competitive math problems) and GPQA Diamond (a test of PhD-level science questions). However, several AI models are nearing saturation on some of these benchmarks, and many AI researchers have cited the need for better tests that measure AI's proficiency on real-world tasks.
Benchmarks like GDPval could become increasingly important in that conversation, as OpenAI makes the case that its AI models are valuable for a wide range of industries. But OpenAI may need a more comprehensive version of the test to definitively say its AI models can outperform humans.
[2]
OpenAI Says ChatGPT Can Already Do Some Work Tasks as Well as Humans
OpenAI is trying to make the case that AI can actually be useful at work, as some recent studies have shown that companies aren't getting much out of their AI investments. On Tuesday, the ChatGPT maker released a report introducing a new benchmark for testing AI on "economically valuable, real-world tasks" across 44 different jobs. The evaluation is called GDPval, and OpenAI says it's meant to ground workplace AI debates in evidence rather than hype, and to track how models improve over time.

It comes on the heels of a recent MIT Media Lab study that found fewer than one in ten AI pilot projects delivered measurable revenue gains and warned that "95 percent of organizations are getting zero return" on their AI bets. And just last week, researchers from Harvard Business Review's BetterUp Labs and Stanford's Social Media Lab blamed "workslop" for the lackluster results. They define workslop as "AI-generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task."

OpenAI argues that GDPval fills a gap left by existing benchmarks, which typically test AI models on abstract academic problems rather than the kinds of day-to-day tasks people actually do at work. "We call this evaluation GDPval because we started with the concept of Gross Domestic Product (GDP) as a key economic indicator and drew tasks from the key occupations in the industries that contribute most to GDP," OpenAI wrote in a blog post announcing the report.

The first version of the benchmark spans 44 jobs across the nine industries that make up the largest share of U.S. GDP, including real estate, government, manufacturing, and finance. Within each sector, OpenAI zeroed in on roles that drive the highest wages and compensation, focusing on what it called knowledge work. To build the test set, OpenAI recruited professionals from those industries, averaging 14 years of experience, to design real-world tasks.
Each expert also created a human-written example of how the task should be done. Example assignments include drafting a legal brief, producing an engineering blueprint, handling a customer support exchange, or writing a nursing care plan. The report contains 30 fully reviewed tasks per occupation, plus a smaller "gold set" of five open-sourced tasks per occupation.

To measure performance, OpenAI used expert graders: professionals from the same fields represented in the dataset. These graders blindly compared the AI-generated deliverables with those produced by the task writers, offered critiques and rankings, and rated each AI deliverable as better than, as good as, or worse than the human one.

The report found that today's top AI models are already closing in on the quality of work produced by human experts. In tests on 220 tasks from the GDPval gold set, evaluators compared deliverables from seven leading models against industry professionals. Claude Opus 4.1 came out on top with a 47.6% win-and-tie rate against human-completed tasks. It was especially strong on aesthetics, like document formatting and slide layout. GPT-5 high came in second with a win-and-tie rate of 38.8%; its strength was accuracy, like carefully following instructions and performing correct calculations. GPT-4o was in last place with a win-and-tie rate of only 12.4%.

The AI models performed particularly well on tasks from occupations like counter and rental clerks; shipping, receiving, and inventory clerks; sales managers; and software developers. They struggled more with tasks from occupations such as industrial engineers, medical engineers, pharmacists, financial managers, and video editors. For example, Claude Opus 4.1 had its highest win-and-tie rate on tasks done by counter and rental clerks (81%), followed by shipping, receiving, and inventory clerks (76%).
Its lowest scores were for tasks performed by industrial engineers and film and video editors (both 17%), and by audio and video technicians (2%). OpenAI also claims these models can knock out GDPval tasks around 100 times faster and 100 times cheaper than human experts.

Still, OpenAI stressed that even as AI reshapes the job market, it won't be able to completely replace humans. As the company put it, "most jobs are more than just a collection of tasks that can be written down."

"GDPval highlights where AI can handle routine tasks so people can spend more time on the creative, judgment-heavy parts of work," OpenAI wrote.
[3]
OpenAI is now testing ChatGPT against humans in 44 different occupations, from lawyers and software developers to registered nurses -- here's the full list of jobs affected
OpenAI, the company behind ChatGPT, has announced a new benchmark for testing its GPT-5 model, which involves pitting the AI directly against human experts in a variety of occupations. The benchmark, called GDPval, assesses how close ChatGPT is getting to outperforming humans at "economically valuable, real-world tasks". That means moving beyond things like academic tests and coding competitions toward jobs that are carried out in the real world: nursing, financial management, engineering, or journalism. This is all part of OpenAI's effort to establish artificial general intelligence (AGI), and the company notes that its GPT-5 model (and Anthropic's Claude Opus 4.1) "are already approaching the quality of work produced by industry experts."

In a blog post explaining the new testing, OpenAI said: "Unlike traditional benchmarks, GDPval tasks are not simple text prompts. They come with reference files and context, and the expected deliverables span documents, slides, diagrams, spreadsheets, and multimedia. This realism makes GDPval a more realistic test of how models might support professionals."

"The GDPval full set includes 1,320 specialized tasks (220 in the gold open-sourced set), each meticulously crafted and vetted by experienced professionals with over 14 years of experience on average from these fields. Every task is based on real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan."

The tasks covered 44 different jobs across nine different industries.

So will AI take your job? It's the $64,000 question, and the answer, probably, is yes. Or at least AI will take some measure of your job. OpenAI itself notes GDPval is an "early step that doesn't reflect the full nuance of many economic tasks."
Additionally, while the test "spans 44 occupations and hundreds of knowledge work tasks, it is limited to one-shot evaluations, so it doesn't capture cases where a model would need to build context or improve through multiple drafts." There's still a long way to go, and a recent study claimed ChatGPT still routinely gets things wrong. But OpenAI is working hard toward AGI and says that future versions will extend to more interactive workflows and context-rich tasks to "better reflect the complexity of real-world knowledge work".

The fact that AI will reshape our working landscape is pretty much a foregone conclusion at this point. But the way it's integrated into most societies is still very much in the hands of humans: business leaders and customers. There will always be work for humans to do; that's also a foregone conclusion. But the type of work is almost certain to look a lot different in the decades to come.
[4]
OpenAI tool shows AI catching up to human work
Why it matters: We're at an AI reckoning, where leaders are trying to justify investments without effective tools to measure returns.

* A recent MIT study showing that most AI projects fail sparked a debate about its methodology, but also exposed the challenges in measuring returns on these massive investments.

Driving the news: On Thursday, OpenAI introduced GDPval-v0, a new way to measure how well AI models perform what it calls "authentic work deliverables," like creating legal briefs, engineering blueprints, and nursing care plans.

* The "GDP" in GDPval stands for Gross Domestic Product, which OpenAI says researchers used as the key economic indicator for the evaluations.
* The tasks the company tested came from occupations in the industries that contribute most to GDP.

What they did: Researchers looked at around 1,300 work tasks across 44 occupations, in nine business sectors that each make up more than 5% of U.S. GDP.

* Expert graders compared AI and human deliverables using detailed rubrics to decide which was better.
* "We finally have a way to measure how our models perform in the real world -- not just on academic tests -- which is a key way for us to measure progress towards our goal of AGI," OpenAI researcher Tejal Patwardhan told Axios.

Between the lines: OpenAI didn't just look at its own models.

* Researchers also looked at how Anthropic's Claude, Google's Gemini, and xAI's Grok compared to human workers.

What they found: Today's leading models are approaching parity with human professionals on many tasks, and the gains are accelerating.

* In blind tests of 220 tasks, Claude Opus 4.1 edged out others, with its outputs rated as good as -- or better than -- human experts' 47.6% of the time.
* OpenAI's GPT-5 came in a close second, excelling in domain-specific knowledge.
* The research found that frontier models can complete the GDPval-v0 tasks roughly a hundred times faster and cheaper than experts.
Yes, but: The speed and cost numbers are based on model inference time and API billing rates, and don't capture the cost of human insight required in a real-world setting, per the research.

What they're saying: Just because AI models can complete these tasks better, cheaper, and faster doesn't mean they're going to edge all humans out of the workforce anytime soon, OpenAI chief economist Ronnie Chatterji told Axios.

* "Your job is going to be different with a different set of tasks, maybe, than it was yesterday," Chatterji says. "It's gonna be hard to track the direct impact on the job market."
* "The data shows that AI models are increasingly capable of doing a lot of the work that humans do right now," he added. "So that's where I think the economic value is coming from -- as a complement to workers."

Stunning stat: Performance has more than doubled from GPT-4o (released spring 2024) to GPT-5 (released summer 2025).
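The "100× faster and cheaper" framing is simple arithmetic over inference time and API billing versus expert hours and wages. A back-of-the-envelope sketch of that comparison follows; every number here is a made-up assumption for illustration, not a figure from the research:

```python
# Hypothetical inputs: GDPval tasks average several hours of expert work,
# while a model answers in minutes for a few dollars of API spend.
expert_hours = 7.0      # assumed expert time per task, hours
expert_rate = 100.0     # assumed expert wage, $/hour
model_minutes = 4.0     # assumed model inference time, minutes
model_api_cost = 2.50   # assumed API billing per task, $

speedup = (expert_hours * 60) / model_minutes
cost_ratio = (expert_hours * expert_rate) / model_api_cost

print(f"{speedup:.0f}x faster, {cost_ratio:.0f}x cheaper")
# As the research cautions, ratios like these ignore the human review,
# iteration, and integration time needed to actually use the output.
```

The caveat in the text is the important part: the denominator counts only inference time and API dollars, so the ratio overstates the practical saving whenever a human must check or rework the deliverable.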
[5]
OpenAI Releases List of Work Tasks It Says ChatGPT Can Already Replace
"Today's best frontier models are already approaching the quality of work produced by industry experts." ChatGPT maker OpenAI has released a new evaluation, dubbed GDPval, to measure how well its AIs perform on "economically valuable, real-world tasks across 44 occupations." "People often speculate about AI's broader impact on society, but the clearest way to understand its potential is by looking at what models are already capable of doing," the company wrote in an accompanying blog post. "Evaluations like GDPval help ground conversations about future AI improvements in evidence rather than guesswork, and can help us track model improvement over time," OpenAI added. It's one of the most straightforward attempts to justify its AI models' financial viability to date, following skepticism that the tech may prove to be a dead end. Experts have often criticized the company's boastful marketing, such as CEO Sam Altman claiming that its GPT-5 model had achieved "PhD-level" intelligence. In "early results," GDPval found that "today's best frontier models are already approaching the quality of work produced by industry experts" -- a clear shot across the bow at critics who say the tech isn't up to the demands of the workplace. The 44 occupations where "AI could have the highest impact on real-world productivity" included a litany of professions including real estate sales agents, social workers, industrial engineers, software developers, lawyers, registered nurses, customer service representatives, pharmacists, private detectives, and financial advisors. The specific tasks, as laid out in a paper, range from creating a "competitor landscape for last mile delivery" for a financial analyst, assessing "skin lesion images" for a registered nurse, and designing a sales brochure for a real estate agent. 
Surprisingly, the company found that its competitor Anthropic's Claude Opus 4.1 was the "best performing model" after being graded by industry experts across 220 tasks, followed by GPT-5, which "excelled in particular on accuracy." An extra-powerful version of GPT-5, called GPT-5-high, was "rated as better than or on par with the deliverables from industry experts" just over 40 percent of the time. GPT-4o, which was released more than a year ago, scored a mere 13.7 percent.

To be clear, OpenAI is treading carefully around the subject of replacing human jobs altogether. Its language suggests that AI will "support people in the work they do every day" instead of saying outright that anyone could soon be out of work because of AI. That's unsurprising, considering the negative optics of celebrating the loss of employment. At the same time, whether that's really an honest reading of the industry's motives and end goals remains dubious. AI executives have long boasted about replacing human labor with AI -- drastic cost-cutting measures that are already starting to backfire for some companies.

There's also good reason to take OpenAI's latest evaluation results with a massive grain of salt. We've already seen the use of AI cause major headaches for software developers, lawyers, and even customer service representatives, often requiring more human oversight, not less. Hallucinations in particular remain a major sticking point, undercutting the usefulness of large language model-based tools and forcing users to spend more time combing through AI output for false information. And while AI often excels at generating bursts of text in a particular style, it can easily go off the rails during longer and less predictable tasks. Real-world tasks are rarely "clearly defined with a prompt and reference files," OpenAI admitted.
"Early GDPval results show that models can already take on some repetitive, well-specified tasks faster and at lower cost than experts," the company wrote. "However, most jobs are more than just a collection of tasks that can be written down."
[6]
AI Isn't Taking Your Job Yet -- But It Might Soon, OpenAI Data Suggests
The study showed the first wave of disruption will hit office-based jobs, from coders to lawyers and journalists.

OpenAI unveiled GDPval on Thursday -- a benchmark that tries to assess qualitatively whether AI can do your actual job. These are not hypothetical exam questions, but real deliverables: legal briefs, engineering blueprints, nursing care plans, financial reports -- the kind of work, that is, that pays mortgages.

The researchers deliberately focused on occupations where at least 60% of tasks are computer-based -- roles they describe as "predominantly digital." That scope covers professional services such as software developers, lawyers, accountants, and project managers; finance and insurance positions like analysts and customer service reps; and information-sector jobs ranging from journalists and editors to producers and AV technicians. Healthcare administration, white-collar manufacturing roles, and sales or real estate managers also feature prominently.

Within that set, the work most exposed to AI overlaps with the kinds of digital, knowledge-intensive activities that large language models already handle well:

* Software development, which represents the largest wage pool in the dataset, stands out as especially vulnerable.
* Legal and accounting work, with its heavy reliance on documents and structured reasoning, is also high on the list, as are financial analysts and customer service representatives.
* Content production roles -- editors, journalists, and other media workers -- face similar pressures given AI's growing fluency in language and multimedia generation.

The absence of manual and physical labor jobs in the study highlights its boundaries: GDPval was not designed to measure exposure in fields like construction, maintenance, or agriculture. Instead, it underscores the point that the first wave of disruption is likely to strike white-collar, office-based jobs -- the very kinds of work once assumed to be most insulated from automation.
The report builds on a two-year-old OpenAI/University of Pennsylvania study that claimed up to 80% of U.S. workers could see at least 10% of their tasks affected by LLMs, and around 19% of workers could see at least 50% of their tasks affected. The most imperiled (or transformed) jobs are white-collar, knowledge-heavy ones -- especially in law, writing, analysis, and customer interaction.

But the unsettling part isn't today's numbers; it's the trajectory. At this pace, the statistics suggest that AI could match human experts across the board by 2027. That would be close to AGI-level performance, and could mean that even tasks considered unsafe or too specialized for automation may soon become accessible to machines, threatening rapid workplace transformations.

OpenAI tested 1,320 tasks across 44 occupations -- not random jobs, but roles in the nine sectors that drive most of America's GDP. Software developers, lawyers, nurses, financial analysts, journalists, engineers: the people who thought their degrees would protect them from automation. Each task came from professionals with an average of 14 years of experience -- not interns or recent grads, but seasoned experts who know their craft. The tasks weren't simple either, averaging seven hours of work, with some stretching to multiple weeks of effort.

According to OpenAI, the models completed these tasks up to 100 times faster and significantly cheaper than humans in some API-specific tasks -- which is to be expected and has been the case for decades. On more specialized tasks, the improvement was slower, but still noticeable. Even accounting for review time and the occasional do-over when the AI hallucinated something bizarre, the economics tilt hard toward automation. But cheer up: just because a job is exposed doesn't mean it disappears. It may be augmented (for instance, lawyers and journalists using LLMs to write faster) rather than replaced.
And as far as AI has come, hallucinations are still a pain for businesses. The research shows AI failing most often on instruction-following: 35% of GPT-5's losses came from not fully grasping what was asked. Formatting errors plagued another 40% of failures. The models also struggled with collaboration, client interaction, and anything requiring genuine accountability, areas OpenAI left out of the study. Nobody's suing an AI for malpractice yet.

But for solo digital deliverables -- the reports, presentations, and analyses that fill most knowledge workers' days -- the gap is closing fast. OpenAI admits that GDPval today covers a very limited number of the tasks people do in their real jobs. The benchmark can't measure interpersonal skills, physical presence, or the thousand micro-decisions that make someone valuable beyond their deliverables.

Still, when investment banks start comparing AI-generated competitor analyses to those from human analysts, when hospitals evaluate AI nursing care plans against those from experienced nurses, and when law firms test AI briefs against associate work -- that's not speculation anymore. That's measurement.
[7]
OpenAI: GDPval framework tests AI on real-world jobs
OpenAI has announced a new evaluation framework, GDPval, to measure artificial intelligence performance on economically valuable tasks. The system tests models on 1,320 real-world job assignments to bridge the gap between academic benchmarks and practical application.

The GDPval framework evaluates how AI models address 1,320 distinct tasks associated with 44 different occupations. These jobs are primarily knowledge-work positions within industries that each contribute more than 5% to the gross domestic product (GDP) of the United States. To construct this list of relevant professions, OpenAI utilized data from the May 2024 U.S. Bureau of Labor Statistics (BLS) and the Department of Labor's O*NET database. The resulting selection of occupations includes professions frequently associated with AI integration, such as software engineers, lawyers, and video editors. The framework also extends to occupations less commonly discussed in the context of AI, including detectives, pharmacists, and social workers, providing a broader assessment of potential economic impact.

According to the company, the tasks within the evaluation were created by professionals who possess an average of 14 years of experience in their respective fields. This measure was intended to ensure the tasks accurately reflect "real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan." OpenAI specified that GDPval's scope across numerous tasks and occupations distinguishes it from other evaluations focused on economic value, which may concentrate on a single domain like software engineering.

The design of the evaluation forgoes simple text prompts. Instead, it provides the AI models with files to reference and requires the creation of multimodal deliverables, such as presentation slides and formatted documents. This approach is meant to simulate how a user would interact with the technology in a professional work environment.
OpenAI stated, "This realism makes GDPval a more realistic test of how models might support professionals."

In its study, OpenAI used the GDPval framework to grade outputs from several of its own models, including GPT-4o, o4-mini, o3, and the more recent GPT-5. The evaluation also included models from other companies: Anthropic's Claude Opus 4.1, Google's Gemini 2.5 Pro, and xAI's Grok 4.

The core of the grading process involved experienced professionals who performed blind evaluations of the models' outputs. These human graders compared the AI-generated work against outputs produced by human experts without knowing the origin of either, providing a direct quality benchmark.

To supplement this human-led process, OpenAI developed an "autograder" AI system designed to predict how a human evaluator would score a given deliverable. The company announced its intention to release the autograder as an experimental research tool for others to use. OpenAI cautioned, however, that the autograder is not as reliable as human graders, and affirmed that the tool is not intended to replace human evaluation in the near future, reflecting the nuanced judgment required for assessing high-quality professional work.

The initial findings from the GDPval tests indicate that current advanced AI is nearing the quality standards of human professionals. "We found that today's best frontier models are already approaching the quality of work produced by industry experts," OpenAI wrote. Among the models tested, Anthropic's Claude Opus 4.1 was identified as the best overall performer. Its particular strengths were in tasks related to aesthetics, such as professional document formatting and the clear, effective layout of presentation slides -- qualities that are often critical for client-facing materials and effective communication in a business context.
While Claude Opus 4.1 excelled in presentation, OpenAI's GPT-5 model demonstrated superior performance in accuracy. This was especially evident in tasks that required finding and correctly applying domain-specific knowledge.

The research also highlighted the rapid pace of model improvement. The results showed that performance on GDPval tasks "more than doubled from GPT-4o (released spring 2024) to GPT-5 (released summer 2025)." This substantial increase in capability over a relatively short period indicates a significant acceleration in the development of the underlying AI technologies.

The evaluation also included an analysis of efficiency. "We found that frontier models can complete GDPval tasks roughly 100× faster and 100× cheaper than industry experts," OpenAI reported. The company immediately qualified this finding with a critical caveat: "However, these figures reflect pure model inference time and API billing rates, and therefore do not capture the human oversight, iteration, and integration steps required in real workplace settings to use our models." In other words, the calculation excludes the considerable time and cost of managing, refining, and implementing AI-generated work in a practical business workflow.

OpenAI acknowledged significant limitations in the current version of the GDPval framework, describing it as "an early step that doesn't reflect the full nuance of many economic tasks." A major constraint is its use of one-off evaluations: the framework cannot measure a model's ability to handle iterative work, such as completing multiple drafts of a project, or its capacity to absorb context for an ongoing task over time. For instance, the current test cannot assess whether a model could successfully edit a legal brief based on client feedback or redo a data analysis to account for a newly discovered anomaly.
A further limitation noted by the company is that professional work is not always a straightforward process with organized files and a clear directive. The current framework cannot capture the more complex and less structured aspects of many jobs. This includes the "human -- and deeply contextual -- work of exploring a problem through conversation and dealing with ambiguity or shifting circumstances." These elements are often central to professional roles but are difficult to replicate in a standardized testing environment. "Most jobs are more than just a collection of tasks that can be written down," OpenAI added.

The company stated its intention to address these limitations in future iterations of the framework. Plans include expanding its scope to span more industries and incorporate harder-to-automate tasks. Specifically, OpenAI will attempt to develop evaluations for tasks that involve interactive workflows, where a model must engage in a back-and-forth process, or that require understanding extensive prior context, which remains a challenge for many AI systems. As part of this expansion, OpenAI will release a subset of the GDPval tasks for researchers to use in their own work.

From these results, OpenAI's stated conclusion is that AI will inevitably continue to disrupt the job market. The company posits that AI can take on routine "busywork," thereby freeing human workers to concentrate on more complex and strategic tasks. This perspective frames AI as a tool for augmenting human productivity rather than purely for replacement. "Especially on the subset of tasks where models are particularly strong, we expect that giving a task to a model before trying it with a human would save time and money," OpenAI wrote. Concurrent with these findings, the company reiterated its stated commitment to its broader mission.
This includes plans to democratize access to AI tools, an effort to keep "supporting workers through change, and building systems that reward broad contribution." "Our goal is to keep everyone on the 'elevator' of AI," the company concluded.
[8]
OpenAI Tests if ChatGPT 5 Can Automate Your Job With Unexpected Findings
What if the future of your job wasn't about being replaced by AI, but about working alongside it? The rapid advancements of tools like GPT-5 have sparked both excitement and anxiety, with many wondering whether machines will soon outperform humans in the workplace. OpenAI's latest research, however, reveals a more complex reality. While GPT-5 showcases impressive abilities, like generating polished reports or automating spreadsheets, it also hits significant roadblocks when faced with tasks requiring creativity, nuanced judgment, or real-world adaptability. These findings challenge the narrative of inevitable job automation and instead highlight a more collaborative future where humans and AI complement each other's strengths.

In this report, AI Explained unpacks four unexpected insights from OpenAI's exploration of GPT-5's capabilities and limitations. From its surprising struggles with contextual understanding to its potential as a productivity multiplier, these discoveries shed light on how AI might reshape, not replace, the workforce. You'll also learn why full job automation remains a distant goal and how industries are finding innovative ways to integrate AI while preserving the human touch. Whether you're optimistic or skeptical about AI's role in your profession, this deep dive offers a balanced perspective on what lies ahead. Could the key to thriving in an AI-driven world be collaboration rather than competition?

By examining GPT-5's performance, OpenAI provides a clearer understanding of how AI might augment human productivity rather than entirely replace it. This balanced approach offers valuable insights for industries navigating the integration of AI into their operations. OpenAI conducted rigorous evaluations to measure GPT-5's performance against human experts across a wide range of tasks.
While GPT-5 demonstrated remarkable capabilities in narrowly defined areas, it faced stiff competition from other AI models, such as Anthropic's Claude Opus 4.1, which outperformed GPT-5 in certain scenarios. This highlights the competitive and rapidly evolving nature of AI development. Despite advancements, GPT-5 struggles with tasks requiring nuanced judgment, creativity, or adaptability, areas where human expertise remains essential. For instance, human evaluators assessed the quality of AI outputs, but their agreement on task performance reached only 70%. This variability reflects the subjective nature of evaluating AI capabilities and reinforces the importance of human oversight, particularly in high-stakes applications like healthcare or legal decision-making. AI models like GPT-5 excel in tasks involving structured data and well-defined parameters, and they are particularly effective at generating digital outputs such as polished reports and automated spreadsheets. These strengths make AI a valuable tool for automating repetitive, time-consuming tasks, allowing professionals to focus on more strategic responsibilities. However, the study also revealed critical weaknesses. AI systems struggle with roles requiring real-time interactivity, deep contextual understanding, or the use of proprietary tools. For example, customer service positions that demand dynamic engagement or technical tasks involving specialized software remain challenging for GPT-5 and similar models. Moreover, despite rigorous testing protocols designed by industry professionals, AI occasionally produced significant errors. These errors were particularly concerning in high-stakes fields like finance and healthcare, where mistakes can lead to severe consequences. The findings emphasize the necessity of robust error mitigation strategies and human oversight to ensure reliability and safety in AI applications.
While AI shows promise in enhancing human productivity, it is far from automating entire professions. Many jobs involve non-digital tasks, such as physical labor, interpersonal interactions, or creative problem-solving, which AI cannot replicate. Additionally, the adoption of AI tools remains uneven. Many organizations discontinue pilot projects due to implementation challenges, high costs, or limited returns on investment. The study also highlighted performance disparities across industries and demographics. For instance, language models perform best in English-speaking contexts but struggle with underrepresented languages or diverse cultural nuances. This limitation restricts the global applicability of AI solutions and underscores the need for further development in linguistic and cultural adaptability. Another key factor is the variability in AI's performance across different sectors. While some industries, such as data analysis or content generation, have seen measurable benefits from AI integration, others face significant barriers to adoption. These include technical limitations, workforce resistance, and the complexity of integrating AI into existing workflows. Contrary to widespread fears of mass job displacement, OpenAI's research suggests that AI has not yet led to significant automation in most industries. In fact, in fields like radiology, where AI capabilities are well-documented, human roles and salaries have increased. This indicates that AI is more likely to serve as a productivity enhancer rather than a job replacer, at least in the near term. AI's role as a productivity multiplier is particularly evident in sectors that contribute significantly to economic growth. By automating repetitive tasks and improving operational efficiency, AI enables professionals to focus on higher-value activities, such as strategic planning or innovation. 
However, realizing this potential depends on addressing current limitations, such as error rates and contextual understanding, and ensuring seamless integration into existing workflows. For businesses, the key lies in using AI to complement human expertise. This approach not only minimizes the risks associated with automation but also unlocks new opportunities for growth and innovation. As industries adapt to the evolving capabilities of AI, the focus will likely shift toward collaboration between humans and machines rather than outright replacement. The path to broader job automation is filled with technical and practical challenges. To reliably handle more complex tasks, AI must improve in several critical areas, including error rates, contextual understanding, and real-world adaptability. Additionally, addressing linguistic and demographic performance gaps will be essential for expanding AI's global impact. Language models must become more inclusive and adaptable to diverse cultural and linguistic contexts to ensure equitable benefits across different regions and populations. For professionals, the ability to collaborate effectively with AI tools is becoming an increasingly valuable skill. By understanding how to integrate AI into workflows, individuals and organizations can harness its potential to drive innovation and efficiency. This collaborative approach not only enhances productivity but also mitigates the risks associated with over-reliance on automated systems. As AI continues to evolve, its role in the workforce will likely expand, but its limitations will remain a critical consideration. By focusing on AI as a tool to augment human capabilities rather than replace them, industries can strike a balance between innovation and sustainability, ensuring that technological advancements benefit both businesses and workers alike.
[9]
OpenAI's GPT-5 matches human performance in jobs: What it means for work and AI
On September 25, 2025, OpenAI dropped a bombshell: its latest model, GPT-5, now "stacks up to humans in a wide range of jobs." The declaration ripples far beyond the world of AI benchmarks; it raises urgent questions about the future of work, the boundary between human and machine, and how societies will adapt when tools become peers. What does it really mean, though, and more importantly, what comes next? To make its case, OpenAI introduced GDPval, a new benchmark built to test AI vs. humans in economically meaningful roles. The benchmark draws on nine industries crucial to the U.S. GDP (healthcare, finance, manufacturing, government, etc.) and drills down to 44 occupations, from nurses to software engineers to journalists. In the first version (GDPval-v0), human professionals compare human- and AI-generated reports and judge which is better. GPT-5, in a "high compute" configuration (GPT-5-high), achieved a "win or tie" rate of 40.6% versus expert-level human output. That is a staggering leap: GPT-4o (OpenAI's earlier multimodal model) scored 13.7% in the same setup. OpenAI also tested Claude Opus 4.1 (from Anthropic), which scored 49% in the same evaluation, though OpenAI cautions that part of that could be due to stylistic "presentation" (e.g., "pleasing graphics") rather than pure substance. OpenAI frames GDPval not as a final arbiter but as a stepping stone - a way to push the conversation beyond narrow academic benchmarks and into real-world tasks. What sets this announcement apart is not just the raw numbers, but the framing: this is a claim of task parity, not on chess puzzles or math exams, but on work that people actually do in their jobs. Still, OpenAI is careful to spell out the limitations. GDPval-v0 tests are limited in scope: they are static, noninteractive, and focus on output artifacts (reports, analyses) rather than on the full complexity of many jobs.
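The headline numbers above come from a simple aggregation: for each occupation, expert graders blindly compare an AI deliverable with a human one, and the model's score is the fraction of comparisons it wins or ties, averaged across occupations. A minimal illustrative sketch of that aggregation follows; the data, function names, and unweighted averaging are assumptions for illustration, not OpenAI's actual code or methodology.

```python
# Illustrative sketch of a GDPval-style "win or tie" score.
# All data and names here are hypothetical.

def win_or_tie_rate(judgments):
    """judgments: list of 'win', 'tie', or 'loss' verdicts for one occupation."""
    favorable = sum(1 for j in judgments if j in ("win", "tie"))
    return favorable / len(judgments)

def gdpval_style_score(per_occupation):
    """Average per-occupation rates (unweighted, as a simplification)."""
    rates = [win_or_tie_rate(j) for j in per_occupation.values()]
    return sum(rates) / len(rates)

# Toy example with three of the 44 occupations:
sample = {
    "financial analyst": ["win", "loss", "tie", "loss"],
    "nurse": ["loss", "loss", "win", "loss"],
    "journalist": ["tie", "win", "loss", "loss"],
}
print(round(gdpval_style_score(sample), 3))  # prints 0.417
```

A score of 0.406 under this kind of scheme means the model's output was judged at least as good as the expert's in roughly four of every ten comparisons, which is why the articles describe it as "approaching" rather than matching expert quality.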
Many real roles involve collaboration, stakeholder negotiation, on-the-fly adaptation, creativity, ethics, domain nuance, and interpersonal context - aspects that are hard to reduce to benchmark prompts. Thus, OpenAI acknowledges that it's not yet fielding GPT-5 to replace whole roles. Rather, the goal is augmentation: let humans offload lower-level cognitive work so they can spend more time on judgment, oversight, vision, and context. "People in those jobs can now use the model to offload some of their work and do potentially higher value things," OpenAI's chief economist, Aaron Chatterji, summarized. Still, the gap between "assistive AI" and "competent peer AI" is narrowing. The trajectory is disquieting for many. The question is: will workplaces and societies adapt fast enough? For many professionals, this moment will feel existential. If machines begin writing reports, diagnosing medical cases, or performing legal drafting at near-human quality, the sense that "my job is safe" becomes shaky. But the impact is uneven. Roles with more structured tasks (analysis, drafting, pattern recognition) are more exposed; roles grounded in human empathy, trust, high-stakes judgment, or the messy real world may resist substitution, at least for some time. Still, even if your job stays intact, your tools may change. Expect increasing automation of daily workflows, with AI copilots becoming standard. Your value may shift toward meta-skills: oversight of AI, domain interpretation, accountability, and human relationship skills. Firms will rush to adopt any productivity multiplier. For industries with tight margins (consulting, finance, legal, media), the pressure to incorporate GPT-5-level automation will be immense. This may accelerate restructuring: flatter teams, fewer middle layers, more emphasis on hybrid human-AI squads. Some firms might experiment with replacing junior human roles first. Others might lean into differentiation - human judgment, ethics, brand - as their edge.
But adoption won't be frictionless. Integration, reliability, auditing, legal liability, ethical guardrails: all these will become battlegrounds. Will clients accept AI-drafted legal memos? Will regulators allow AI medical assistants to operate with minimal oversight? Those answers will vary by jurisdiction. If AI is now approaching human-level competence in real work tasks, education systems must rethink what they teach. Memorization and literate report writing become less valuable than critical thinking, interpretive insight, collaboration, and ethical reasoning. Policymakers will also face hard questions: social safety nets, labor transitions, regulation of AI in high-stakes domains (medicine, law, defense), certification, liability, and IP. How do you audit or validate AI work when it competes with human professionals? Further, inequality may widen. Organizations with early access to strong AI systems and the capital to deploy them could dramatically outpace smaller players and local firms, both within and across countries. OpenAI frames this as progress toward its long-term ambition: building Artificial General Intelligence (AGI). GDPval is one metric - but not the ultimate one. If GPT-5 is nearing human-level performance in many domain tasks, then the next frontier is robustness, generality, safety, interactive workflows, long-term planning, object permanence, world models, adaptability under uncertainty - in short, the qualities that humans bring to open-ended problems. Thus, GPT-5's performance is both a milestone and a challenge: can AI systems maintain trust, explainability, correctness, and alignment as they push closer to human-level agency? To understand the real emotional texture of this shift, consider a mid-career financial analyst. In 2027, she's asked to pilot a workflow where GPT-5 drafts her weekly competitor analyses; she reviews, edits, and presents.
Some weeks, what the AI generates is better than she might have drafted; other weeks, it's off in subtle ways. Her role evolves: she's no longer just doing the grunt report work - she has to coach, correct, interpret, and contextualize. Her value shifts upward, but also precariously - any slip, or any superior AI, and she may become expendable. Or take a junior lawyer in a city firm. For routine contractual clauses and first-draft memos, his firm lets GPT-5 produce baseline versions. His "added value" becomes spotting edge cases, tailoring empathy in client communication, and managing relationships. He gains speed, but also competes with a tool that could someday absorb his tasks entirely. These are not speculative-future vignettes; they are iterations of the present. AI's gains in benchmarks today portend shifts in incentives, culture, and risk tomorrow. Benchmarks are controlled settings. In the wild, AI still struggles with domain drift, nuance, ambiguity, hallucination, and adversarial prompts. For human-level tasks, users will demand explanations, provenance, and accountability. Can systems provide that in a way humans accept? As AI handles more important tasks, the cost of misalignment or error increases. Guardrails, oversight, and fail-safes become essential. Who is liable for error? Should outputs carry legal disclaimers? Can AI-generated work be copyrighted or patented? These domains remain murky. Who gets access to the strongest models? If large companies deploy GPT-5 broadly, small firms or under-resourced geographies may lag behind. Rather than herald doom or promise messianic AI, a more balanced take is this: we're stepping into an era of hybrid intelligence, where humans and AI gradually integrate work. Machines will handle patterns, scale, speed; humans will bring empathy, meaning, oversight, values, and interpretation. GPT-5's claimed parity is not a final verdict - it's a loud signal.
The coming years will test whether human societies adapt fast enough, and whether AI serves as enhancement, not erasure.
OpenAI introduces GDPval, a new benchmark to evaluate AI performance across 44 occupations. Results show top AI models, including GPT-5 and Claude Opus 4.1, are nearing human expert-level quality in many tasks.
OpenAI has unveiled a new benchmark called GDPval, designed to evaluate the performance of AI models on 'economically valuable, real-world tasks' across 44 different occupations [1][2]. This benchmark aims to ground conversations about AI's impact on the workforce in evidence rather than speculation, and to track model improvements over time [4].
Source: Digit
GDPval focuses on nine industries that contribute significantly to the U.S. Gross Domestic Product (GDP) [1][4]. The benchmark includes around 1,300 specialized tasks crafted by experienced professionals with an average of 14 years of experience [3]. These tasks span various deliverables such as legal briefs, engineering blueprints, customer support conversations, and nursing care plans [2][3].
Source: Decrypt
OpenAI's tests revealed that leading AI models are approaching parity with human professionals on many tasks [4].
Source: Axios
The AI models showed varying levels of proficiency across different jobs [2]. While the results are promising, OpenAI emphasizes that AI is not poised to replace humans entirely [2][5]. Instead, the company suggests that AI could complement human workers, allowing them to focus on more creative and judgment-intensive aspects of their jobs [1][4].
OpenAI acknowledges that GDPval is an early step and doesn't capture the full complexity of many economic tasks [3]. Future versions of the benchmark are expected to include more interactive workflows and context-rich tasks to better reflect real-world knowledge work [3].

The introduction of GDPval comes at a time when the AI industry is facing scrutiny over the practical value of AI investments. A recent MIT study found that fewer than one in ten AI pilot projects delivered measurable revenue gains [2]. Critics have also raised concerns about 'workslop' – AI-generated content that appears good but lacks substance [2].

As AI continues to evolve, its impact on the job market remains a topic of intense debate. While OpenAI's research suggests significant progress in AI capabilities, the full implications for various industries and occupations are yet to be fully understood.