OpenAI's GDPval Benchmark: AI Models Approaching Human-Level Performance in Various Occupations

OpenAI Introduces GDPval Benchmark

OpenAI has unveiled a new benchmark called GDPval, designed to evaluate the performance of AI models on 'economically valuable, real-world tasks' across 44 different occupations [1]. The benchmark aims to ground conversations about AI's impact on the workforce in evidence rather than speculation, and to track model improvements over time [4].

Benchmark Methodology

GDPval focuses on nine industries that contribute significantly to U.S. Gross Domestic Product (GDP) [1]. The benchmark includes around 1,300 specialized tasks crafted by professionals with an average of 14 years of experience [3]. These tasks span deliverables such as legal briefs, engineering blueprints, customer support conversations, and nursing care plans [2].

Key Findings

OpenAI's tests revealed that leading AI models are approaching parity with human professionals on many tasks [4]. Notably:

Anthropic's Claude Opus 4.1 performed best, with its outputs rated as good as or better than those of human experts 47.6% of the time [4].
OpenAI's GPT-5 came in second, excelling in domain-specific knowledge [4].
GPT-5-high was rated as better than or on par with industry experts in 40.6% of tasks [1].

Performance Across Occupations

The AI models showed varying levels of proficiency across occupations:

Strong performance: counter and rental clerks, shipping and inventory clerks, sales managers, and software developers [2].
Weaker performance: industrial engineers, medical engineers, pharmacists, financial managers, and video editors [2].

Implications for the Workforce

While the results are promising, OpenAI emphasizes that AI is not poised to replace humans entirely [2]. Instead, the company suggests that AI could complement human workers, allowing them to focus on the more creative and judgment-intensive aspects of their jobs [1].

Limitations and Future Developments

OpenAI acknowledges that GDPval is an early step and does not capture the full complexity of many economic tasks [3]. Future versions of the benchmark are expected to include more interactive workflows and context-rich tasks to better reflect real-world knowledge work [3].

Industry Response and Concerns

GDPval arrives as the AI industry faces scrutiny over the practical value of AI investments. A recent MIT study found that fewer than one in ten AI pilot projects delivered measurable revenue gains [2]. Critics have also raised concerns about 'workslop': AI-generated content that looks polished but lacks substance [2].

As AI continues to evolve, its impact on the job market remains a topic of intense debate. While OpenAI's research suggests significant progress in AI capabilities, the implications for specific industries and occupations are not yet fully understood.

References

[1] OpenAI says GPT-5 stacks up to humans in a wide range of jobs (TechCrunch)

[2] OpenAI Says ChatGPT Can Already Do Some Work Tasks as Well as Humans

[3] OpenAI is now testing ChatGPT against humans in 44 different occupations, from lawyers and software developers to registered nurses: here's the full list of jobs affected

[4] OpenAI tool shows AI catching up to human work

[5] OpenAI Releases List of Work Tasks It Says ChatGPT Can Already Replace
