AI's Rapid Progress: Closing the Gap on Complex Human Tasks

AI's Rapid Progress in Tackling Complex Tasks

A groundbreaking study by the Model Evaluation & Threat Research (METR) group has revealed that artificial intelligence (AI) is making significant strides in handling complex, time-consuming tasks traditionally performed by human experts. The research introduces a new metric called the "task-completion time horizon": the length of a task, measured by how long it takes human experts, that an AI model can complete with a 50% success rate [1].
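To make the metric concrete, here is a minimal sketch of how a 50% time horizon could be read off a set of results. It is a simplification for illustration, not METR's actual estimation procedure. The function name `fifty_percent_horizon` and the numbers in `example` are hypothetical and invented for this sketch, and the interpolation on a log scale of task length is an assumption.

```python
import math

def fifty_percent_horizon(buckets):
    """Estimate the task length (in human minutes) at which success falls to 50%.

    `buckets` is a list of (human_minutes, success_rate) pairs sorted by
    increasing task length. The crossing point is found by linear
    interpolation in log2(minutes). Returns None if success never drops
    from at or above 0.5 to below 0.5.
    """
    for (t0, s0), (t1, s1) in zip(buckets, buckets[1:]):
        if s0 >= 0.5 > s1:
            # Fraction of the way from bucket 0 to bucket 1 where success hits 0.5.
            frac = (s0 - 0.5) / (s0 - s1)
            log_t = math.log2(t0) + frac * (math.log2(t1) - math.log2(t0))
            return 2 ** log_t
    return None

# Made-up numbers for illustration only; these are NOT results from the study.
example = [(1, 1.0), (4, 0.75), (16, 0.25), (64, 0.0)]
print(fifty_percent_horizon(example))  # prints 8.0
```

In this toy data the model succeeds on 75% of 4-minute tasks and 25% of 16-minute tasks, so the 50% crossing lands halfway between them on a log scale, at 8 minutes.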

Exponential Growth in AI Capabilities

The study found that the time horizon of leading AI models has doubled approximately every seven months since 2019. Growth accelerated in 2024, with the latest models doubling their horizon roughly every three months. At this rate, AI models could potentially handle tasks that take humans about a month to complete, with 50% reliability, by 2029 [1].
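The extrapolation is simple doubling arithmetic, sketched below as a rough check. The 59-minute starting point and the February 2025 date come from the Claude 3.7 Sonnet figure reported in this article. The helper name `months_until_horizon` and the conversion of "about a month" into 167 working hours are assumptions made for this sketch, not values taken from the article.

```python
import math

def months_until_horizon(current_minutes, target_minutes, doubling_months):
    """Months needed for the time horizon to grow from current to target,
    assuming it doubles once every `doubling_months` months."""
    doublings = math.log2(target_minutes / current_minutes)
    return doublings * doubling_months

CURRENT_MINUTES = 59            # Claude 3.7 Sonnet, February 2025 (from the article)
WORK_MONTH_MINUTES = 167 * 60   # ASSUMPTION: one work-month is about 167 working hours

for doubling in (7, 3):
    months = months_until_horizon(CURRENT_MINUTES, WORK_MONTH_MINUTES, doubling)
    print(f"doubling every {doubling} months: about {months:.0f} months after February 2025")
```

Under these assumptions, seven-month doubling takes roughly 52 months, which lands in mid-2029 and is consistent with the 2029 figure above. Three-month doubling would take only about 22 months, so the projected date is very sensitive to which rate is assumed.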

Benchmarking AI Against Human Performance

METR created nearly 170 real-world tasks across domains including coding, cybersecurity, general reasoning, and machine learning. To establish a human baseline, the team measured how long expert programmers took to complete each task, then assessed AI models against that baseline [1].

Key Findings and Comparisons

GPT-2, an early large language model from 2019, failed on all tasks that took human experts more than one minute.
Claude 3.7 Sonnet, released in February 2025, completed 50% of the tasks that would take people 59 minutes [1].
Today's frontier models, like Claude 3.7 Sonnet, already match human performance on 50-minute-long tasks [2].

Driving Factors Behind AI Progress

The paper attributes the progress in AI's time horizon metric to improvements in several key areas:

Logical reasoning
Tool use
Error correction
Self-awareness in task execution [1]

Modern AI models are learning to persist and correct errors, which are critical traits for automation at scale [2].

Implications and Concerns

While the study confirms rapid AI progress, it also raises concerns about potential misuse. As AI systems become capable of extended autonomous operation, new safety measures will be needed to prevent risks such as self-replicating AI or autonomous development of hazardous materials [2].

The implications of this progress stretch beyond software development. Fields like legal research, cybersecurity, and scientific discovery could see AI playing a much larger role in the near future [2].

Limitations and Challenges

Despite the impressive progress, AI still faces challenges in certain areas:

Performance drops on "messier" real-world tasks requiring creativity, strategic thinking, or human collaboration.
AI excels at structured problems with clear objectives but struggles in unpredictable environments [2].

Some experts, like Joshua Gans from the University of Toronto, caution against over-reliance on these predictions, noting that there is still much uncertainty about how AI will actually be used in practice [1].

References

[1] AI could soon tackle projects that take humans weeks

[2] AI is learning to work like you and it's getting faster every day
