Devin, the 'First AI Software Engineer', Struggles with Basic Tasks, Raising Questions About AI's Readiness to Replace Human Coders

Cognition AI's Devin, touted as the world's first AI software engineer, successfully completed only 15% of its assigned tasks in a recent evaluation by Answer.AI. The finding challenges claims about AI's readiness to replace human software engineers.

Devin's Disappointing Performance

Cognition AI's Devin, marketed as the "first AI software engineer," has fallen well short of expectations. A team of machine learning data scientists from Answer.AI conducted a month-long analysis of Devin and found a success rate of just 15% [1]. Out of 20 assigned tasks, Devin completed only three successfully, with 14 failures and three inconclusive results.
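The reported percentages follow directly from the task counts. As a minimal sketch, using only the figures quoted above (the variable names are illustrative and not taken from the Answer.AI report), the breakdown can be checked as follows:

```python
# Task counts as quoted in the article: 20 tasks in total.
outcomes = {"success": 3, "failure": 14, "inconclusive": 3}

total = sum(outcomes.values())

# Share of each outcome as a fraction of all assigned tasks.
rates = {name: count / total for name, count in outcomes.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
# success: 15%
# failure: 70%
# inconclusive: 15%
```

The 85% figure in the summary therefore combines outright failures (70%) with inconclusive results (15%); only the failures are confirmed as such.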

Challenges and Limitations

The researchers highlighted several key issues with Devin's performance:

  1. Unpredictability: The team found it difficult to predict which tasks Devin would complete successfully, with even similar tasks often ending in failure [2].

  2. Time inefficiency: Tasks that seemed straightforward often took days rather than hours, with Devin getting stuck in technical dead-ends or producing overly complex, unusable solutions [1].

  3. Inability to recognize limitations: Devin would persistently pursue impossible solutions rather than recognizing fundamental blockers, spending excessive time on unachievable tasks [2].

Debunking Marketing Claims

Cognition AI's marketing claims about Devin's capabilities have been called into question. The company initially boasted that Devin could "build and deploy apps end to end" and "autonomously find and fix bugs in codebases" [3]. However, these claims have been challenged by multiple sources:

  1. Software engineer Carl Brown analyzed Cognition's promotional video and accused the company of "lying" about its claims [1].

  2. The Answer.AI team found that Devin often took far longer than any human coder to complete tasks [1].

  3. Another YouTube code pundit pointed out critical security issues in Devin's output [2].

Implications for AI in Software Engineering

Devin's poor performance raises questions about the readiness of AI to replace human software engineers. It comes at a time when tech industry leaders like Mark Zuckerberg have announced intentions to replace "midlevel engineers" with AI [1]. The gap between AI companies' claims and reality continues to be a significant issue in the industry.

Despite its shortcomings, researchers noted that Devin provided a polished user experience that was impressive when it worked. However, the infrequency of successful outcomes remains a major concern [2].

As the AI industry continues to evolve, the case of Devin serves as a reminder of the challenges that lie ahead in developing truly autonomous AI systems capable of replacing human software engineers. It also highlights the importance of critical evaluation and transparency in AI development and marketing claims.

TheOutpost.ai

© 2025 Triveous Technologies Private Limited