13 Sources
[1]
Study finds AI tools made open source software developers 19 percent slower
When it comes to concrete use cases for large language models, AI companies love to point out the ways coders and software developers can use these models to increase their productivity and overall efficiency in creating computer code. However, a new randomized controlled trial has found that experienced open source coders became less efficient at coding-related tasks when they used current AI tools.

For their study, researchers at METR (Model Evaluation and Threat Research) recruited 16 software developers, each with multiple years of experience working on specific open source repositories. The study followed these developers across 246 individual "tasks" involved with maintaining those repos, such as "bug fixes, features, and refactors that would normally be part of their regular work." For half of those tasks, the developers used AI tools like Cursor Pro or Anthropic's Claude; for the others, the programmers were instructed not to use AI assistance. Expected time forecasts for each task (made before the groupings were assigned) were used as a proxy to balance out the overall difficulty of the tasks in each experimental group, and the time needed to fix pull requests based on reviewer feedback was included in the overall assessment.

Before performing the study, the developers in question expected the AI tools would lead to a 24 percent reduction in the time needed for their assigned tasks. Even after completing those tasks, the developers believed that the AI tools had made them 20 percent faster, on average. In reality, though, the AI-aided tasks ended up being completed 19 percent slower than those completed without AI tools.

Trade-offs

By analyzing screen recording data from a subset of the studied developers, the METR researchers found that AI tools tended to reduce the average time those developers spent actively coding, testing/debugging, or "reading/searching for information." But those time savings were overwhelmed in the end by "time reviewing AI outputs, prompting AI systems, and waiting for AI generations," as well as "idle/overhead time" where the screen recordings show no activity.

Overall, the developers in the study accepted less than 44 percent of the code generated by AI without modification. A majority of the developers reported needing to make changes to the code generated by their AI companion, and a total of 9 percent of the total task time in the "AI-assisted" portion of the study was taken up by this kind of review.

On the surface, METR's results seem to contradict other benchmarks and experiments that demonstrate increases in coding efficiency when AI tools are used. But those often measure productivity in terms of total lines of code or the number of discrete tasks/code commits/pull requests completed, all of which can be poor proxies for actual coding efficiency. Many of the existing coding benchmarks also focus on synthetic, algorithmically scorable tasks created specifically for the benchmark test, making it hard to compare those results to ones focused on work with pre-existing, real-world code bases.

Along those lines, the developers in METR's study reported in surveys that the overall complexity of the repos they work with (which average 10 years of age and over 1 million lines of code) limited how helpful the AI could be. The AI wasn't able to utilize "important tacit knowledge or context" about the codebase, the researchers note, while the "high developer familiarity with [the] repositories" aided their very human coding efficiency in these tasks.
These factors lead the researchers to conclude that current AI coding tools may be particularly ill-suited to "settings with very high quality standards, or with many implicit requirements (e.g., relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn." While those factors may not apply in "many realistic, economically relevant settings" involving simpler code bases, they did limit the impact of AI tools in this study and could do the same in similar real-world situations.

And even for complex coding projects like the ones studied, the researchers remain optimistic that further refinement of AI tools could lead to future efficiency gains for programmers. Systems that have better reliability, lower latency, or more relevant outputs (via techniques such as prompt scaffolding or fine-tuning) "could speed up developers in our setting," the researchers write. Already, they say there is "preliminary evidence" that the recent release of Claude 3.7 "can often correctly implement the core functionality of issues on several repositories that are included in our study." For now, however, METR's study provides some strong evidence that AI's much-vaunted usefulness for coding tasks may have significant limitations in certain complex, real-world coding scenarios.
[2]
AI coding tools may not speed up every developer, study shows | TechCrunch
Software engineer workflows have been transformed in recent years by an influx of AI coding tools like Cursor and GitHub Copilot, which promise to enhance productivity by automatically writing lines of code, fixing bugs, and testing changes. The tools are powered by AI models from OpenAI, Google DeepMind, Anthropic, and xAI that have rapidly improved their performance on a range of software engineering tests.

However, a new study published Thursday by the non-profit AI research group METR calls into question the extent to which today's AI coding tools enhance productivity for experienced developers. METR conducted a randomized controlled trial for this study by recruiting 16 experienced open-source developers and having them complete 246 real tasks on large code repositories they regularly contribute to. The researchers randomly assigned roughly half of those tasks as "AI-allowed," giving developers permission to use state-of-the-art AI coding tools such as Cursor Pro, while the other half of tasks forbade the use of AI tools.

Before completing their assigned tasks, the developers forecasted that using AI coding tools would reduce their completion time by 24%. That wasn't the case. "Surprisingly, we find that allowing AI actually increases completion time by 19% -- developers are slower when using AI tooling," the researchers said.

Notably, only 56% of the developers in the study had experience using Cursor, the main AI tool offered in the study. While nearly all the developers (94%) had experience using some web-based LLMs in their coding workflows, some were using Cursor for the first time in this study. The researchers note that developers were trained on using Cursor in preparation for the study.

Nevertheless, METR's findings raise questions about the supposed universal productivity gains promised by AI coding tools in 2025. Based on the study, developers shouldn't assume that AI coding tools -- specifically what's come to be known as "vibe coders" -- will immediately speed up their workflows.

METR researchers point to a few potential reasons why AI slowed down developers rather than speeding them up. First, developers spend much more time prompting AI and waiting for it to respond when using vibe coders rather than actually coding. AI also tends to struggle in large, complex code bases, which this test used.

The study's authors are careful not to draw any strong conclusions from these findings, explicitly noting they don't believe AI systems currently fail to speed up many or most software developers. Other large-scale studies have shown that AI coding tools do speed up software engineer workflows. The authors also note that AI progress has been substantial in recent years, and that they wouldn't expect the same results even three months from now. METR has also found that AI coding tools have significantly improved their ability to complete complex, long-horizon tasks in recent years.

The research offers yet another reason to be skeptical of the promised gains of AI coding tools. Other studies have shown that today's AI coding tools can introduce mistakes, and in some cases, security vulnerabilities.
[3]
Need to Code Faster? AI Might Be Slowing You Down
Much has been made of how AI tools like Cursor Pro or Anthropic's Claude will revolutionize coders' day-to-day experience, potentially shaving tens of hours off their working weeks -- but research has now cast doubt on those assumptions. A new study assessing the performance of 16 experienced software developers working on 246 different tasks found that using AI tools actually increased completion time for the tasks by 19%. This flew in the face of the developers' own expectations: they had anticipated that AI assistance would make them 24% faster. The study, first spotted by The Register, came from METR, a nonprofit AI-focused research organization.

Reasons why AI slowed down developers included low reliability. The study found that the developers accepted fewer than 44% of the AI's code suggestions, and the majority reported making major changes to clean up the AI-generated code. In fact, the developers spent 9% of their time reviewing or cleaning the AI outputs. Though the AI-assisted developers did save time on active coding and research, many of these gains were nullified by time spent crafting prompts, waiting for the AI to come back with responses, or proofing its output for errors. In addition, the study found that participants consistently overestimated AI's ability to assist at the task at hand, even after hours of using the tool.

However, there are plenty of caveats to this research. The participants averaged five years of coding experience on their repositories, with only "moderate" AI experience, so results may have been different for less experienced coders. The participants were also set to work on very large and complex existing code repositories, which broadly "had very high quality bars for code contributions," according to the researchers. So results may be different for a completely green coder working on a simple app.

This isn't the first time that we've heard rumblings about AI causing as many problems as it solves in the workplace. Research from economists at the University of Chicago and the University of Copenhagen released earlier this year found that AI has yet to move the needle in terms of real-world productivity. Though it caused gains of roughly 3% in overall worker productivity, this was offset by the workload of new tasks created by AI, such as teachers checking homework for ChatGPT use.
[4]
AI coding tools make developers slower, study finds
Artificial intelligence coding tools are supposed to make software development faster, but researchers who tested these tools in a randomized, controlled trial found the opposite. Computer scientists with Model Evaluation & Threat Research (METR), a non-profit research group, have published a study showing that AI coding tools made software developers slower, despite expectations to the contrary. Not only did the use of AI tools hinder developers, but it led them to hallucinate, much like the AIs have a tendency to do themselves. The developers predicted a 24 percent speedup, but even after the study concluded, they believed AI had helped them complete tasks 20 percent faster, when it had actually delayed their work by about that percentage.

"After completing the study, developers estimate that allowing AI reduced completion time by 20 percent," the study says. "Surprisingly, we find that allowing AI actually increases completion time by 19 percent -- AI tooling slowed developers down."

The study involved 16 experienced developers who work on large, open source projects. The developers provided a list of real issues (e.g. bug fixes, new features, etc.) they needed to address - 246 in total - and then forecast how long they expected those tasks would take. The issues were randomly assigned to allow or disallow AI tool usage. The developers then proceeded to work on their issues, using their AI tool of choice (mainly Cursor Pro with Claude 3.5/3.7 Sonnet) when allowed to do so. The work occurred between February and June 2025.

The study says the slowdown can likely be attributed to five factors: over-optimism about AI's usefulness, the developers' high familiarity with their repositories, the size and complexity of those repositories, the low reliability of AI-generated code, and the AI's inability to draw on implicit repository context. Other considerations like AI generation latency and failure to provide models with optimal context (input) may have played some role in the results, but the researchers say they're uncertain how such things affected the study.

Other researchers have also found that AI does not always live up to the hype. A recent study from AI coding biz Qodo found some of the benefits of AI software assistance were undercut by the need to do additional work to check AI code suggestions. An economic survey found that generative AI has had no impact on jobs or wages, based on data from Denmark. An Intel study found that AI PCs make users less productive. And call center workers at a Chinese electrical utility say that while AI assistance can accelerate some tasks, it also slows things down by creating more work.

That aspect of AI tool use - the added work - is evident in one of the graphics included in the study. "When AI is allowed, developers spend less time actively coding and searching for/reading information, and instead spend time prompting AI, waiting on and reviewing AI outputs, and idle," the study explains.

More anecdotally, a lot of coders find that AI tools can help test new scenarios quickly in a low-stakes way and automate certain routine tasks, but don't save time overall because you still have to validate whether the code actually works - plus, they don't learn like an intern. In other words, AI tools may make programming incrementally more fun, but they don't make it more efficient.

The authors - Joel Becker, Nate Rush, Beth Barnes, and David Rein - caution that their work should be reviewed in a narrow context, as a snapshot in time based on specific experimental tools and conditions.
"The slowdown we observe does not imply that current AI tools do not often improve developer's productivity - we find evidence that the high developer familiarity with repositories and the size and maturity of the repositories both contribute to the observed slowdown, and these factors do not apply in many software development settings," they say. The authors go on to note that their findings don't imply current AI systems are not useful or that future AI models won't do better. ®
[5]
AI slows down some experienced software developers, study finds
SAN FRANCISCO, July 10 (Reuters) - Contrary to popular belief, using cutting-edge artificial intelligence tools slowed down experienced software developers when they were working in codebases familiar to them, rather than supercharging their work, a new study found.

AI research nonprofit METR conducted the in-depth study on a group of seasoned developers earlier this year while they used Cursor, a popular AI coding assistant, to help them complete tasks in open-source projects they were familiar with. Before the study, the open-source developers believed using AI would speed them up, estimating it would decrease task completion time by 24%. Even after completing the tasks with AI, the developers believed that they had decreased task times by 20%. But the study found that using AI did the opposite: it increased task completion time by 19%.

The study's lead authors, Joel Becker and Nate Rush, said they were shocked by the results: prior to the study, Rush had written down that he expected "a 2x speed up, somewhat obviously."

The findings challenge the belief that AI always makes expensive human engineers much more productive, a factor that has attracted substantial investment into companies selling AI products to aid software development. AI is also expected to replace entry-level coding positions. Dario Amodei, CEO of Anthropic, recently told Axios that AI could wipe out half of all entry-level white collar jobs in the next one to five years.

Prior literature on productivity improvements has found significant gains: one study found using AI sped up coders by 56%, another study found developers were able to complete 26% more tasks in a given time. But the new METR study shows that those gains don't apply to all software development scenarios. In particular, this study showed that experienced developers intimately familiar with the quirks and requirements of large, established open source codebases experienced a slowdown. Other studies often rely on software development benchmarks for AI, which sometimes misrepresent real-world tasks, the study's authors said.

The slowdown stemmed from developers needing to spend time going over and correcting what the AI models suggested. "When we watched the videos, we found that the AIs made some suggestions about their work, and the suggestions were often directionally correct, but not exactly what's needed," Becker said.

The authors cautioned that they do not expect the slowdown to apply in other scenarios, such as for junior engineers or engineers working in codebases they aren't familiar with. Still, the majority of the study's participants, as well as the study's authors, continue to use Cursor today. The authors believe it is because AI makes the development experience easier, and in turn, more pleasant, akin to editing an essay instead of staring at a blank page. "Developers have goals other than completing the task as soon as possible," Becker said. "So they're going with this less effortful route."

Reporting by Anna Tong in San Francisco; Editing by Sonali Paul
[6]
Study shows AI coding assistants actually slow down experienced developers
Cutting corners: In a surprising turn for the fast-evolving world of artificial intelligence, a new study has found that AI-powered coding assistants may actually hinder productivity among seasoned software developers, rather than accelerating it, which is the main reason devs use these tools.

The research, conducted by the non-profit Model Evaluation & Threat Research (METR), set out to measure the real-world impact of advanced AI tools on software development. Over several months in early 2025, METR observed 16 experienced open-source developers as they tackled 246 genuine programming tasks - ranging from bug fixes to new feature implementations - on large code repositories they knew intimately. Each task was randomly assigned to either permit or prohibit the use of AI coding tools, with most participants opting for Cursor Pro paired with Claude 3.5 or 3.7 Sonnet when allowed to use AI.

Before beginning, developers confidently predicted that AI would make them 24 percent faster. Even after the study concluded, they still believed their productivity had improved by 20 percent when using AI. The reality, however, was starkly different. The data showed that developers actually took 19 percent longer to finish tasks when using AI tools, a result that ran counter not only to their perceptions but also to the forecasts of experts in economics and machine learning.

The researchers dug into possible reasons for this unexpected slowdown, identifying several contributing factors. First, developers' optimism about the usefulness of AI tools often outpaced the technology's actual capabilities. Many participants were highly familiar with their codebases, leaving little room for AI to offer meaningful shortcuts. The complexity and size of the projects - often exceeding a million lines of code - also posed a challenge for AI, which tends to perform better on smaller, more contained problems. Furthermore, the reliability of AI suggestions was inconsistent; developers accepted less than 44 percent of the code it generated, spending significant time reviewing and correcting these outputs. Finally, AI tools struggled to grasp the implicit context within large repositories, leading to misunderstandings and irrelevant suggestions.

The study's methodology was rigorous. Each developer estimated how long a task would take with and without AI, then worked through the issues while recording their screens and self-reporting the time spent. Participants were compensated $150 per hour to ensure professional commitment to the process. The results remained consistent across various outcome measures and analyses, with no evidence that experimental artifacts or bias influenced the findings.

Researchers caution that these results should not be overgeneralized. The study focused on highly skilled developers working on familiar, complex codebases. AI tools may still offer greater benefits to less experienced programmers or those working on unfamiliar or smaller projects. The authors also acknowledge that AI technology is evolving rapidly, and future iterations could yield different outcomes.

Despite the slowdown, many participants and researchers continue to use AI coding tools. They note that, while AI may not always speed up the process, it can make certain aspects of development less mentally taxing, transforming coding into a task that is more iterative and less daunting.
[7]
Why "use AI" may not be the answer to boosting software productivity
Driving the news: The study by METR, a nonprofit independent research outfit, looked at experienced programmers working on large, established open-source projects.
* It found that these developers believed that using AI tools helped them perform 20% faster -- but they actually worked 19% slower.
* The study appears rigorous and well-designed, but it's small (only 16 programmers participated, completing 246 tasks).

Zoom out: For decades, industry visionaries have dreamed of a holy grail called "natural language programming" that would allow people to instruct computers using everyday speech, without needing to write code.
* As large language models' coding prowess became evident, it appeared this milestone had been achieved.
* "The hottest new programming language is English," declared AI guru (and OpenAI cofounder) Andrej Karpathy on X early in 2023, soon after ChatGPT's launch.
* In February, Karpathy also coined the term "vibe-coding" -- meaning the quick creation of rough-code prototypes for new projects by just telling your favorite AI to whip up something from scratch.
* The most fervent believers in software's AI-written future say that human beings will do less and less programming, and engineers will turn into some combination of project manager, specifications-refiner and quality-checker.
* Either that, or they'll be unemployed.

Zoom in: AI-driven coding tends to be more valuable in building new systems from the ground up than in extending or refining existing systems, particularly when they're big.
* While innovative new products get the biggest buzz and make the largest fortunes, the bulk of software work in most industries consists of more mundane maintenance labor.
* Anything that makes such work more efficient could save enormous amounts of time and money.

Yes, but: This is where the METR study found AI actually slowed experienced programmers down.
* One key factor was that human developers found AI-generated code unreliable and ended up devoting extra time to reviewing, testing and fixing it.
* "One developer notes that he 'wasted at least an hour first trying to [solve a specific issue] with AI' before eventually reverting all code changes and just implementing it without AI assistance," the study says.

Between the lines: The study authors note that AI coding tools are improving at a rapid enough rate that their findings could soon be obsolete.
* They also warn against generalizing too broadly from their findings and note the many counter-examples of organizations and projects that have made productivity gains with coding tools.

One notable caution that's inescapable from the study's findings: Don't trust self-reporting of productivity outcomes. We're not always the best judges of our own efficiency.
* Another is that it's relatively easy to measure productivity in terms of "task completion" but very hard to assess total added value in software-making.
* Thousands of completed tickets can be meaningless -- if, for instance, a program is about to be discontinued. Meanwhile, one big new insight can change everything in ways no productivity metric can capture.

The big picture: The software community is divided over whether to view the advent of AI coding with excitement or dread.
[8]
AI Promised Faster Coding. This Study Disagrees
In just the last couple of years, AI has totally transformed the world of software engineering. Writing your own code (from scratch, at least) has become quaint. Now, with tools like Cursor and Copilot, human developers can marshal AI to write code for them. The human role is now to understand what to ask the models for the best results, and to iron out the inevitable problems that crop up along the way. Conventional wisdom states that this has accelerated software engineering significantly. But has it? A new study by METR, published last week, set out to measure the degree to which AI speeds up the work of experienced software developers. The results were very unexpected.

What the study found -- METR measured the speed of 16 developers working on complex software projects, both with and without AI assistance. After finishing their tasks, the developers estimated that access to AI had accelerated their work by 20% on average. In fact, the measurements showed that AI had slowed them down by about 20%. The results were roundly met with surprise in the AI community. "I was pretty skeptical that this study was worth running, because I thought that obviously we would see significant speedup," wrote David Rein, a staffer at METR, in a post on X.

Why did this happen? -- The simple technical answer seems to be: while today's LLMs are good at coding, they're often not good enough to intuit exactly what a developer wants and answer perfectly in one shot. That means they can require a lot of back and forth, which might take longer than if you just wrote the code yourself. But participants in the study offered several more human hypotheses, too. "LLMs are a big dopamine shortcut button that may one-shot your problem," wrote Quentin Anthony, one of the 16 coders who participated in the experiment. "Do you keep pressing the button that has a 1% chance of fixing everything? It's a lot more enjoyable than the grueling alternative." (It's also easy to get sucked into scrolling social media while you wait for your LLM to generate an answer, he added.)
[9]
What Actually Happens When Programmers Use AI Is Hilarious, According to a New Study
AI has taken the programming world by storm, with a flurry of speculation about the tech replacing human coders, and Google's CEO recently claiming that 25 percent of the company's code is now AI-generated. But it's possible that in practice, AI is actually hindering efficient software development.

As flagged by Ars Technica, a new study from the nonprofit Model Evaluation and Threat Research (METR) found that in practice, programmers are actually slower when using AI assistance tools than making do without them. In the study, 16 programmers were given roughly 250 coding tasks and asked to either use no AI assistance, or employ what METR characterized as "early-2025 AI tools" like Anthropic's Claude and Cursor Pro. The results were surprising, and perhaps profound: the programmers actually spent 19 percent more time when using AI than when forgoing it.

When measuring the programmers' screen time, the METR team found that when using AI tools, their subjects did indeed spend less time actively coding, debugging, researching, or testing -- but that was because they instead spent their time "reviewing AI outputs, prompting AI systems, and waiting for AI generations." Ultimately, the AI-assisted cohort accepted less than 44 percent of the tips provided by the tools without any modification, and nine percent of the total time they spent on tasks was eaten up by fixing the AI's outputs. (That's not entirely surprising; companies that laid off people to replace them with AI are now having to hire new contractors to fix the technology's mistakes.)

Despite the results, however, the programmers in the study believed initially that AI would reduce nearly a quarter of the time spent on tasks -- and afterward, they still thought those tools sped them up by 20 percent.

Perhaps contributing to the disconnect between expectation and reality with AI coding are all those benchmarks claiming that these tools, and others like OpenAI's o3 reasoning model and Google's Gemini, are spitting out immaculate code at record speeds. As Ars notes, however, those benchmarks rely on "synthetic, algorithmically scorable tasks created specifically" for such tests, which may be a poor reflection of the messy world of actual coding.

This isn't the first time that the narrative of AI's dominance in coding has been rattled by research findings. Earlier this year, for instance, OpenAI researchers released a paper declaring, based on benchmarking tests from real-world coding tasks, that even the most advanced large language models "are still unable to solve the majority" of problems.

AI is also giving rise to other unintended consequences in the world of software development. Untrained programmers who engage in so-called "vibe coding," or writing and fixing code by describing what they want to an AI, are not only screwing up their work itself, but also self-sabotaging by introducing severe cybersecurity risks to the finished product as well. With so many tech workers being laid off in favor of automation, it stands to reason that code generated after such firings is less accurate and secure than it was when humans were writing it -- but thus far, that hasn't seemed to matter much to the people doing the job cuts.
[10]
Report - AI tools slow down experienced developers by 19%. A wake-up call for industry hype?
A new study from Model Evaluation & Threat Research (METR) has delivered findings that contradict much of what technology vendors have been telling their customers about AI's impact on developer productivity for the past few years. The research, which measured the real-world impact of early-2025 AI tools on experienced open-source developers, found that developers took 19% longer to complete tasks when using AI assistance.

The randomized controlled trial, conducted between February and June 2025, involved 16 experienced developers working on 246 real tasks across mature open-source repositories. These participants averaged five years' experience and 1,500 commits on their respective projects. The results challenge not only vendor marketing claims but also the expectations of the developers themselves (the hype has had an impact, it seems!), who forecast that AI would reduce their completion time by 24%. The researchers state in their report:

"We find that when developers use AI tools, they implement issues in 19% more time on average and nearly all quantiles of observed implementation time see AI-allowed issues taking longer. That is, developers are slower when using AI is allowed. Colloquially, we refer to this result that issues with AI-allowed take longer than issues with AI-disallowed as slowdown."

Perhaps more concerning for technology leaders is that developers continued to believe AI had sped them up even after completing the study. Post-experiment, participants estimated that AI had reduced their completion time by 20%, despite data showing the opposite.

The study randomly assigned developers to complete real issues either with or without AI assistance. When allowed, developers primarily used Cursor Pro with Claude 3.5/3.7 Sonnet - considered frontier models at the time. Screen recordings verified compliance, and developers tracked their implementation time for each task.

Expert forecasts missed the mark by an even wider margin. Machine learning experts predicted a 38% reduction in completion time, while economics experts forecast a 39% improvement. As the researchers note: "This observed slowdown serves as some evidence that AI capabilities in the wild may be lower than results on commonly used benchmarks may suggest."

So, why would real-world implementation of AI in a developer environment slow experienced coders down, when all the demos and claims we've been seeing in the market suggest the opposite? The study identified five key factors contributing to the productivity loss: developers' over-optimism about AI's usefulness, their high familiarity with the repositories, the size and complexity of those repositories, the low reliability of AI outputs, and the AI's inability to pick up implicit repository context.

While the study focused on experienced developers, it's also worth questioning AI's impact across various skill levels. This is something that I've raised previously, asking: if AI tools benefit/can execute the work of lower skilled or less experienced workers, where do employees gain experience? The researchers acknowledge that their results don't imply "that current AI tools do not often improve developer's productivity" -- particularly for less experienced developers or those working on unfamiliar codebases. The report notes:

"One important question that emerges given these impressive results is whether productivity gains are captured by individuals of all experience levels. The canonical framework of Agrawal et al. treats AI as a fall in the cost of prediction, with distributional consequences depending on which complementary sub-problems the tool does not solve. Existing empirical work on the micro-level effects of generative AI tools tends to find that access to these tools benefits less experienced workers more, compressing performance distributions."

This creates a potential challenge for the industry. If AI tools primarily benefit junior developers by helping with routine tasks, but experienced developers find them counterproductive, organizations face difficult questions about skill development. How will junior developers who rely heavily on AI assistance develop the deep contextual knowledge that currently makes senior developers effective?

The study's authors note that "our results are consistent with small greenfield projects or development in unfamiliar codebases seeing substantial speedup from AI assistance," suggesting AI's value varies significantly based on context. While the study's findings are notable, several caveats should be kept in mind when interpreting the results.

The study's findings suggest enterprise technology leaders should approach AI coding tools with measured expectations. While vendor marketing often promises dramatic productivity gains, the reality appears more nuanced. The research doesn't prove that AI tools are inherently bad for development productivity, but it does highlight that their benefits depend heavily on context.

However, notably, 69% of developers continued using Cursor after the experiment ended, suggesting that despite the productivity loss, developers found other value in the tools -- perhaps in reduced cognitive effort or enhanced creativity.

The study suggests the need for a more nuanced understanding of AI coding tools rather than wholesale adoption or rejection. The researchers note that "AI systems that have higher fundamental reliability, lower latency, and/or are better elicited...could speed up developers in our setting." They also highlight preliminary evidence that autonomous AI agents can "correctly implement the core functionality of issues," though they typically fail on documentation, styling rules, and testing requirements. This suggests continued improvement in AI capabilities may address current limitations.

For enterprise technology buyers, the key takeaway isn't that AI coding tools lack value, but that their impact is complex and context-dependent. As METR notes, "these results do not imply that future models will not speed up developers in this exact setting -- this is a salient possibility given the rapid pace of progress in AI capabilities recently."

However, METR's research serves as an important reality check for an industry that sometimes confuses potential with current capability. AI coding tools will undoubtedly improve, but until they can handle the complexity and context that experienced developers navigate daily, their impact will remain mixed. Organizations would do well to approach these tools with clear eyes and careful measurement, rather than blind faith in vendor promises.
[11]
Experienced Developers are Slower with AI Than Without It, Says METR | AIM
Developers may need to rethink their approach to using AI, and evaluate if it is helping them or not. A new study conducted by Model Evaluation & Threat Research (METR) reveals that experienced open-source developers were slower when using generative AI tools, which was contrary to their initial expectations. The study observed a 19% increase in task completion time when developers used AI systems, such as Claude 3.5/3.7 Sonnet and Cursor Pro.

The randomised controlled trial involved 16 veteran open-source contributors working on 246 real-world tasks across mature repositories they had contributed to for an average of five years. Developers forecast that AI assistance would make them 24% faster, and later estimated a perceived speed-up of 20%. However, actual results showed the opposite -- developers completed tasks slower with AI enabled. This suggests a large disconnect between perceived and actual AI productivity benefits, the researchers noted in the paper. Experts from economics and machine learning also misjudged AI's impact, forecasting a 39% and 38% speed-up, respectively.

The METR team analysed 143 hours of screen recordings to understand the slowdown, identifying factors such as the low reliability of AI outputs, difficulty in adapting to complex codebases, and the AI's failure to leverage repository context. Developers spent more time reviewing AI suggestions and waiting for generations rather than coding directly. While 93% had prior experience with LLMs, only 44% had used Cursor Pro before the study. Despite this, the slowdown was consistent even among Cursor veterans.

METR cautioned against overgeneralising the results, noting that AI may still be useful in other contexts, especially for novice developers or greenfield projects. But the findings challenge the assumption that state-of-the-art AI tools inherently boost productivity for experienced engineers. The report emphasises the importance of carefully evaluating AI tools in real-world settings and cautions against relying solely on benchmark performance or anecdotal success stories.
[12]
AI Might Be Slowing Down Some Employees' Work, a Study Says
This is a little technical, but it's a fascinating result that cuts through some of the hype surrounding this very buzzy technology. Essentially, while coding experts thought AI would help them in their work, the hard data shows it actually slowed down the coders who used it. As Gary Marcus put it, "if this is a general, replicable finding" that can be extended beyond the realm of coding into other work sectors where AI tools are becoming commonplace, "it's a serious blow to generative AI's flagship use case. People might be imagining productivity gains that they are not getting, and ignoring real-world costs, to boot."

The real-world costs Marcus mentions here are evident from METR's study: slowing down an expert developer by nearly 20 percent means their efficiency is taking a hit, which has direct business costs.

Naturally, there are some details of the study that need to be considered. For example, this was a short window of time (early 2025) that encompassed a suite of AI tools that is evolving and improving every day. It's also a niche test group. METR noted the participants were "experienced developers working on large, complex codebases that, often, they helped build," and this factor may have played into the slowdown that AI use caused. METR noted it expects "AI tools provide greater productivity benefits in other settings (e.g. on smaller projects, with less experienced developers, or with different quality standards)."

The impact of AI tools on the coding profession remains a subject of heated debate. A recent Microsoft study, for example, raised an alarm that some young coders, fresh from college, already rely on AI help so much that they don't really understand the details or science behind the code they write -- potentially setting them up for failure later on.
[13]
AI slows down some experienced software developers, study finds - The Economic Times
Contrary to popular belief, using cutting-edge artificial intelligence tools slowed down experienced software developers when they were working in codebases familiar to them, rather than supercharging their work, a new study found. AI research nonprofit METR conducted the in-depth study on a group of seasoned developers earlier this year while they used Cursor, a popular AI coding assistant, to help them complete tasks in open-source projects they were familiar with.

Before the study, the open-source developers believed using AI would speed them up, estimating it would decrease task completion time by 24%. Even after completing the tasks with AI, the developers believed that they had decreased task times by 20%. But the study found that using AI did the opposite: it increased task completion time by 19%. The study's lead authors, Joel Becker and Nate Rush, said they were shocked by the results: prior to the study, Rush had written down that he expected "a 2x speed up, somewhat obviously."

The findings challenge the belief that AI always makes expensive human engineers much more productive, a factor that has attracted substantial investment into companies selling AI products to aid software development. AI is also expected to replace entry-level coding positions. Dario Amodei, CEO of Anthropic, recently told Axios that AI could wipe out half of all entry-level white collar jobs in the next one to five years.

Prior literature on productivity improvements has found significant gains: one study found using AI sped up coders by 56%, another study found developers were able to complete 26% more tasks in a given time. But the new METR study shows that those gains don't apply to all software development scenarios. In particular, this study showed that experienced developers intimately familiar with the quirks and requirements of large, established open source codebases experienced a slowdown. Other studies often rely on software development benchmarks for AI, which sometimes misrepresent real-world tasks, the study's authors said.

The slowdown stemmed from developers needing to spend time going over and correcting what the AI models suggested. "When we watched the videos, we found that the AIs made some suggestions about their work, and the suggestions were often directionally correct, but not exactly what's needed," Becker said.

The authors cautioned that they do not expect the slowdown to apply in other scenarios, such as for junior engineers or engineers working in codebases they aren't familiar with. Still, the majority of the study's participants, as well as the study's authors, continue to use Cursor today. The authors believe it is because AI makes the development experience easier, and in turn, more pleasant, akin to editing an essay instead of staring at a blank page. "Developers have goals other than completing the task as soon as possible," Becker said. "So they're going with this less effortful route."
A new study by METR finds that AI coding tools unexpectedly increased task completion time by 19% for experienced open-source developers, contradicting expectations of increased efficiency.
A groundbreaking study conducted by Model Evaluation and Threat Research (METR) has revealed that AI coding tools, contrary to popular belief, can significantly slow down experienced software developers working on complex projects. The research, which involved 16 seasoned open-source developers completing 246 real-world tasks, found that using AI tools increased task completion time by 19% [1].
The randomized controlled trial focused on developers with multiple years of experience working on specific open-source repositories. Tasks included bug fixes, feature implementations, and refactoring work. Half of the tasks were completed using AI tools like Cursor Pro or Anthropic's Claude, while the other half were done without AI assistance [2].
Notably, developers initially expected AI tools to reduce task completion time by 24%. Even after completing the tasks, they believed the AI had made them 20% faster. However, the actual results showed a 19% increase in completion time when using AI tools [3].
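To make the size of that perception gap concrete, here is a minimal arithmetic sketch in Python. The 60-minute baseline is an assumed, illustrative figure rather than a number from the study; the three percentages are the ones reported above.

# Illustrative only: the 60-minute baseline task is a hypothetical value,
# not a figure from the METR study; the percentages are as reported.
baseline_min = 60.0                        # hypothetical time for a task without AI

forecast_min = baseline_min * (1 - 0.24)   # pre-study forecast: 24% faster
perceived_min = baseline_min * (1 - 0.20)  # post-study belief: 20% faster
measured_min = baseline_min * (1 + 0.19)   # observed outcome: 19% slower

print(f"forecast:  {forecast_min:.0f} min")   # 46 min
print(f"perceived: {perceived_min:.0f} min")  # 48 min
print(f"measured:  {measured_min:.0f} min")   # 71 min

On this assumed baseline, a task developers believed took them about 48 minutes with AI actually took about 71 -- a gap of over 20 minutes per task that self-reporting alone would never reveal.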
The study identified several factors contributing to the unexpected slowdown:
Time spent reviewing AI outputs: Developers accepted less than 44% of AI-generated code without modification, spending significant time reviewing and correcting suggestions [4].
Prompting and waiting: Considerable time was spent crafting prompts for AI systems and waiting for responses.
Complexity of existing codebases: The AI tools struggled with large, mature repositories averaging 10 years of age and over 1 million lines of code [1].
High quality standards: The repositories had very high quality bars for code contributions, limiting AI's effectiveness [5].
Lack of contextual understanding: AI tools couldn't utilize important tacit knowledge or context about the codebase that human developers possessed [1].
The study's findings challenge the widespread belief that AI tools universally enhance coding productivity. However, the researchers caution against broad generalizations, noting that the results may not apply to all software development scenarios [2].
The authors remain optimistic about the future of AI in coding. They suggest that improvements in reliability, latency, and output relevance could lead to efficiency gains. There is already preliminary evidence that newer AI models, such as Claude 3.7, show promise in correctly implementing core functionality for some of the studied repositories [1].
This study adds to a growing body of research examining the real-world impact of AI tools on productivity. While some studies have shown significant gains in coding efficiency, others have found that AI can introduce mistakes and even security vulnerabilities [2].
The findings may have implications for companies investing heavily in AI-powered coding tools and for predictions about AI's impact on the job market. However, it's important to note that the study focused on a specific scenario involving experienced developers working on complex, established codebases [5].