2 Sources
[1]
Coders are refusing to work without AI -- and that could come back to bite them
In 2026, you cannot snatch AI coding tools out of developers' vice-grip hands, researchers have discovered. But while AI is undoubtedly helping coders produce code faster, it may not be producing better code, other researchers warn. And that could cause problems for them down the road.

Specifically, in February 2026, respected AI research lab METR published a surprising revelation: most developers won't work, even on a limited number of tasks, without AI anymore. METR had hoped to provide an update to some groundbreaking research published a few months earlier, in 2025, on AI coding productivity. In it, researchers measured how much time open source developers took to do tasks by hand versus with AI. While developers in that study reported that AI was making them more productive, they were shocked to learn it actually slowed them down. Sure, it generated code faster, but then they spent extra time finding and fixing errors, steering the AI and waiting on it to complete tasks.

When METR set out to repeat the experiment to measure advances in AI and coder proficiency, they couldn't. Devs weren't willing to participate "because they do not wish to work without AI" even just for the study, the researchers confessed. Instead, METR published a survey in May that allowed technical employees to self-report their AI productivity gains. Not surprisingly, they perceived that AI made them twice as valuable to their organizations.

But recent headlines about the wild expense of so-called tokenmaxxing, coupled with a smattering of recent research, make such self-perceptions dubious. Tokenmaxxing, or treating the number of tokens a person consumes as a proxy for productivity with AI, has been the trend of 2026 so far. And it may already be over. Amazon shut down its internal token-tracking leaderboard, called Kirorank, after employees gamed it by using AI agents excessively and running up costs, the Financial Times reported this week. The employees proved that AI use does not automatically translate to increased productivity. Uber blew through its 2026 AI budget within the first four months of the year, The Information reported. COO Andrew Macdonald recently said on a podcast that such spending hadn't led to a measurable increase in projects or productivity.

AI-generated code also doesn't necessarily reduce ongoing code maintenance needs, and may even increase them, programmer and author James Shore elegantly argued in a blog post that went viral on Hacker News. "You write code twice as quick now? Better hope you've halved your maintenance costs," he wrote. "Otherwise, you're screwed. You're trading a temporary speed boost for permanent indenture."

There's other evidence that AI can increase code maintenance woes. A viral tweet from Aiswarya Sankar, founder and CEO of reliability engineering agent startup Entelligence AI, proclaims that companies are spending 44% of their tokens on fixing bugs that their AI generated. Code reviewing tool company CodeRabbit says it analyzed open source pull requests and found that AI produced 1.7x more problems than human code. Those are, admittedly, self-serving stats from those trying to sell AI code reviewing tools. Yet independent researchers have also found such issues. Researchers from the respected Singapore Management University published a report in April warning that "AI-generated code can introduce long-term maintenance costs into real software projects."

Given that programmers love their AI assistants, what's the solution? Well, those who want to sell you AI coding agents say devs can just use AI coding agents to do the bone-wearying tasks of fixing code as fast as AI spits it out. That's what Scott Wu, founder and CEO of Cognition, maker of AI coding agent Devin, suggests. But even he admits that, while Devin can work independently, he'd currently rate its skill between a junior and mid-level programmer, depending on the task. This is not a hand-it-off-and-forget-it solution.

The SMU researchers suggest a more human approach. Programmers should know what tasks AI does and doesn't do well as deeply as they know their favorite coding languages. They need strong quality assurance systems designed for AI, and they are stuck with carefully reviewing the AI's work as if it came from a junior dev. Meanwhile, the researchers say (and Wu agrees), humans should still be doing the big-picture work like software architecture and security design.
[2]
Developers won't work without AI anymore. The research says it might be making them worse.
Devs refuse to code without AI, but research shows it may slow them down. Amazon killed its token leaderboard. Uber blew its AI budget in four months.

In February 2026, AI research lab METR tried to repeat a groundbreaking study measuring how much time developers take to complete tasks with and without AI. It could not. Developers refused to participate because they would not work without AI, even for a limited number of tasks in a research setting.

The original 2025 study had produced a surprising result. Developers reported that AI made them more productive. The data showed the opposite: AI actually slowed them down because they spent extra time finding and fixing errors, steering the AI, and waiting for it to complete tasks. Unable to replicate the experiment, METR published a survey in May instead. Developers self-reported that AI made them twice as valuable to their organisations.

Recent evidence from multiple sources suggests that perception is wrong. Amazon shut down an internal token-tracking leaderboard called Kirorank this week, the Financial Times reported. Employees were gaming it by using AI agents excessively and running up costs. The leaderboard proved that AI use does not automatically translate to increased productivity. Uber blew through its entire 2026 AI budget within the first four months of the year, The Information reported. COO Andrew Macdonald said on a podcast that the spending had not led to a measurable increase in projects or productivity. Two of the world's most technically sophisticated companies spent heavily on AI coding tools and could not demonstrate a return.

The term for this pattern is "tokenmaxxing": using token consumption as a proxy for productivity. It has been the corporate trend of 2026. It may already be over. The Amazon and Uber examples show that measuring AI adoption by volume of use, rather than quality of output, produces the wrong incentives. Salesforce projects $300 million in Anthropic token spending this year. CEO Marc Benioff called for an "intermediary layer" that could route tokens intelligently between frontier and cheaper models. The call for a routing layer is an implicit admission that not every token produces value, and that spending needs to be matched to task complexity.

The code quality problem is the deeper issue. Programmer and author James Shore argued in a viral blog post that faster code generation without reduced maintenance costs is a trap. "You write code twice as quick now? Better hope you've halved your maintenance costs," he wrote. "Otherwise, you're screwed. You're trading a temporary speed boost for permanent indenture."

The data supports the warning. Entelligence AI, a reliability engineering startup, claims that companies spend 44% of their tokens on bug fixes that their AI generated. CodeRabbit, a code-reviewing tool, analysed open-source pull requests and found that AI produced 1.7 times more problems than human code. Both companies sell AI code review tools, which makes the statistics self-serving but not necessarily wrong. Independent researchers at Singapore Management University published a report in April reaching the same conclusion. "AI-generated code can introduce long-term maintenance costs into real software projects," they wrote. The code ships faster. The bugs arrive later. The maintenance debt compounds.

The question engineering leaders are avoiding is whether the productivity gains from AI coding tools are real or perceived. If developers refuse to work without AI but the AI is generating more bugs than it prevents, the net effect could be negative. The dependency has outpaced the evidence. Cognition founder Scott Wu, maker of AI coding agent Devin, admits the tool's skill level sits between a junior and mid-level programmer depending on the task. It is not a hand-off-and-forget solution. The SMU researchers recommend treating AI output the way you would treat code from a junior developer: review everything, maintain strong QA systems, and keep humans responsible for architecture and security design.

The job market reflects the contradiction. Companies are hiring "vibe coders" and forward deployed engineers at unprecedented rates while simultaneously discovering that the tools those roles depend on may not produce the quality gains their hiring assumes. The AI coding market is growing faster than the evidence that it works.

Developers will not go back to coding without AI. That ship has sailed. The question is whether the industry will build the quality assurance infrastructure, the routing layers, and the review processes needed to ensure that faster code production does not become faster technical debt production. Right now, the answer is no. Developers love the tools. The tools may not love them back.
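James Shore's warning about speed and maintenance can be made concrete with a toy model. The sketch below is not Shore's own calculation; it assumes a team with a fixed number of hours per month, a made-up per-unit upkeep cost, and that any time not spent on maintenance goes into new code. Every name and number in it is hypothetical.

```python
def maintenance_share(months, speed, upkeep_per_unit, capacity=160.0):
    """Return the fraction of team time spent on maintenance each month.

    All parameters are illustrative assumptions, not measured values:
    speed            units of code produced per hour of feature work
    upkeep_per_unit  hours of maintenance each shipped unit needs per month
    capacity         total team hours available per month
    """
    codebase = 0.0
    shares = []
    for _ in range(months):
        # Maintenance is paid first and cannot exceed the hours available.
        upkeep_hours = min(capacity, codebase * upkeep_per_unit)
        shares.append(upkeep_hours / capacity)
        # Whatever time is left after maintenance goes to writing new code.
        codebase += (capacity - upkeep_hours) * speed
    return shares


baseline = maintenance_share(24, speed=1.0, upkeep_per_unit=0.02)
faster = maintenance_share(24, speed=2.0, upkeep_per_unit=0.02)
faster_and_cheaper = maintenance_share(24, speed=2.0, upkeep_per_unit=0.01)

for label, series in [
    ("baseline", baseline),
    ("2x speed", faster),
    ("2x speed, half upkeep", faster_and_cheaper),
]:
    print(f"{label:24s} maintenance share after 2 years: {series[-1]:.0%}")
```

Under these assumptions, doubling speed while leaving upkeep per unit unchanged makes maintenance eat a larger share of the team's time sooner, whereas doubling speed and halving upkeep leaves the share where it was in the baseline. That is the trade Shore describes, though real projects will not follow such a simple curve.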
AI research lab METR discovered developers won't participate in studies without AI coding tools, even for limited tasks. But mounting evidence from Amazon, Uber, and academic researchers suggests AI reliance may be creating more problems than it solves, with companies spending 44% of tokens on AI-generated bug fixes.
In February 2026, AI research lab METR attempted to update groundbreaking research from 2025 on developer productivity with AI coding tools. The original study had revealed a paradox: while developers reported feeling more productive with AI, the data showed AI actually slowed them down because they spent extra time finding and fixing code errors, steering the AI, and waiting for it to complete tasks [1]. When METR tried to replicate the experiment, they hit an unexpected wall. Coders refusing to work without AI made the study impossible, as developers wouldn't participate even for a limited number of tasks in a research setting [2].

Instead, METR published a survey in May allowing technical employees to self-report their AI reliance and perceived gains. Not surprisingly, developers claimed AI made them twice as valuable to their organizations [1]. But recent evidence from major tech companies and independent researchers suggests this perception may be dangerously disconnected from reality.

The trend of tokenmaxxing, using token consumption as a proxy for productivity, has dominated 2026, but it may already be collapsing under scrutiny. Amazon shut down its internal token-tracking leaderboard called Kirorank after employees gamed the system by using AI agents excessively and running up costs without demonstrating actual productivity gains, the Financial Times reported this week [1]. Uber blew through its entire 2026 AI budget within the first four months of the year, The Information reported. COO Andrew Macdonald admitted on a podcast that such spending hadn't led to a measurable increase in projects or developer productivity [2].

These examples from two of the world's most technically sophisticated companies reveal a troubling pattern: massive AI spending doesn't automatically translate to better outcomes in software development. Salesforce projects $300 million in Anthropic token spending this year, with CEO Marc Benioff calling for an "intermediary layer" to route tokens intelligently between frontier and cheaper models, an implicit admission that not every token produces value [2].

The negative consequences of AI dependence extend beyond immediate spending concerns. Programmer and author James Shore argued in a viral blog post that faster code generation creates a dangerous trap. "You write code twice as quick now? Better hope you've halved your maintenance costs," he wrote. "Otherwise, you're screwed. You're trading a temporary speed boost for permanent indenture" [1].

Data supports this warning about AI's impact on coding quality. Aiswarya Sankar, founder and CEO of reliability engineering startup Entelligence AI, claims companies spend 44% of their tokens on fixing bugs that their AI-generated code created. Code reviewing tool company CodeRabbit analyzed open source pull requests and found that AI produced 1.7 times more problems than human code [1]. While these statistics come from companies selling AI code review tools, independent researchers at Singapore Management University reached the same conclusion in an April report, warning that "AI-generated code can introduce long-term maintenance costs into real software projects" [2].

The dependency has outpaced the evidence. Cognition founder Scott Wu, maker of AI coding agent Devin, admits his tool's skill level sits between a junior and mid-level programmer depending on the task, so it is not a hand-off-and-forget solution [2]. The SMU researchers recommend treating AI output the way you would treat code from a junior developer: implement strong quality assurance systems designed for AI, carefully review everything, and keep humans responsible for software architecture and security design [1].

The question engineering leaders must address is whether the productivity gains from AI coding tools are real or merely perceived. If developers refuse to work without AI but the tools generate more bugs than they prevent, the net effect could be negative. The AI coding market is growing faster than the evidence that it works, creating a critical need for quality assurance infrastructure and review processes to ensure faster code production doesn't become faster technical debt production [2].

Summarized by Navi