2 Sources
[1]
AI-authored code needs more attention, contains worse bugs
CodeRabbit review of pull requests shows meatbags beat clankers

Generating code using AI increases the number of issues that need to be reviewed and the severity of those issues. CodeRabbit, an AI-based code review platform, made that determination by looking at 470 open source pull requests for its State of AI vs Human Code Generation report.

The report finds that AI-generated code contains significantly more defects of logic, maintainability, security, and performance than code created by people. On average, AI-generated pull requests (PRs) include about 10.83 issues each, compared with 6.45 issues in human-generated PRs. That's about 1.7x more when AI is involved, meaning longer code reviews and increased risk of defects.

Problems caused by AI-generated PRs also tend to be more severe than human-made messes. AI-authored PRs contain 1.4x more critical issues and 1.7x more major issues on average than human-written PRs, the report says. Machine-generated code therefore seems to require reviewers to deal with a large volume of issues that are more severe than those present in human-generated code.

These findings echo a report issued last month by Cortex, maker of an AI developer portal. The company's Engineering in the Age of AI: 2026 Benchmark Report [PDF] found that PRs per author increased 20 percent year-over-year even as incidents per pull request increased by 23.5 percent, and change failure rates rose around 30 percent.

The CodeRabbit report found that AI-generated code falls short of meatbag-made code across the major issue categories. The bots created more logic and correctness errors (1.75x), more code quality and maintainability errors (1.64x), more security findings (1.57x), and more performance issues (1.42x). In terms of specific security concerns, AI-generated code was 1.88x more likely to introduce improper password handling, 1.91x more likely to make insecure object references, 2.74x more likely to add XSS vulnerabilities, and 1.82x more likely to implement insecure deserialization than human-written code.

One area where AI outshone people was spelling - spelling errors were 1.76x more common in human PRs than machine-generated ones. Also, human-authored code had 1.32x more testability issues than AI stuff.

"These findings reinforce what many engineering teams have sensed throughout 2025," said David Loker, director of AI at CodeRabbit, in a statement. "AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate."

CodeRabbit cautions that its methodology has limitations, such as its inability to be certain that PRs labeled as human-authored actually were exclusively authored by humans.

Other studies based on different data have come to different conclusions. For example, an August 2025 paper by University of Naples researchers, "Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity," found that AI-generated Python and Java code "is generally simpler and more repetitive, yet more prone to unused constructs and hardcoded debugging, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues."

Back in January 2025, researchers from Monash University (Australia) and University of Otago (New Zealand) published a paper titled "Comparing Human and LLM Generated Code: The Jury is Still Out!"
"Our results show that although GPT-4 is capable of producing coding solutions, it frequently produces more complex code that may need more reworking to ensure maintainability," the southern hemisphere boffins wrote. "On the contrary, however, our outcomes show that a higher number of test cases passed for code generated by GPT-4 across a range of tasks than code that was generated by humans." As to the impact of AI tools on developer productivity, researchers from Model Evaluation & Threat Research (METR) reported in July that "AI tooling slowed developers down." Your mileage may vary. We note that Microsoft patched 1,139 CVEs in 2025, according to Trend Micro researcher Dustin Childs, who claims that's the second-largest year for CVEs by volume after 2020. Microsoft says 30 percent of code in certain repos was written by AI and Copilot Actions comes with a caution about "the security implications of enabling an agent on your computer." "As Microsoft's portfolio continues to increase and as AI bugs become more prevalent, this number is likely to go higher in 2026," Childs wrote in his post. But at least we can expect fewer typos in code comments. ®
[2]
AI-authored code contains worse bugs than software crafted by humans - General Chat
Microsoft CEO Satya Nadella said that as much as 30% of the company's code is now written by artificial intelligence.
A CodeRabbit study analyzing 470 open source pull requests found that AI-generated code contains significantly more defects than human-written code. AI-authored PRs averaged 10.83 issues compared to 6.45 in human PRs, with 1.4x more critical issues and heightened security vulnerabilities including XSS and password handling flaws. The findings raise questions about code quality as companies like Microsoft report 30% of their code now comes from AI tools.
AI-generated code introduces substantially more problems than human-written alternatives, according to a comprehensive CodeRabbit study examining 470 open source pull requests [1]. The research reveals that AI-generated pull requests contain an average of 10.83 issues each, compared to just 6.45 issues in human-generated PRs, representing approximately 1.7x more code bugs when AI tools are involved [1][2]. This disparity translates to longer code reviews and increased risk of defects making their way into production systems.
Source: The Register
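That roughly-1.7x headline figure is simply the ratio of the two averages; a quick sanity check using the numbers cited above:

```python
ai_issues_per_pr = 10.83     # average issues per AI-generated PR (per the report)
human_issues_per_pr = 6.45   # average issues per human-generated PR (per the report)

print(round(ai_issues_per_pr / human_issues_per_pr, 2))  # 1.68, i.e. about 1.7x
```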
The severity of problems also escalates with machine-generated contributions. AI-authored code contains 1.4x more critical issues and 1.7x more major issues on average than human-written PRs, the report indicates [2]. These findings align with separate research from Cortex, which documented that pull requests per author increased 20 percent year-over-year while incidents per pull request jumped 23.5 percent, and change failure rates climbed around 30 percent [1].
The CodeRabbit study uncovered particularly troubling patterns in security vulnerabilities introduced by AI-generated code. Machine-authored contributions were 2.74x more likely to add XSS vulnerabilities, 1.91x more likely to create insecure object references, 1.88x more likely to introduce improper password handling, and 1.82x more likely to implement insecure deserialization compared to human developers [1][2]. Overall, AI code showed 1.57x more security findings than human-authored code [2].
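The report gives category counts rather than code, so the sketch below is a hypothetical Python illustration of the two most over-represented finding types, insecure deserialization and XSS, each paired with a safer equivalent; the function names are made up for the example.

```python
import html
import json
import pickle

# Insecure deserialization: unpickling untrusted bytes can run arbitrary code.
def load_profile_insecure(raw: bytes) -> dict:
    return pickle.loads(raw)  # the kind of line a security finding points at

# Safer sketch: accept only a data-only format such as JSON.
def load_profile(raw: bytes) -> dict:
    return json.loads(raw.decode("utf-8"))

# Reflected XSS: user-controlled text concatenated straight into markup.
def greeting_insecure(name: str) -> str:
    return "<p>Hello, " + name + "</p>"  # "<script>..." in name reaches the page

# Safer sketch: escape user-controlled text before it is rendered as HTML.
def greeting(name: str) -> str:
    return "<p>Hello, " + html.escape(name) + "</p>"
```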
These security concerns arrive at a critical moment for the industry. Microsoft patched 1,139 CVEs in 2025, the second-largest year for CVEs by volume after 2020, according to Trend Micro researcher Dustin Childs [1]. Microsoft CEO Satya Nadella has stated that as much as 30 percent of the company's code is now written by artificial intelligence [2]. Childs speculates that "as Microsoft's portfolio continues to increase and as AI bugs become more prevalent, this number is likely to go higher in 2026" [1].
Beyond security, AI-generated code falls short across major issue categories that affect long-term code quality. The bots created 1.75x more logic errors and correctness issues, 1.64x more code quality and maintainability errors, and 1.42x more performance issues compared to human-authored code [1][2]. These maintainability issues suggest that while AI tools may accelerate initial development, they potentially create technical debt that requires significant human intervention to address.

David Loker, director of AI at CodeRabbit, noted that "AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate" [1]. The severe defects in AI code manifest as tangible challenges for engineering teams managing code reviews and quality assurance processes.
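Of those categories, performance findings are the easiest to picture. As a hedged illustration (not drawn from the report's data), a pattern review tools routinely flag is repeated membership testing against a list, where a set lookup is the usual fix:

```python
def find_duplicates_slow(items: list[str]) -> list[str]:
    seen: list[str] = []
    dupes: list[str] = []
    for item in items:
        if item in seen:       # O(n) scan on every iteration, O(n^2) overall
            dupes.append(item)
        else:
            seen.append(item)
    return dupes

def find_duplicates(items: list[str]) -> list[str]:
    seen: set[str] = set()
    dupes: list[str] = []
    for item in items:
        if item in seen:       # O(1) average-case lookup
            dupes.append(item)
        else:
            seen.add(item)
    return dupes
```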
Not all studies paint the same picture. A University of Naples paper titled "Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity" found that AI-generated Python and Java code "is generally simpler and more repetitive, yet more prone to unused constructs and hardcoded debugging, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues" [1]. Researchers from Monash University and University of Otago observed that GPT-4 frequently produces more complex code requiring reworking for maintainability, though it achieved higher test case pass rates across various tasks [1].
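"Unused constructs and hardcoded debugging" is concrete enough to sketch. The hypothetical snippet below shows both smells in a few lines, the sort of thing a linter or reviewer flags immediately:

```python
import os  # unused import: nothing below touches the module

DEBUG = True  # hardcoded debug flag left switched on

def average(values):
    total = 0
    count = 0  # unused variable: len(values) is used instead
    for v in values:
        total += v
    if DEBUG:
        print("total =", total)  # leftover print-style debugging
    return total / len(values)
```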
The AI impact on developer productivity remains contested. Model Evaluation & Threat Research (METR) reported in July that "AI tooling slowed developers down" [1]. CodeRabbit acknowledges methodological limitations in its study, including uncertainty about whether PRs labeled as human-authored were exclusively created by humans [1]. AI did outperform in limited areas: spelling errors were 1.76x more common in human PRs, and human-authored code had 1.32x more testability issues than AI-generated alternatives [1][2]. As organizations integrate AI coding assistants at scale, teams must balance velocity gains against the demonstrable increase in defect volume and severity that requires enhanced review processes.