2 Sources
[1]
AI pair coding: Fast, but devs don't question the bot
Developers who "pair code" with an AI assistant stand to learn as much as they do in traditional human-human pairings, but also show a less critical attitude toward their silicon-based partner's output, academics have found.

Pair programming is a common practice in developer circles, though it did not become a formal "pattern" until the turn of this century. The practice is credited with producing better quality code, savings in development time, and knowledge transfer. And, done right and with the right pairing, it should generally make for a more pleasant experience.

Yet increasingly, developers are working with code assistants rather than other walking, talking coders. So, researchers at Saarland University in Germany sought to "analyze knowledge transfer in both human-human and human-AI settings." One group of human-human pairs tackled a programming task, while another group of individual developers tackled the task with the assistance of GitHub Copilot. The task involved implementing features within an existing codebase of approximately 400 lines of Python code and comments, distributed across five files.

The researchers sought to answer two questions. Firstly, "To what extent do the frequency, length, and depth of knowledge transfer episodes differ between human-human pair programming and human-AI pair programming?" And secondly, "How do the quality and diversity of knowledge transfer episodes, including topic types and finish types, vary between human-human pair programming and human-AI pair programming?"

The academics then tracked conversational "episodes" between the meat sack duos using a speech recognition tool, and used screen recordings to track interactions within the human and Copilot pairs. Those conversations were analyzed for their "contribution to knowledge transfer," with the researchers noting: "In most cases, utterances related to knowledge transfer contain an exchange of information between the two humans or between the human and GITHUB COPILOT."

They found that the human-human pairings generated 210 episodes, compared to 126 episodes in human-AI pair programming sessions. "Code" conversations were more frequent in the human-machine pairings, whereas "lost sight" outcomes - ie the conversation got sidetracked - were more common in the human pairings.

They also found "a high level of TRUST episodes in human-AI pair programming sessions. If this pattern were to generalize beyond our setup, this would carry important real-world implications, warranting further investigation. These frequent TRUST episodes can reduce opportunities for deeper learning." Other, broader but still on-topic conversations were more likely to occur in the human-human pairings.

The researchers concluded that while the use of AI might increase efficiency, it could also "reduce the broader knowledge exchange that arises from side discussions in human-human pair programming, potentially decreasing long-term efficiency." This could mean that while "AI is useful for simple, repetitive tasks where side discussions are less valuable... when it comes to building deeper knowledge it must be treated with care, especially for students."

And the researchers added: "We observe that in many GITHUB COPILOT sessions, programmers tend to accept the assistant's suggestions with minimal scrutiny, relying on the assumption that the code will perform as intended." They suggested: "Human-human pair programming enables spontaneous interactions but also increases the risk of distraction. In contrast, knowledge transfer with GITHUB COPILOT is less likely to be aborted, yet suggestions are often accepted with less scrutiny." However, AI assistants were good at reminding humans of key details, "such as committing database changes, that might otherwise be overlooked."

That could, and arguably should, be ringing alarm bells for development leaders. It's easy to focus on the efficiency gains AI-generated code can bring. But that code still needs to be reviewed and tested before being put into production, otherwise bad things can happen.

GitHub happily trumpeted the uptake of Copilot in its latest Octoverse report last week, with 80 percent of new users diving into the technology. The use of Copilot, and other code assistants, is even shaping the languages developers use, with a shift to more strongly typed languages, which lend themselves to code generation platforms. But generating code is just part of the pipeline. Research by Cloudsmith earlier this year highlighted how coders are acutely aware of the perils of LLM-generated code, such as LLMs recommending non-existent or even malicious packages. At the same time, a third of developers were deploying AI-generated code without review. ®
[2]
Software developers show less constructive skepticism when using AI assistants than when working with human colleagues
When writing program code, software developers often work in pairs -- a practice that reduces errors and encourages knowledge sharing. Increasingly, AI assistants are now being used in this role. But this shift in working practice isn't without its drawbacks, as a new empirical study by computer scientists in Saarbrücken reveals. Developers tend to scrutinize AI-generated code less critically, and they learn less from it. These findings will be presented at the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025) in Seoul.

When two software developers collaborate on a programming project -- known in technical circles as pair programming -- it tends to yield a significant improvement in the quality of the resulting software. "Developers can often inspire one another and help avoid problematic solutions. They can also share their expertise, thus ensuring that more people in their organization are familiar with the codebase," explains Sven Apel, professor of computer science at Saarland University. Together with his team, Apel has examined whether this collaborative approach works equally well when one of the partners is an AI assistant.

In the study, 19 students with programming experience took part: six pairs worked together, while seven students each collaborated with an AI assistant. The methodology for measuring knowledge transfer was developed by Niklas Schneider as part of his bachelor's thesis. For the study, the researchers used GitHub Copilot, an AI-powered coding assistant introduced by GitHub, a Microsoft subsidiary, in 2021, which -- like similar products from other companies -- has now been widely adopted by software developers. These tools have significantly changed how software is written. "It enables faster development and the generation of large volumes of code in a short time. But this also makes it easier for mistakes to creep in unnoticed, with consequences that may only surface later on," says Apel.

The team wanted to understand which aspects of human collaboration enhance programming and whether these can be replicated in human-AI pairings. Participants were tasked with developing algorithms and integrating them into a shared project environment. "Knowledge transfer is a key part of pair programming," Apel explains. "Developers will continuously discuss current problems and work together to find solutions. This does not involve simply asking and answering questions; it also means that the developers share effective programming strategies and volunteer their own insights."

According to the study, such exchanges also occurred in the AI-assisted teams -- but the interactions were less intense and covered a narrower range of topics. "In many cases, the focus was solely on the code," says Apel. "By contrast, human programmers working together were more likely to digress and engage in broader discussions, and were less focused on the immediate task."

One finding particularly surprised the research team: "The programmers who were working with an AI assistant were more likely to accept AI-generated suggestions without critical evaluation. They assumed the code would work as intended," says Apel. "The human pairs, in contrast, were much more likely to ask critical questions and were more inclined to carefully examine each other's contributions."

He believes this tendency to trust AI more readily than human colleagues may extend to other domains as well: "I think it has to do with a certain degree of complacency -- a tendency to assume the AI's output is probably good enough, even though we know AI assistants can also make mistakes."

Apel warns that this uncritical reliance on AI could lead to the accumulation of "technical debt" -- the hidden cost of the future work needed to correct these mistakes -- thereby complicating the future development of the software.

For Apel, the study highlights the fact that AI assistants are not yet capable of replicating the richness of human collaboration in software development. "They are certainly useful for simple, repetitive tasks," says Apel. "But for more complex problems, knowledge exchange is essential -- and that currently works best between humans, possibly with AI assistants as supporting tools." Apel emphasizes the need for further research into how humans and AI can collaborate effectively while still retaining the kind of critical eye that characterizes human collaboration.
Research from Saarland University reveals that while AI pair programming tools like GitHub Copilot increase efficiency, developers show less critical evaluation of AI-generated code compared to human collaboration, potentially compromising long-term software quality and learning outcomes.
Researchers at Saarland University conducted an empirical study comparing traditional human-human pair programming with human-AI pair programming using GitHub Copilot [1][2]. The study involved 19 students with programming experience: six pairs worked together, while seven participants each collaborated with the AI assistant. Participants tackled programming tasks involving implementing features within an existing codebase of approximately 400 lines of Python code and comments, distributed across five files [1].

The research team tracked conversational "episodes" between human pairs using speech recognition tools and monitored human-AI interactions through screen recordings. They analyzed these conversations for their "contribution to knowledge transfer," focusing on patterns of information exchange between participants [1].
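As a concrete illustration of this kind of analysis, the sketch below tags transcribed utterances with episode labels and tallies them per session. It is a minimal, hypothetical Python reconstruction: the labels, keywords, and "Utterance" structure are assumptions for illustration only, not the researchers' actual coding scheme or tooling.

    from collections import Counter
    from dataclasses import dataclass

    # Hypothetical labels loosely echoing the study's episode types;
    # not the paper's actual categories or classifier.
    CODE, TRUST, LOST_SIGHT = "CODE", "TRUST", "LOST_SIGHT"

    @dataclass
    class Utterance:
        speaker: str  # e.g. "human_a", "human_b", or "copilot"
        text: str

    def classify(utterance: Utterance) -> str:
        # Toy keyword heuristic standing in for the manual labeling
        # the researchers applied to each transcript.
        text = utterance.text.lower()
        if any(k in text for k in ("looks good", "should work", "just accept")):
            return TRUST
        if any(k in text for k in ("function", "variable", "commit", "bug")):
            return CODE
        return LOST_SIGHT

    def episode_counts(transcript: list[Utterance]) -> Counter:
        # Tally episode types across one recorded session.
        return Counter(classify(u) for u in transcript)

    session = [
        Utterance("human_a", "Copilot's suggestion looks good, just accept it."),
        Utterance("human_a", "Did we commit the database changes?"),
    ]
    print(episode_counts(session))  # Counter({'TRUST': 1, 'CODE': 1})

Comparing such per-session tallies across the two groups is, in spirit, how episode totals like the 210 versus 126 reported below would be derived.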
The study revealed significant differences in knowledge transfer between the two approaches. Human-human pairings generated 210 episodes, compared with 126 episodes in human-AI pair programming sessions [1]. However, the nature of these interactions varied considerably. "Code" conversations were more frequent in human-machine pairings, while "lost sight" outcomes, where conversations became sidetracked, were more common in human pairings [1]. The research identified "a high level of TRUST episodes in human-AI pair programming sessions," with developers showing a tendency to accept AI-generated suggestions without critical evaluation [1][2].

Professor Sven Apel, who led the research, noted that "the programmers who were working with an AI assistant were more likely to accept AI-generated suggestions without critical evaluation. They assumed the code would work as intended" [2]. This contrasted sharply with human pairs, who "were much more likely to ask critical questions and were more inclined to carefully examine each other's contributions" [2].
The research highlights concerning implications for software development practices. While AI assistants demonstrated efficiency in generating code quickly, they also reduced the broader knowledge exchange that characterizes effective human collaboration [1]. The study found that "when it comes to building deeper knowledge it must be treated with care, especially for students" [1].

Apel warns that uncritical reliance on AI could lead to the accumulation of "technical debt" -- the hidden cost of the future work needed to correct these mistakes -- which complicates the future development of the software [2]. This concern is amplified by separate research from Cloudsmith, which found that despite developers being "acutely aware of the perils of LLM generated code," including recommendations for non-existent or malicious packages, "a third of developers were deploying AI generated code without review" [1].
The findings come as AI coding assistants gain widespread adoption across the software development industry. GitHub's latest Octoverse report revealed that 80 percent of new users are embracing Copilot [1]. The influence extends beyond adoption rates, with AI assistants "shaping the languages developers use, with a shift to more strongly typed languages which lend themselves to code generation platforms" [1].

Despite the efficiency gains, the research suggests that AI assistants cannot fully replicate the richness of human collaboration in software development. As Apel explains, "They are certainly useful for simple, repetitive tasks, but for more complex problems, knowledge exchange is essential -- and that currently works best between humans, possibly with AI assistants as supporting tools" [2].
Summarized by Navi