Curated by THEOUTPOST
On Thu, 27 Feb, 12:03 AM UTC
2 Sources
[1]
AI-detection software isn't the solution to classroom cheating -- assessment has to shift
Two years since the release of ChatGPT, teachers and institutions are still struggling with assessment in the age of artificial intelligence (AI). Some have banned AI tools outright. Others have turned to AI tools only to abandon them months later, or have called for teachers to embrace AI to transform assessment. The result is a hodgepodge of responses, leaving many kindergarten to Grade 12 and post-secondary teachers to make decisions about AI use that may not be aligned with the teacher next door, institutional policies, or current research on what AI can and cannot do.

One response has been to use AI detection software, which relies on algorithms to try to identify how a specific text was generated. AI detection tools are better than humans at spotting AI-generated work. But they're still an imperfect solution, and they do nothing to address the core validity problem of designing assessments where we can be confident in what students know and can do.

Teachers using AI detectors

A recent nationally representative survey of K-12 public school teachers in the United States, published by the Center for Democracy and Technology, reported that 68 per cent of teachers use AI detectors. This practice has also found its way into some Canadian K-12 schools and universities.

AI detectors vary in their methods. Two common approaches are to check for qualities described as "burstiness," referring to the alternation of short and long sentences (the way humans tend to write), and complexity (or "perplexity"). If an assignment does not have the typical markers of human-generated text, the software may flag it as AI-generated, prompting the teacher to begin an investigation for academic misconduct.

To its credit, AI detection software is more reliable than human detection. Repeated studies across contexts show humans -- including teachers and other experts -- are incapable of reliably distinguishing AI-generated text, despite teachers' confidence that they can spot a fake.

Accuracy of detectors varies

While some AI detection tools are unreliable or biased against English language learners, others seem to be more successful. However, what success rates should really signal for educators is questionable.

Turnitin boasts that its AI detector has a 99 per cent success rate, with a false-positive rate of about one per cent (that is, the share of human-written submissions the tool incorrectly flags as AI-generated). This accuracy has been challenged by a recent study that found Turnitin only detected AI-generated text about 61 per cent of the time. The same study suggested how different factors could shape accuracy results. For example, GPTZero's accuracy may be as low as 26 per cent, especially if students edit the output an AI tool generates. Yet a different study of the same detector reported a wide range of results (for example, between 23 and 82 per cent accuracy, or 74 and 100 per cent accuracy).

Considering numbers in context

The value of a percentage depends on its context. In most courses, being correct 99 per cent of the time is exceptional. It's above the most common threshold for statistical significance in academic research, which is often set at 95 per cent. But a 99 per cent success rate would be atrocious in air travel. There, it would mean around 500 accidents every day in the United States alone. That level of failure would be unacceptable.
To suggest what this could look like: at an institution like mine, the University of Winnipeg, about 10,000 students each submit multiple assignments -- we could ballpark five, for argument's sake -- in around five courses every year. That works out to about 250,000 assignments every year. Even at a 99 per cent success rate, that means roughly 2,500 failures: 2,500 false positives where students did not use ChatGPT or other tools, but the AI detection software flags them for possible use of AI, potentially initiating hours of investigative work for teachers and administrators alongside stress for students who may be falsely accused of cheating.

Time wasted investigating false positives

While AI detection software merely flags possible problems, we've already seen that humans are unreliable detectors. We cannot tell which of these 2,500 assignments are false positives, meaning cheaters will still slip through the cracks and precious teacher time will be wasted investigating innocent students who did nothing wrong.

This is not a new problem. Cheating was a major concern long before ChatGPT. Ubiquitous AI has merely shone a spotlight on a long-standing validity problem. When students can plagiarize, hire contract cheaters, rely on ChatGPT or have their friend or sister write the paper, relying on take-home assessments written outside class time without any teacher oversight is indefensible. I cannot presume that such forms of assessment represent the student's learning, because I cannot reliably discern whether the student actually wrote them.

Need to change assessment

The solution to taller cheating ladders is not taller walls. The solution is to change how we assess -- something classroom assessment researchers have been advocating since long before the onset of AI. Just as we don't spend thousands of dollars on "did-their-sister-write-this" detectors, schools should not rest easy simply because AI detection companies have a product to sell.

If educators want to make valid inferences about what students know and can do, they need assessment practices that emphasize ongoing formative assessment (like drafts, works-in-progress and repeated observations of student learning). These need to be rooted in authentic contexts relevant to students' lives and their learning, and to centre comprehensive academic integrity as a shared responsibility of students, teachers and system leaders -- not just a mantra of "don't cheat and if we catch you we will punish you." Let's spend less on flawed detection tools and more on supporting teachers to develop their assessment capacity across the board.
As educators grapple with AI-generated content in classrooms, experts argue that AI detection software is an imperfect solution and call for a fundamental shift in assessment methods to ensure academic integrity.
Two years after the release of ChatGPT, educational institutions are still grappling with the challenges of assessment in the age of artificial intelligence (AI). In response to these challenges, many have turned to AI detection software as a potential solution. A recent survey by the Center for Democracy and Technology revealed that 68% of K-12 public school teachers in the United States are now using AI detectors [1].
AI detection tools employ various methods to identify AI-generated text. Common approaches include analyzing "burstiness" (the alternation of short and long sentences typical in human writing) and complexity (or "perplexity") of the text. While these tools have shown to be more reliable than human detection, their accuracy remains a subject of debate [1].
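To make the "burstiness" idea concrete, here is a minimal sketch in Python using one simple proxy: variation in sentence length. The sentence splitter, the scoring formula and the sample text are illustrative assumptions, not any vendor's actual detection algorithm, and a real "perplexity" check would additionally require scoring the text with a language model.

```python
import re
import statistics

def burstiness_score(text: str) -> float:
    """Rough proxy for 'burstiness': variation in sentence lengths.

    Human prose tends to mix short and long sentences, so a higher
    standard deviation of sentence length (relative to the mean)
    suggests more 'bursty', human-like writing. This is only an
    illustrative heuristic, not a production detector.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # Coefficient of variation: std dev of lengths divided by mean length.
    return statistics.stdev(lengths) / statistics.mean(lengths)

sample = (
    "The fog rolled in. Nobody noticed at first, because the harbour "
    "lights were still burning and the ferries kept their usual schedule. "
    "Then the horns began."
)
print(f"burstiness ~ {burstiness_score(sample):.2f}")  # higher = more varied sentence lengths
```

A heuristic this simple is easy to fool in either direction, which is one reason commercial detectors combine many signals and still disagree about the same text.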
The accuracy of AI detection software varies widely across different studies and tools. Turnitin, a popular plagiarism detection service, claims a 99% success rate with only 1% false positives. However, a recent study challenged this claim, finding that Turnitin only detected AI-generated text about 61% of the time [2].
Another tool, GPTZero, showed even more inconsistent results. One study suggested its accuracy could be as low as 26%, especially when students edit AI-generated output. A different study of the same tool reported accuracy ranges between 23% and 82%, or 74% and 100%, depending on the context [2].
Even with high accuracy rates, the sheer volume of assignments in educational institutions means that false positives remain a significant concern. For instance, at the University of Winnipeg, with approximately 250,000 assignments submitted annually, a 99% accuracy rate would still result in 2,500 false positives each year [1].
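As a back-of-the-envelope check, the figures in that example work out as follows; the enrolment, course and assignment counts are the article's own ballpark assumptions rather than exact institutional data.

```python
# Back-of-the-envelope estimate using the article's ballpark figures.
students = 10_000            # approximate University of Winnipeg enrolment
courses_per_student = 5      # assumed courses per student per year
assignments_per_course = 5   # assumed assignments per course

total_assignments = students * courses_per_student * assignments_per_course
false_positive_rate = 0.01   # Turnitin's advertised ~1% false-positive rate

false_positives = total_assignments * false_positive_rate
print(total_assignments)      # 250000 assignments per year
print(int(false_positives))   # 2500 human-written assignments wrongly flagged
```

The point of the estimate is that even a small per-assignment error rate scales into thousands of flagged submissions, each of which someone has to investigate.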
Experts argue that the focus on detection tools fails to address the core issue of assessment validity. The ease with which students can access various forms of external help, from AI tools to contract cheating, calls into question the effectiveness of traditional take-home assessments [2].
Instead of relying on detection software, educators are being urged to fundamentally change their assessment methods. Suggestions include emphasizing ongoing formative assessment (such as drafts, works-in-progress and repeated observations of student learning), rooting tasks in authentic contexts relevant to students' lives, and treating academic integrity as a shared responsibility of students, teachers and system leaders rather than a matter of detection and punishment.
As the debate continues, it's clear that the education sector needs to adapt its practices to the realities of the AI era, focusing on developing assessment methods that can effectively measure student learning in this new landscape.