Curated by THEOUTPOST
On Fri, 11 Oct, 12:05 AM UTC
4 Sources
[1]
The Editors Protecting Wikipedia from AI Hoaxes
A group of Wikipedia editors have formed WikiProject AI Cleanup, "a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia." The group's goal is to protect one of the world's largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals. "A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT," Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. "Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques."
[2]
How AI-generated content is upping the workload for Wikipedia editors
As AI-generated slop takes over increasing swathes of the user-generated Internet thanks to the rise of large language models (LLMs) like OpenAI's GPT, spare a thought for Wikipedia editors. In addition to their usual job of grubbing out bad human edits, they're having to spend an increasing proportion of their time trying to weed out AI filler. 404 Media has talked to Ilyas Lebleu, an editor at the crowdsourced encyclopedia, who was involved in founding the "WikiProject AI Cleanup" project. The group is trying to come up with best practices to detect machine-generated contributions. (And no, before you ask, AI is useless for this.) A particular problem with AI-generated content in this context is that it's almost always improperly sourced. The ability of LLMs to instantly produce reams of plausible-sounding text has even led to whole fake entries being uploaded in a bid to sneak hoaxes past Wikipedia's human experts.
[3]
Wikipedia Declares War on AI Slop
AI slop threatens to degrade the usability of Wikipedia -- and its editors are fighting back. As 404 Media reports, a team of Wikipedia editors has assembled to create "WikiProject AI Cleanup," which describes itself as "a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia." The group is clear that they don't wish to ban responsible AI use outright, but instead seek to eradicate instances of badly-sourced, hallucination-filled, or otherwise unhelpful AI content that erodes the overall quality of the web's decades-old information repository. "The purpose of this project is not to restrict or ban the use of AI in articles," the battle-ready cohort's Wikipedia forum reads, "but to verify that its output is acceptable and constructive, and to fix or remove it otherwise." In some cases, the editors told 404, AI misuse is obvious. One clear sign is users of AI tools leaving well-known chatbot auto-responses behind in Wikipedia entries, such as paragraphs starting with "as an AI language model, I..." or "as of my last knowledge update." The editors also say they've learned to recognize certain prose patterns and "catchphrases," which has allowed them to spot and neutralize sloppy AI text. "A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT," WikiProject AI Cleanup founding member Ilyas Lebleu told 404, adding that "discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles." Still, a lot of poor-quality AI content is tough to spot, especially when it comes to confident-sounding errors hidden in complex material. One example flagged to 404 by editors was an impressively crafted history of a "timbery" Ottoman fortress that never actually existed.
While it was simply wrong, the text itself was passable enough that unless you happen to specialize in 13th-century Ottoman architecture, you likely wouldn't have caught the error. As we previously reported, Wikipedia editors have in some cases chosen to demote the reliability of certain news sites like CNET -- which we caught publishing error-laden AI articles last year -- as a direct result of AI misuse. Given that it's incredibly cheap to mass produce, limiting sloppy AI content is often difficult. Add the fact that Wikipedia is, and has always been, a crowdsourced, volunteer-driven internet project, and fighting the tide of AI sludge gets that much more difficult.
[4]
Wikipedia is under assault: rogue users keep posting AI generated nonsense
This is why we can't have nice things: Wikipedia is in the middle of an editing crisis at the moment, thanks to AI. People have started flooding the website with nonsensical information dreamed up by large language models like ChatGPT. But honestly, who didn't see this coming? Wikipedia has a new initiative called WikiProject AI Cleanup. It is a task force of volunteers currently combing through Wikipedia articles, editing or removing false information that appears to have been posted by people using generative AI. Ilyas Lebleu, a founding member of the cleanup crew, told 404 Media that the crisis began when Wikipedia editors and users began seeing passages that were unmistakably written by a chatbot of some kind. The team confirmed the theory by recreating some passages using ChatGPT. "A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT," said Lebleu. "Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques." For example, there is one article about an Ottoman fortress built in the 1400s called "Amberlisihar." The 2,000-word article details the landmark's location and construction. Unfortunately, Amberlisihar does not exist, and all the information about it is a complete hallucination peppered with enough factual information to lend it some credibility. The mischief is not limited to newly posted material either. The bad actors are inserting bogus AI-generated information into existing articles that volunteer editors have already vetted. In one example, someone had inserted a correctly cited section about a particular crab species into an article about an unrelated beetle.
Lebleu and his fellow editors say they don't know why people are doing this, but let's be honest - we all know this is happening for two primary reasons. First is an inherent problem with Wikipedia's model - anyone can be an editor on the platform. Many universities do not accept students turning in papers that cite Wikipedia for this exact reason. The second reason is simply that the internet ruins everything. We've seen this time and again, particularly with AI applications. Remember Tay, Microsoft's Twitter bot that got pulled in less than 24 hours when it began posting vulgar and racist tweets? More modern AI applications are just as susceptible to abuse as we have seen with deepfakes, ridiculous AI-generated shovelware books on Kindle, and other shenanigans. Anytime the public is allowed virtually unrestricted access to something, you can expect a small percentage of users to abuse it. When we are talking about 100 people, it might not be a big deal, but when it's millions, you are going to have a problem. Sometimes, it's for illicit gain. Other times, it's just because they can. Such is the case with Wikipedia's current predicament.
Wikipedia's volunteer editors form WikiProject AI Cleanup to combat the rising tide of AI-generated content, aiming to protect the integrity of the world's largest online encyclopedia.
Wikipedia, one of the world's largest repositories of information, is grappling with a new threat: the influx of AI-generated content. A group of dedicated editors has formed WikiProject AI Cleanup to combat this growing problem, which risks undermining the credibility and usefulness of the crowdsourced encyclopedia [1].
The proliferation of large language models (LLMs) like OpenAI's GPT has led to an increase in AI-generated content across the internet. Wikipedia has not been immune to this trend, with editors noticing a surge in unsourced, poorly-written articles and edits that show clear signs of being AI-generated [2].
Ilyas Lebleu, a founding member of WikiProject AI Cleanup, explained, "A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT" [1]. This observation led to the formation of the cleanup project, aimed at compiling findings and techniques to identify and remove AI-generated content.
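The WikiProject AI Cleanup team relies on several telltale signs to spot AI-generated text. The most obvious is chatbot boilerplate left behind in an entry, such as paragraphs starting with "as an AI language model, I..." or "as of my last knowledge update" [3]. Editors have also learned to recognize recurring prose patterns and "catchphrases," which they were able to reproduce using ChatGPT [1][3], and they note that AI-generated text is almost always improperly sourced [2].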
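One of the more mechanical checks, scanning for leftover chatbot phrases such as those quoted in [3], can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tooling: the phrase list contains only the two phrases quoted above, and the names LEFTOVER_PHRASES and find_leftover_phrases are hypothetical.

```python
import re

# Hypothetical phrase list: only the two examples quoted in source [3],
# not WikiProject AI Cleanup's actual catalogue of catchphrases.
LEFTOVER_PHRASES = (
    "as an ai language model",
    "as of my last knowledge update",
)


def find_leftover_phrases(text: str) -> list[str]:
    """Return the known chatbot phrases that occur in text (case-insensitive)."""
    # Collapse runs of whitespace so a phrase split across lines still matches.
    normalized = re.sub(r"\s+", " ", text).lower()
    return [phrase for phrase in LEFTOVER_PHRASES if phrase in normalized]


sample = "As an AI language model, I cannot verify the fortress's history."
print(find_leftover_phrases(sample))
```

A check like this catches only the most careless cases and can only flag text for human review; fluent but fabricated text, which the editors describe as much harder to spot, contains no such markers.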
While some AI-generated content is easy to spot, more sophisticated attempts pose significant challenges. One notable example was a 2,000-word article about "Amberlisihar," a non-existent Ottoman fortress supposedly built in the 1400s. The article was detailed and peppered with enough factual information to lend it credibility, making it difficult for non-experts to identify as false [4].
The influx of AI-generated content has significantly increased the workload for Wikipedia's volunteer editors. In addition to their usual tasks of removing bad human edits, they now must dedicate time to identifying and removing AI-generated text [2]. This challenge is compounded by the fact that AI-generated content is often improperly sourced and can be produced in large quantities at minimal cost [3].
While WikiProject AI Cleanup aims to remove low-quality AI-generated content, the group does not seek to ban responsible AI use outright. Their Wikipedia forum states, "The purpose of this project is not to restrict or ban the use of AI in articles, but to verify that its output is acceptable and constructive, and to fix or remove it otherwise" [3].
The challenges faced by Wikipedia reflect a larger issue affecting the internet as a whole. As AI-generated content becomes more prevalent, maintaining the integrity and reliability of online information sources becomes increasingly difficult. This situation highlights the ongoing need for human oversight and critical evaluation of digital content [4].
As Wikipedia continues to battle against the tide of AI-generated misinformation, the efforts of projects like WikiProject AI Cleanup underscore the importance of human expertise and diligence in preserving the quality and accuracy of crowdsourced knowledge in the age of artificial intelligence.
The Outpost is a comprehensive collection of curated artificial intelligence software tools that cater to the needs of small business owners, bloggers, artists, musicians, entrepreneurs, marketers, writers, and researchers.
© 2024 TheOutpost.AI All rights reserved