Curated by THEOUTPOST
On Thu, 12 Sept, 12:07 AM UTC
4 Sources
[1]
Fake GPT-written studies are flooding Google Scholar. Here's why taking them down could make things worse.
"They are often created with widely available, general-purpose AI applications, most likely ChatGPT, and mimic scientific writing," the study said. ChatGPT is a chatbot developed by OpenAI that launched in 2022. It quickly went viral as users began drafting everything from workout routines to diet plans, and other companies, including Meta and Google, now have their own competing large language models.

Researchers gathered data by analyzing a sample of scientific papers pulled from Google Scholar that showed signs of GPT use: specifically, papers that included phrases considered common responses from ChatGPT or similar programs, such as "I don't have access to real-time data" and "as of my last knowledge update." From that sample, researchers identified 139 "questionable" papers listed as regular results on Google Scholar.

"Most of these GPT-fabricated papers were found in non-indexed journals and working papers, but some cases included research published in mainstream scientific journals and conference proceedings," the study said. Many of the papers involved controversial topics like health, computing, and the environment, which are "susceptible to disinformation," according to the study.

While the researchers acknowledged that the papers could be removed, they warned that doing so could fuel conspiracy theories. "As the rise of the so-called anti-vaxx movement during the COVID-19 pandemic and the ongoing obstruction and denial of climate change show, retracting erroneous publications often fuels conspiracies and increases the following of these movements rather than stopping them," the study said.

Representatives for Google and OpenAI did not respond to Business Insider's request for comment.

The study also identified two main risks from the "increasingly common" decision to use GPT to create "fake, scientific papers."
"First, the abundance of fabricated 'studies' seeping into all areas of the research infrastructure threatens to overwhelm the scholarly communication system and jeopardize the integrity of the scientific record," the study said. The second risk involves the "increased possibility that convincingly scientific-looking content was in fact deceitfully created with AI tools and is also optimized to be retrieved by publicly available academic search engines, particularly Google Scholar." "However small, this possibility and awareness of it risks undermining the basis for trust in scientific knowledge and poses serious societal risks," the study said.
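The screening heuristic the researchers used, flagging papers that contain boilerplate chatbot phrases, can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: the two marker phrases are the ones quoted in the article, while the sample texts, corpus layout, and function name are invented for the example.

```python
# Boilerplate strings typical of LLM chatbot output; these two phrases
# are the markers the study searched for on Google Scholar.
MARKER_PHRASES = [
    "i don't have access to real-time data",
    "as of my last knowledge update",
]

def looks_gpt_fabricated(text: str) -> bool:
    """Return True if the text contains any known chatbot marker phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in MARKER_PHRASES)

# Toy corpus: paper IDs mapped to full text (illustrative only).
papers = {
    "paper_a": "As of my last knowledge update, global emissions rose sharply.",
    "paper_b": "We measured emissions at twelve field sites between 2019 and 2021.",
}

# Collect the IDs of papers that trip the phrase filter.
flagged = [pid for pid, body in papers.items() if looks_gpt_fabricated(body)]
```

Note that this kind of screening only catches careless fabrication: as the articles below point out, once chatbots stop emitting such boilerplate, the marker phrases disappear and the heuristic stops working.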
[2]
Harvard study finds AI-generated research papers on Google Scholar - why it matters
Be careful what you cite if you use the popular Google Scholar search engine. By this point, most chatbot users have accepted that artificial intelligence (AI) tools can hallucinate in almost any scenario. Despite the efforts of AI content detectors, fact-checkers, and increasingly sophisticated large language models (LLMs), no developer has solved the problem yet. Meanwhile, the stakes of misinformation are only getting higher: people are using generative AI (gen AI) tools like ChatGPT to create fake research.

A recent study published in the Harvard Kennedy School's Misinformation Review found 139 papers on Google Scholar, a search engine for scholarly literature, that appear to be AI-generated. The researchers found most of the "questionable" papers in non-indexed (unverified) journals, though 19 of them appeared in indexed journals and established publications, and another 19 appeared in university databases, apparently written by students.

Even more concerning is the content of the papers: 57% of the fake studies covered topics like health, computational tech, and the environment, areas the researchers note are relevant to and could influence policy development.

After analyzing the papers, the researchers identified them as likely AI-generated because they included "at least one of two common phrases returned by conversational agents that use large language models (LLM) like OpenAI's ChatGPT." The team then used Google Search to find where the papers could be accessed, locating multiple copies across databases, archives, repositories, and social media.
"The public release of ChatGPT in 2022, together with the way Google Scholar works, has increased the likelihood of lay people (e.g., media, politicians, patients, students) coming across questionable (or even entirely GPT-fabricated) papers and other problematic research findings," the study explains. The researchers noted that theirs is not the first list of academic papers suspected to be AI-generated, and that papers are "constantly being added" to such lists.

So what risks do these fake studies pose by sitting on the internet? While propaganda and slapdash or falsified studies aren't new, gen AI makes this content exponentially easier to create. "The abundance of fabricated 'studies' seeping into all areas of the research infrastructure threatens to overwhelm the scholarly communication system and jeopardize the integrity of the scientific record," the researchers explain in their findings. They also found it worrisome that someone could "deceitfully" create "convincingly scientific-looking content" using AI and optimize it to rank on popular search engines like Google Scholar.

Back in April, 404 Media found similar evidence of entirely AI-fabricated books and other material on Google Books and Google Scholar by searching for the phrase "As of my last knowledge update," which commonly appeared in ChatGPT responses because of its limited training cutoff. Now that the free version of ChatGPT can browse the web and access live information, markers like this may become less frequent or disappear altogether, making AI-generated texts harder to spot.

While the majority of the literature on Google Scholar is legitimate, the service "lacks the transparency and adherence to standards that usually characterize citation databases," the study explains.
The researchers note that, like Google Search, Scholar uses automated crawlers, meaning "the inclusion criteria are based on primarily technical standards, allowing any individual author -- with or without scientific affiliation -- to upload papers." Users also can't filter results by parameters like material type, publication status, or whether a paper has been peer-reviewed.

Google Scholar is easily accessible and very popular. According to SimilarWeb, the search engine had over 111 million visits last month, putting it just ahead of academic databases like ResearchGate.net. With so many users flocking to Scholar, likely on the strength of the brand trust built by the other Google products they use daily, the odds of someone citing a false study are only getting higher.

The most potent difference between AI chatbot hallucinations and entirely falsified studies is context. Users querying ChatGPT know to expect some untrue information, so they can take its responses with a grain of salt and double-check its claims. But if AI-generated text is presented as vetted academic research conducted by humans and platformed by a popular database, users have little reason, or means, to verify that what they're reading is real.
[3]
Study finds AI-generated research papers on Google Scholar - why it matters
[4]
Harvard study finds AI-generated papers on Google Scholar - why it matters
A study published in the Harvard Kennedy School's Misinformation Review reveals the presence of AI-generated research papers on Google Scholar, sparking debates about academic integrity and the future of scholarly publishing. The findings highlight how difficult it is becoming to distinguish human-authored from machine-generated content.
A recent study published in the Harvard Kennedy School's Misinformation Review has uncovered a concerning trend in academic publishing: the presence of AI-generated research papers on Google Scholar. This discovery has sent shockwaves through the academic community, raising questions about the integrity of scholarly work and the potential implications for future research.
To identify AI-generated content, the researchers screened Google Scholar for papers containing tell-tale chatbot phrases such as "as of my last knowledge update" and "I don't have access to real-time data." This screening surfaced 139 questionable papers, most of them in non-indexed journals and working papers, though some had made their way into mainstream journals and conference proceedings.

That such papers appear as regular Google Scholar results demonstrates how easily AI-generated content can infiltrate reputable academic search tools.
The presence of AI-generated papers in scholarly databases raises significant concerns about the reliability of academic research. As AI technology becomes more sophisticated, it becomes increasingly difficult to distinguish between human-authored and machine-generated content.
This development could potentially undermine the credibility of genuine research and make it challenging for scholars to separate fact from fiction. It also poses a threat to the peer-review process, which has long been the gold standard for ensuring the quality and validity of academic work.
The study highlights the challenges academic databases like Google Scholar face in filtering out AI-generated content. As AI models become more advanced, traditional methods of detecting plagiarism or fake research may prove insufficient.
Google Scholar, being an automated system, may struggle to implement effective measures to identify and exclude AI-generated papers. This situation calls for the development of more sophisticated detection tools and stricter vetting processes for academic publications.
The infiltration of AI-generated papers into scholarly databases could have far-reaching consequences for the academic community and beyond. Researchers relying on these databases for literature reviews may unknowingly cite AI-generated work, potentially compromising the integrity of their own research.
Moreover, policymakers and industry professionals who depend on academic research to inform decisions may be misled by AI-generated content that lacks empirical foundation or scientific rigor. This could lead to misguided policies or strategies based on artificial rather than genuine scientific insights.
As the academic community grapples with this new challenge, there is a growing consensus on the need for collaborative efforts to address the issue. Potential measures include more sophisticated detection tools, stricter vetting of submissions, and search filters for publication status and peer review.
The discovery of AI-generated papers on Google Scholar serves as a wake-up call for the academic world, highlighting the urgent need to adapt to the rapidly evolving landscape of artificial intelligence in scholarly publishing.
© 2025 TheOutpost.AI All rights reserved