Curated by THEOUTPOST
On Tue, 13 May, 4:01 PM UTC
[1]
AI paper mills are swamping science with garbage studies
Research flags rise in one-dimensional health research fueled by large language models

A report from a British university warns that scientific knowledge itself is under threat from a flood of low-quality AI-generated research papers. The research team from the University of Surrey notes an "explosion of formulaic research articles," including inappropriate study designs and false discoveries, based on data cribbed from the US National Health and Nutrition Examination Survey (NHANES) nationwide health database.

The study, published in PLOS Biology, a journal from the nonprofit open-access publisher PLOS, found that many post-2021 papers used "a superficial and oversimplified approach to analysis." These often focused on a single variable while ignoring more realistic, multi-factor explanations of links between health conditions and potential causes, and some cherry-picked narrow data subsets without justification.

"We've seen a surge in papers that look scientific but don't hold up under scrutiny - this is 'science fiction' using national health datasets to masquerade as science fact," states Matt Spick, a lecturer in health and biomedical data analytics at the University of Surrey and one of the authors of the report.

"The use of these easily accessible datasets via APIs, combined with large language models, is overwhelming some journals and peer reviewers, reducing their ability to assess more meaningful research - and ultimately weakening the quality of science overall," he added.

The report notes that AI-ready datasets such as NHANES can open up new opportunities for data-driven research, but also risk exploitation by what it calls "paper mills" - entities that churn out questionable scientific papers, often for paying clients seeking confirmation of an existing belief.

Surrey Uni's work involved a systematic literature search going back ten years to retrieve potentially formulaic papers covering NHANES data, then analyzing these for telltale statistical approaches and study designs. The team identified and retrieved 341 reports published across a number of different journals.

It found that over the last three years there has been a rapid rise in the number of publications analyzing single-factor associations between predictors (independent variables) and various health conditions using the NHANES dataset. An average of four papers per year were published between 2014 and 2021, increasing to 33 in 2022, 82 in 2023, and 190 in the first ten months of 2024.

Also noted is a change in the origins of the published research. From 2014 to 2020, just two out of 25 manuscripts had a primary author affiliation in China. Between 2021 and 2024, this rose to 292 out of 316 manuscripts.

The report says this jump in single-factor associative research brings a corresponding increase in the risk of misleading findings entering the wider body of scientific literature. For example, depression, cardiovascular disease, and cognitive function - all well known to be multifactorial health issues - were investigated using simplistic, single-factor approaches in some of the papers reviewed.

To combat this, the team sets out a number of suggestions, including that editors and reviewers at scientific journals should regard single-factor analysis of conditions known to be complex and multifactorial as a "red flag" for potentially problematic research.
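To see why reviewers might treat single-factor analysis of a multifactorial condition as a red flag, consider a minimal simulation - not the Surrey team's code, and using entirely synthetic data with made-up variable names. A predictor that merely correlates with a confounder can look strongly "significant" on its own, yet the association largely vanishes once the confounder is adjusted for:

```python
# Illustrative sketch only: synthetic data, hypothetical variables.
# Shows how a single-factor association can be an artifact of confounding.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Hypothetical setup: "age" drives both the exposure and the outcome.
age = rng.normal(50, 12, n)
exposure = 0.05 * age + rng.normal(0, 1, n)      # correlated with age, no direct effect
risk = 0.08 * (age - 50)                         # outcome depends on age alone
outcome = rng.binomial(1, 1 / (1 + np.exp(-risk)))

# Single-factor model: exposure appears "significant" purely via confounding.
single = sm.Logit(outcome, sm.add_constant(exposure)).fit(disp=0)

# Multi-factor model: adjusting for age shrinks the exposure effect toward zero.
X = sm.add_constant(np.column_stack([exposure, age]))
multi = sm.Logit(outcome, X).fit(disp=0)

print("single-factor exposure coef:", round(single.params[1], 3),
      "p =", round(single.pvalues[1], 4))
print("adjusted exposure coef:    ", round(multi.params[1], 3),
      "p =", round(multi.pvalues[1], 4))
```

Running the sketch, the unadjusted model reports a strongly significant exposure coefficient while the adjusted model does not - the pattern the report flags when complex conditions are reduced to one predictor.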
Providers of datasets should also take steps, including API keys and application numbers, to prevent data dredging - an approach already used by the UK Biobank, the report says. Publications referencing such data should be made to include an auditable account number as a condition of access. Another suggestion is that full dataset analysis should be made mandatory unless the use of data subsets can be justified.

"We're not trying to block access to data or stop people using AI in their research - we're asking for some common sense checks," said Tulsi Suchak, a postgraduate researcher at the University of Surrey and lead author of the study. "This includes things like being open about how data is used, making sure reviewers with the right expertise are involved, and flagging when a study only looks at one piece of the puzzle."

This isn't the first time the issue has come to light. Last year, US publishing house Wiley discontinued 19 scientific journals overseen by its Hindawi subsidiary that had been publishing reports churned out by AI paper mills.

It is also part of a wider problem of AI-generated content appearing online and in web searches that can be difficult to distinguish from reality. Dubbed "AI slop," this includes fake pictures and entire video sequences of celebrities and world leaders, as well as fake historical photographs and AI-generated portraits of historical figures appearing in search results as if they were genuine.
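The data dredging that the report's API keys and full-dataset mandates aim to deter is also easy to demonstrate. Below is a hedged, self-contained sketch - synthetic data, not the study's method - in which hundreds of unrelated predictors are screened against a single outcome. By chance alone, a handful of single-factor associations pass the conventional p < 0.05 threshold, and a multiple-testing correction eliminates them:

```python
# Illustrative sketch only: simulates "data dredging" on pure noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_predictors = 500, 200

# No predictor is truly related to the outcome.
predictors = rng.normal(size=(n_subjects, n_predictors))
outcome = rng.normal(size=n_subjects)

# Screen every predictor individually, collecting p-values.
pvals = np.array([stats.pearsonr(predictors[:, j], outcome)[1]
                  for j in range(n_predictors)])

# Roughly 5% of tests "succeed" by chance (about 10 of 200 here).
print("nominal 'discoveries' at p < 0.05:", int((pvals < 0.05).sum()))

# A Bonferroni-style correction removes essentially all of them.
print("surviving Bonferroni correction:", int((pvals < 0.05 / n_predictors).sum()))
```

Each chance "discovery" in such a screen could be written up as its own single-factor paper, which is why the report wants subset choices justified and dataset access auditable.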
[2]
AI tools may be weakening the quality of published research, study warns
Artificial intelligence could be affecting the scientific rigor of new research, according to a study from the University of Surrey. The research team has called for a range of measures to reduce the flood of "low-quality" and "science fiction" papers, including stronger peer review processes and the use of statistical reviewers for complex datasets.

In a study published in PLOS Biology, researchers reviewed papers published between 2014 and 2024 that proposed an association between a predictor and a health condition using an American government dataset called the National Health and Nutrition Examination Survey (NHANES). NHANES is a large, publicly available dataset used by researchers around the world to study links between health conditions, lifestyle and clinical outcomes.

The team found that between 2014 and 2021, just four NHANES association-based studies were published each year on average -- but this rose to 33 in 2022, 82 in 2023, and 190 in 2024.

Dr. Matt Spick, co-author of the study from the University of Surrey, said, "While AI has the clear potential to help the scientific community make breakthroughs that benefit society, our study has found that it is also part of a perfect storm that could be damaging the foundations of scientific rigor.

"We've seen a surge in papers that look scientific but don't hold up under scrutiny -- this is 'science fiction' using national health datasets to masquerade as science fact. The use of these easily accessible datasets via APIs, combined with large language models, is overwhelming some journals and peer reviewers, reducing their ability to assess more meaningful research -- and ultimately weakening the quality of science overall."

The study found that many post-2021 papers used a superficial and oversimplified approach to analysis -- often focusing on single variables while ignoring more realistic, multi-factor explanations of the links between health conditions and potential causes. Some papers cherry-picked narrow data subsets without justification, raising concerns about poor research practice, including data dredging or changing research questions after seeing the results.

Tulsi Suchak, postgraduate researcher at the University of Surrey and lead author of the study, added, "We're not trying to block access to data or stop people using AI in their research -- we're asking for some common-sense checks. This includes things like being open about how data is used, making sure reviewers with the right expertise are involved, and flagging when a study only looks at one piece of the puzzle.

"These changes don't need to be complex, but they could help journals spot low-quality work earlier and protect the integrity of scientific publishing."

To help tackle the issue, the team has laid out a number of practical steps for journals, researchers and data providers. They recommend that researchers use the full datasets available to them unless there's a clear and well-explained reason to do otherwise, and that they are transparent about which parts of the data were used, over what time periods, and for which groups. For journals, the authors suggest strengthening peer review by involving reviewers with statistical expertise and making greater use of early desk rejection to reduce the number of formulaic or low-value papers entering the system. Finally, they propose that data providers assign unique application numbers or IDs to track how open datasets are used -- a system already in place for some UK health data platforms.

Anietie E. Aliu, co-author of the study and postgraduate student at the University of Surrey, said, "We believe that in the AI era, scientific publishing needs better guardrails. Our suggestions are simple things that could help stop weak or misleading studies from slipping through, without blocking the benefits of AI and open data.

"These tools are here to stay, so we need to act now to protect trust in research."
A University of Surrey study reveals a surge in low-quality, AI-generated research papers, particularly in health sciences, potentially compromising scientific rigor and knowledge integrity.
A study from the University of Surrey, published in PLOS Biology, has raised serious concerns about the impact of artificial intelligence on scientific research quality. It highlights a significant increase in low-quality, AI-generated research papers that could undermine the foundations of scientific knowledge [1][2].
The study identified an "explosion of formulaic research articles" post-2021, particularly in health sciences. These papers often feature inappropriate study designs and false discoveries, primarily based on data from the US National Health and Nutrition Examination Survey (NHANES) [1]. The research team observed a dramatic rise in publications analyzing single-factor associations between predictors and health conditions using the NHANES dataset: an average of four papers per year between 2014 and 2021, rising to 33 in 2022, 82 in 2023, and 190 in the first ten months of 2024 [1].
The study revealed several red flags in these AI-generated papers: superficial, single-variable analyses that ignore multi-factor explanations; narrow data subsets cherry-picked without justification; and signs of poor research practice such as data dredging or changing research questions after seeing the results [1][2].
Dr. Matt Spick, co-author of the study, warned that this trend is creating "science fiction" masquerading as scientific fact [2].
The report also noted a significant change in the geographical origin of published research. From 2014 to 2020, only two out of 25 manuscripts had a primary author affiliation in China. However, between 2021 and 2024, this number skyrocketed to 292 out of 316 manuscripts [1].
This surge in single-factor associative research increases the risk of introducing misleading findings into the broader scientific literature. The study cites examples of complex health issues like depression, cardiovascular disease, and cognitive function being investigated using simplistic, single-factor approaches [1].
To combat this issue, the research team has proposed several measures: treating single-factor analyses of known multifactorial conditions as a red flag in peer review; requiring full-dataset analysis unless the use of subsets is justified; involving reviewers with statistical expertise and making greater use of early desk rejection; and having data providers issue API keys or unique application numbers so that dataset use can be audited [1][2].
This problem extends beyond scientific research, contributing to the broader issue of AI-generated content online, dubbed "AI slop." This includes fake images, video sequences, and historical photographs that can be difficult to distinguish from reality [1].
As AI tools become increasingly prevalent in research, the scientific community faces a critical challenge in maintaining the integrity and quality of published work. The University of Surrey study serves as a wake-up call, urging immediate action to implement safeguards and preserve trust in scientific research in the AI era [2].