2 Sources
[1]
Academics in Meltdown Now That They're Responsible for AI Hallucinations in Their Research Papers
Even in 2026, there are still plenty of researchers who refuse to use AI to publish their research papers. Others do use the tech for tasks like sourcing journal articles for references, editing copy, or formatting citations -- but they face pressure to verify every claim, since AI has a baked-in risk of contaminating their work with hallucinations. A vocal minority of academics, however, argue they should be able to use AI to write original research while remaining immune from any hallucinated claims or data that make their way into the final product. Last week, the open-source research repository arXiv announced that it was banning scholarly authors from the platform for up to a year if "hallucinated references" are found in their work. The rationale behind this should be obvious enough for any self-respecting academic: as arXiv computer science chair Thomas Dietterich wrote in his announcement, "if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper." As TechCrunch observed, arXiv isn't banning AI altogether, but simply clarifying that the author is ultimately responsible for any work that goes out under their name. Makes sense, right? Apparently not.
After Dietterich's announcement on X-formerly-Twitter, numerous researchers immediately went on the offensive, trashing the platform for its decision. "So this means you expect every author to check every citation and make sure that every citation is real and accurate?" Smith College economics professor James Miller replied in shock. "What if it's beyond the ability of one of the authors to verify one of the citations because that citation is in a language he doesn't know or concerns technical material he doesn't understand but another author on the paper does?" "This is way too strict. Errors can slip in when using any tools. We aren't perfect," said Luca Ambrogioni, assistant professor in AI at the Donders Institute for brain, cognition and behaviour. "Having a prompt left in is a mistake, it's sloppy but giving permanent answer a one time sloppiness is absurd." Ambrogioni, who appears to argue that getting reprimanded via arXiv's policy on hallucinated citations will amount to a de facto "lifetime ban" from publishing, continued: "we are not taking just about false citations (more serious), but also more harmless copy pasting editing mistakes. Papers are long, the likelihood of an incorrect copy past in the supplementary isn't zero even in a otherwise good quality work." Neal Amin, a former neuroscientist and Stanford medical clinician turned biotech startup founder, wrote on X that "this is what overreaction looks like and how gatekeeping starts."
[2]
AI hallucinations are slipping past experts into papers and books to enter the permanent record | Fortune
Topaz, an associate professor at Columbia University's School of Nursing, had grown accustomed to having artificial intelligence tools help polish scientific papers for grammar, formatting, and other details. But a few weeks after submitting his latest research, the academic journal he was due to publish in came back with questions about a reference. The AI tool Topaz had used had silently inserted a fabricated source into his work. "I felt deeply embarrassed," Topaz, who leads a team at Columbia developing AI applications in healthcare, told Fortune. "I'm an AI researcher. I know about hallucinations," he said. "If this is happening to me, an AI expert, what happens to other people?" That near-miss sent Topaz on an investigation to find out how often experts were getting subtly fooled by AI. The answer, it turns out, is a lot. In a study published earlier this month in The Lancet, Topaz and his colleagues audited nearly 2.5 million biomedical papers and 97 million citations indexed on PubMed Central, the central repository used by clinicians and researchers worldwide. They found more than 4,000 fabricated references buried across nearly 3,000 papers. Not all the references were AI-generated, though Topaz said the steady rise in fake sourcing went "vertical" in 2024, shortly after AI tools in research entered more widespread use. "It's very reasonable that AI is highly associated with them now," he said. Over the past three years, the rate of fabricated references in biomedical literature has grown more than 12-fold. In 2023, one in 2,828 papers contained at least one fake reference, a rate that had risen to one in 458 by last year. Over the first seven weeks of 2026, the researchers found, one in 277 papers had at least one non-existent reference. "I'm thinking this is just the tip of the iceberg," Topaz said.
Hallucinations happen when an AI model prioritizes word patterns over accuracy. They are often harmless, but the stakes are different when AI errors begin infiltrating academic literature, as hallucinations risk undermining the scientific process. Medicine is a field that builds on itself. Clinical trials cite earlier studies; systematic reviews then aggregate those trials, and medical guidelines finally cite those reviews. Doctors and nurses rely on those guidelines when they decide how to treat patients. A fabricated study planted at the start of that process doesn't stay there. "This is the evidence chain, that's how we care for and treat people. If you put the fictional study at the bottom of the stack, the whole structure inherits it," Topaz said. "We've already seen paper mill articles included in systematic reviews informing clinical guidelines," he added. "When a guideline paper cites a paper with a partially fictional references list, the evidence-based chain for treatment decisions is compromised." That AI is vulnerable to hallucinations has been known since ChatGPT first entered the scene four years ago, when students began to bravely submit specious AI-generated papers under their own name. But with a litany of tools, agents, and extensions now ubiquitous in nearly every profession, even experts in their field are getting tripped up by AI. Take the case of Steven Rosenbaum. The author and filmmaker was in the headlines for all the wrong reasons this week after the New York Times identified a slew of inaccurate quotes throughout his new book, titled The Future of Truth: How AI Reshapes Reality. The book carried blurbs from prominent journalists, including Nicholas Thompson, The Atlantic's chief executive, and a foreword by Maria Ressa, the Nobel Peace Prize-winning reporter from the Philippines. It arrived, according to the Times, "to great fanfare." 
Rosenbaum's book contained more than a half-dozen misattributed or entirely invented quotes, apparently generated by AI tools he had disclosed using in his acknowledgments. In a statement to the Times, Rosenbaum recognized the errors, calling the episode "a warning about the risks of AI-assisted research and verification." Instances like these might be inevitable given how widely AI is being used in expert-level knowledge work. Several journalism outlets, Fortune included, are now piloting the use of AI tools in reporting. Surveys suggest more than half of legal professionals are using AI tools to draft briefs and memos. A recent report by the American Medical Association found over 80% of physicians now use AI professionally to summarize research and prepare clinical documentation, a share that has more than doubled since 2023. Even Nobel laureates, such as Literature Prize winner Olga Tokarczuk, admit to using AI in their work. As for research, one study last year by an American medical journal identified 36% of its papers contained at least some AI-generated text, although only 9% of researchers disclosed this when prompted prior to submitting their manuscripts. Another recent study found more than half of researchers are likely to be using AI tools while peer-reviewing other people's work. But as it turns out, experts in their field are no less likely to get duped. Topaz's study of hallucinations in biomedical research joins a growing pile of anecdotes and datasets documenting embarrassing errors, including legal analyst Damien Charlotin's catalog of 1,459 legal decisions citing AI-generated inaccurate content. Before he started the project a year ago, AI hallucinations in legal cases appeared two or three times a month. Now, there's around five a day. Fake AI-generated research papers are already a problem in academia, increasingly difficult to parse through and threatening to overwhelm the peer-review system. 
But hallucinated references in real studies produced by humans could be just as widespread, and potentially even harder to track down. The vast majority of papers tracked by Topaz contained only one or two fabricated citations, out of the several dozen references academic studies usually need to publish, suggesting most cases of AI hallucinations in research are unintentional. But the publishing industry might not be prepared to handle the surging number of fake references, Topaz said. Verification methods differ between journals, and while some use software to check references and scan for AI-generated content, enforcement varies wildly. There is also no easy mechanism to retroactively screen the evidence chain to find original fake studies or references. So far, few journals have been able to identify hallucinations, as Topaz's analysis found 98.4% of studies with fake references had not been retracted by publishers at the time of his audit. It's part of what people in the field have referred to as science's "reproducibility crisis," compounded in the age of AI by a rising flood of useless or unreliable AI-generated content that now permeates academic literature. But it's a similar story in other fields that rely on output that can be reproduced. Stories in newspapers drive conversations and form the bedrock of future investigations. Legal decisions are eventually cited by lawyers and scholars in other cases. Topaz said AI itself is not necessarily the villain, and he gladly uses it in his own work. "The problem is unverified AI output entering the permanent record," he said. "The fix is not to stop using the tools, it's to build verification into the workflow." "The longer we wait to put verifications in place, the harder it becomes to clean up," he added. AI hallucinations don't care how well-versed in a subject users are. The mistakes are designed to look real, and they're getting better at hiding. 
The more consequential the field -- be it medicine, law, or journalism -- the more dangerous errors become when they aren't caught.
The open-source research repository arXiv announced bans for authors who publish AI-hallucinated references, sparking fierce debate among academics. A new study reveals fabricated references in biomedical papers have surged 12-fold in three years, with one in 277 papers containing fake citations by early 2026. The controversy highlights growing tensions over AI tools in academic settings and who bears responsibility when AI-generated inaccuracies slip into the permanent record.
The open-source research repository arXiv has ignited a firestorm in academia by announcing it will ban authors for up to a year if AI-hallucinated references appear in their work [1]. Computer science chair Thomas Dietterich explained the rationale plainly: "if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper" [1]. The policy does not ban AI tools in academic settings altogether; it holds researchers accountable for verifying AI-generated claims before publication.
The reaction from some academics has been swift and hostile. Smith College economics professor James Miller questioned whether authors should be expected to verify every citation, particularly when references appear in unfamiliar languages or technical domains [1]. Luca Ambrogioni, assistant professor in AI at the Donders Institute, called the policy "way too strict," arguing that "errors can slip in when using any tools" and that permanently punishing one-time sloppiness is "absurd" [1]. Neal Amin, a former neuroscientist and Stanford medical clinician turned biotech startup founder, characterized the move as "what overreaction looks like and how gatekeeping starts" [1].

The controversy arrives as new research exposes the scale of AI hallucinations infiltrating scientific literature. Topaz, an associate professor at Columbia University's School of Nursing and an AI researcher, experienced this firsthand when an AI tool silently inserted a fabricated source into one of his research papers [2]. "I felt deeply embarrassed," Topaz told Fortune. "I'm an AI researcher. I know about hallucinations. If this is happening to me, an AI expert, what happens to other people?" [2]

His investigation, published in The Lancet, audited nearly 2.5 million biomedical papers and 97 million citations indexed on PubMed Central. It found more than 4,000 fabricated references buried across nearly 3,000 papers [2]. While not all were AI-generated, the rate of fake sourcing went "vertical" in 2024, shortly after AI tools entered more widespread use in research [2]. In 2023, one in 2,828 papers contained at least one fake reference. By 2025, that ratio had climbed to one in 458. Over the first seven weeks of 2026, one in 277 papers had at least one non-existent reference [2].

The stakes extend beyond academic integrity into patient care. Medicine builds on cumulative evidence: clinical trials cite earlier studies, systematic reviews aggregate those trials, and medical guidelines cite those reviews. Doctors and nurses rely on these guidelines when deciding how to treat patients. "This is the evidence chain, that's how we care for and treat people. If you put the fictional study at the bottom of the stack, the whole structure inherits it," Topaz explained [2]. Once AI-generated inaccuracies enter the permanent record, a compromised evidence chain can directly influence clinical decisions.
The problem extends beyond medicine. Author Steven Rosenbaum's book "The Future of Truth: How AI Reshapes Reality" contained more than a half-dozen misattributed or entirely invented quotes, apparently generated by AI tools he had disclosed using in his acknowledgments [2]. Rosenbaum called the episode "a warning about the risks of AI-assisted research and verification" [2].
The central question driving this academic meltdown is who bears responsibility for AI hallucinations when experts themselves are getting fooled. A recent American Medical Association report found that over 80% of physicians now use AI professionally to summarize research and prepare clinical documentation, a share that has more than doubled since 2023 [2]. One study by an American medical journal found that 36% of its papers contained at least some AI-generated text, though only 9% of researchers disclosed this when prompted before submission [2].

While many researchers use AI tools responsibly for tasks like formatting citations, editing copy, or sourcing journal articles, they face mounting pressure to verify every claim [1]. A vocal minority argues they should not be held responsible for hallucinated claims that slip through, but arXiv's stance is clear: authors remain ultimately responsible for any work published under their name [1]. As AI tools become ubiquitous across journalism, legal work, and research, the challenge of maintaining quality control while capturing efficiency gains will only intensify. Topaz suspects current findings represent "just the tip of the iceberg" [2].

Summarized by Navi