4 Sources
[1]
Irony alert: Hallucinated citations found in papers from NeurIPS, the prestigious AI conference
AI detection startup GPTZero scanned all 4,841 papers accepted by the prestigious Conference on Neural Information Processing Systems (NeurIPS), which took place last month in San Diego. The company found 100 hallucinated citations across 51 papers that it confirmed as fake, the company tells TechCrunch.

Having a paper accepted by NeurIPS is a resume-worthy achievement in the world of AI. Given that these are the leading minds of AI research, one might assume they would use LLMs for the catastrophically boring task of writing citations. So caveats abound with this finding: 100 confirmed hallucinated citations across 51 papers is not statistically significant. Each paper has dozens of citations, so out of tens of thousands of citations, this is, statistically, zero.

It's also important to note that an inaccurate citation doesn't negate the paper's research. As NeurIPS told Fortune, which was first to report on GPTZero's research, "Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves [is] not necessarily invalidated."

But having said all that, a faked citation is not nothing, either. NeurIPS prides itself on its "rigorous scholarly publishing in machine learning and artificial intelligence," it says. And each paper is peer-reviewed by multiple people who are instructed to flag hallucinations.

Citations are also a sort of currency for researchers. They are used as a career metric to show how influential a researcher's work is among their peers. When AI makes them up, it waters down their value.

No one can fault the peer reviewers for not catching a few AI-fabricated citations given the sheer volume involved. GPTZero is also quick to point this out. The goal of the exercise was to offer specific data on how AI slop sneaks in via "a submission tsunami" that has "strained these conferences' review pipelines to the breaking point," the startup says in its report. GPTZero even points to a May 2025 paper called "The AI Conference Peer Review Crisis" that discussed the problem at premier conferences including NeurIPS.

Still, why couldn't the researchers themselves fact-check the LLM's work for accuracy? Surely, they must know the actual list of papers they used for their work.

What the whole thing really points to is one big, ironic takeaway: If the world's leading AI experts, with their reputations at stake, can't ensure their LLM usage is accurate in the details, what does that mean for the rest of us?
[2]
AI conference's papers contaminated by AI hallucinations
100 vibe citations spotted in 51 NeurIPS papers show vetting efforts have room for improvement

GPTZero, a detector of AI output, has found yet again that scientists are undermining their credibility by relying on unreliable AI assistance. The New York-based biz has identified 100 hallucinations in more than 51 papers accepted by the Conference on Neural Information Processing Systems (NeurIPS). This finding follows the company's prior discovery of 50 hallucinated citations in papers under review by the International Conference on Learning Representations (ICLR).

GPTZero's senior machine-learning engineer Nazar Shmatko, head of machine learning Alex Adam, and academic writing editor Paul Esau argue in a blog post that the availability of generative AI tools has fueled "a tsunami of AI slop."

"Between 2020 and 2025, submissions to NeurIPS increased more than 220 percent from 9,467 to 21,575," they observe. "In response, organizers have had to recruit ever greater numbers of reviewers, resulting in issues of oversight, expertise alignment, negligence, and even fraud."

These hallucinations consist largely of authors and sources invented by generative AI models, along with passages of apparently AI-authored text.

The legal community has been dealing with similar issues. More than 800 errant legal citations attributed to AI models have been flagged in various court filings, often with consequences for the attorneys, judges, or plaintiffs involved. Academics may not face the same misconduct sanctions as legal professionals, but the careless application of AI can have consequences beyond squandered integrity.

The AI paper submission surge has coincided with an increase in the number of substantive errors in academic papers - mistakes like incorrect formulas, miscalculations, errant figures, and so on, as opposed to citations of non-existent source material. A pre-print paper published in December 2025 by researchers from Together AI, NEC Labs America, Rutgers University, and Stanford University looked specifically at AI papers from three major machine learning venues: ICLR (2018-2025), NeurIPS (2021-2025), and TMLR (Transactions on Machine Learning Research) (2022-2025).

The authors found "published papers contain a non-negligible number of objective mistakes and that the average number of mistakes per paper has increased over time - from 3.8 in NeurIPS 2021 to 5.9 in NeurIPS 2025 (55.3 percent increase); from 4.1 in ICLR 2018 to 5.2 in ICLR 2025; and from 5.0 in TMLR 2022/23 to 5.5 in TMLR 2025."

Correlation is not causation, but when the error rate in NeurIPS papers has increased 55.3 percent following the introduction of OpenAI's ChatGPT, the rapid adoption of generative AI tools cannot be ignored. The risk of unchecked AI usage for scientists is not just reputational. It may invalidate their work.

A spokesperson for NeurIPS didn't immediately respond to our request for comment. We'll update this story if we hear back post-publication.

GPTZero contends that its Hallucination Check software should be part of a publisher's arsenal of AI-detection tools. That may help when attempting to determine whether a citation refers to actual research, but there are countermeasures that claim to be able to make AI authorship more difficult to detect. For example, a Claude Code skill called Humanizer says it "removes signs of AI-generated writing from text, making it sound more natural and human." And there are many other anti-forensic options.
A recent report from the International Association of Scientific, Technical & Medical Publishers (STM) attempts to address the integrity challenges the scholarly community faces. The report says that the amount of academic communication reached 5.7 million articles in 2024, up from 3.9 million five years earlier. And it argues that publishing practices and policies need to adapt to the reality of AI-assisted and AI-fabricated research.

"Academic publishers are definitely aware of the problem and are taking steps to protect themselves," said Adam Marcus, co-founder of Retraction Watch, which has documented many AI-related retractions, and managing editor of Gastroenterology & Endoscopy News, in an email to The Register. "Whether those will succeed remains to be seen.

"We're in an AI arms race and it's not clear the defenders can withstand the siege. However, it's also important to recognize that publishers have made themselves vulnerable to these assaults by adopting a business model that has prioritized volume over quality. They are far from innocent victims." ®
[3]
How AI-generated references are polluting scientific papers
Artificial intelligence tools are now appearing inside top-tier research papers - and in some cases, they are introducing references to studies that do not exist. The problem has surfaced in accepted conference papers, raising concerns about how easily reference errors can slip into peer-reviewed work. A recent scan of 4,841 accepted papers identified 100 fabricated citations across 51 submissions. The review came from GPTZero, which examined reference lists after finding that citation mistakes often survive peer review.

The results matter because they involve conferences such as NeurIPS, one of the most selective venues in artificial intelligence research. While invented citations can trigger rejection or revocation, the findings highlight a broader risk: as AI writing tools spread, even small reference failures can make verification harder and can weaken trust in scientific publishing.

The damage of false AI references

A fake citation is more than a typo, because it breaks the trail that lets readers track evidence. Some authors used a large language model (LLM) - a text-prediction system trained on vast collections of text - which can invent sources. NeurIPS pointed out that even if 1.1 percent of papers contain one or more incorrect references due to the use of large language models, the papers' core findings are not necessarily invalidated. That stance protects valid results, yet it still leaves readers with extra work when they need to verify claims.

Why fake references appear

Prediction-driven writing rewards plausibility, so an LLM can sound confident while guessing details it never truly checked. Because the model fills patterns from training text, it may blend real authors with incorrect journals and dates. Standard citation styles add believable structure, making these mistakes harder to spot during a fast final edit. Simple database searches can stop this, but only if someone runs them before trusting the generated reference list.

Citations as career currency

In research hiring, citation metrics - measures of how often a paper is cited - often sit alongside letters of recommendation and awards. Those numbers matter because they signal attention, which can translate into funding, jobs, and invitations to collaborate. The San Francisco Declaration on Research Assessment (DORA) urges institutions to judge the work itself, not journal-based scores used as shortcuts. Made-up citations blur those signals, and they can reward sloppy behavior by padding influence that was never earned.

Too many papers to check

Official statistics put the NeurIPS main track at 21,575 submissions and 5,290 acceptances, a 24.52 percent rate. That volume forced organizers to rely on a huge volunteer network, where reviewers juggle research, teaching, and deadlines. Program chairs wrote that limited time kept them from manually revisiting every outlier decision flagged by scores. When attention runs thin, reference lists become easy to skim, so small errors can slide into the final record.

How references get verified

Citation checkers start by splitting each reference into parts, then they standardize spelling and punctuation across entries. Next, they query bibliographic databases, online indexes that store paper titles and authors, and flag entries with no matches. Most systems also score near-matches, since small typos can hide in initials, page numbers, or conference names. A flagged entry still needs judgment, because older books and early drafts posted online sometimes sit outside major databases.
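To make that pipeline concrete, here is a minimal sketch of the kind of check described above: it normalizes a cited title, looks it up in Crossref's public bibliographic index, and scores how closely the best hit matches. The choice of Crossref, the function names, and the thresholds are illustrative assumptions, not a description of any particular vendor's tool.

```python
# Minimal sketch of a citation check: normalize a title, query a public
# bibliographic index (Crossref, chosen here for illustration), and score
# the closest hit. Thresholds are assumptions for clarity, not a standard.
import re
from difflib import SequenceMatcher

import requests


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", text.lower())).strip()


def check_citation(title: str) -> str:
    """Return 'verified', 'near-match', or 'not found' for a cited title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 3},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]

    best = 0.0
    for item in items:
        candidate = (item.get("title") or [""])[0]
        best = max(best, SequenceMatcher(None, normalize(title),
                                         normalize(candidate)).ratio())

    if best >= 0.95:
        return "verified"    # near-exact title match in the index
    if best >= 0.75:
        return "near-match"  # possible typo or retitled preprint: needs human judgment
    return "not found"       # no close match: candidate hallucination


if __name__ == "__main__":
    print(check_citation("Attention Is All You Need"))
```

As the article notes, a "not found" or "near-match" result is a prompt for human review rather than proof of fabrication, since legitimate sources can sit outside any single index.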
Rethinking peer review incentives

Calls for reform are growing, since conferences depend on goodwill while the number of submissions keeps climbing each year. A position paper proposed letting authors rate review quality and giving reviewers formal credit for effort. Those feedback loops could discourage rushed, template-like reviews, because poor work would become visible to the same community that submits papers. Even with better incentives, automated citation checks may still be needed so reviewers can spend their time on results.

Preventing citation mistakes

Careful authorship treats citations as evidence, so the reference list deserves the same attention as figures and tables. Reference managers can pull details from databases, which reduces hand typing and keeps titles, years, and author order consistent. When AI systems help draft text, verifying each referenced title in a search engine can catch fabricated sources before submission. That habit adds minutes, yet it spares readers from chasing dead ends when they try to follow the supporting literature.

The ripple effect of mistakes

Evidence from earlier tests shows that chatbots can produce polished reference lists even when the sources do not exist. One peer-reviewed study found that 55 percent of AI-generated references from an earlier ChatGPT model were fabricated, while a newer version reduced that rate to 18 percent. Many of the remaining errors blended real and fake details, making quick human checks less reliable.

Those small reference failures can easily spread beyond a single paper. Under deadline pressure, conference submissions and other scholarly work may inherit unverified citations pasted into otherwise careful prose. Those errors can then ripple from review panels to everyday readers. Clearer policies, better tools, and routine citation checks can help protect trust in scholarship as AI writing tools become more common.
[4]
NeurIPS papers contained 100+ AI-hallucinated citations, new report claims | Fortune
NeurIPS, one of the world's most prestigious AI research conferences, held its 39th annual meeting in San Diego in December, drawing tens of thousands of submissions and participants. What was once a largely academic gathering has become a prime hunting ground for top AI labs, where a strong showing can translate directly into job offers. Researchers whose papers are accepted for live presentation are considered among the field's elite.

Yet Canadian startup GPTZero analyzed more than 4,000 research papers accepted and presented at NeurIPS (Neural Information Processing Systems) 2025 and says it uncovered more than 100 AI-hallucinated citations that slipped past the three or more reviewers assigned to each submission, spanning at least 53 papers in total. The hallucinations had not previously been reported.

From fully made-up citations to subtle changes

In some cases, an AI model blended or paraphrased elements from multiple real papers, including believable-sounding titles and author lists, the company says. Others appeared to be fully made up: a nonexistent author, a fabricated paper title, a fake journal or conference, or a URL that leads nowhere. In other cases, the model started from a real paper but made subtle changes -- expanding an author's initials into a guessed first name, dropping or adding coauthors, or paraphrasing the title. Some, however, are plainly wrong -- citing "John Smith" and "Jane Doe" as authors, for example.

When reached for comment, the NeurIPS board shared the following statement: "The usage of LLMs in papers at AI conferences is rapidly evolving, and NeurIPS is actively monitoring developments. In previous years, we piloted policies regarding the use of LLMs, and in 2025, reviewers were instructed to flag hallucinations. Regarding the findings of this specific work, we emphasize that significantly more effort is required to determine the implications. Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference). As always, NeurIPS is committed to evolving the review and authorship process to best ensure scientific rigor and to identify ways that LLMs can be used to enhance author and reviewer capabilities."

Edward Tian, cofounder and CEO of GPTZero, which was founded in January 2023 and raised a $10 million Series A round in 2024, told Fortune the NeurIPS analysis came just weeks after the company uncovered 50 hallucinated citations in papers under review for another top AI research conference, ICLR, which will be held in Rio de Janeiro in April. In that case, the papers had not yet been accepted -- but the bogus citations had already slipped past peer reviewers. Tian said the ICLR conference has hired the company to check future submissions for fabricated citations during peer review.

Errors appeared in papers accepted and presented at NeurIPS

According to Tian, the NeurIPS findings are even more troubling because the errors appear in papers that were accepted by the conference. In the academic world of AI, "publish or perish" is more than a cliché: hiring and tenure often hinge on accumulating peer-reviewed publications. Yet under long-standing academic norms, even a single fabricated citation would, in principle, be grounds for rejection.
References are meant to anchor a paper in the existing body of research -- and to demonstrate that its authors have actually read and engaged with the work they cite.

"It's definitely a bigger escalation in the sense that these were the first documented cases of hallucinated citations entering the official record of the top machine learning conference," Tian said, pointing out that since NeurIPS 2025 had an acceptance rate for main track papers of 24.52%, each of these papers beat out 15,000 other papers despite containing one or more hallucinations. "These survived peer review, and were published in the final conference proceeding," he said. "So it's definitely a big moment."

Around half of the papers with hallucinated citations were papers that were likely to be AI-generated themselves or had a high amount of AI use, he added. "But what we were really focused on in this investigation is the citations themselves," he said.

AI detection tools have often been criticized for false positives in attempting to identify machine-written text. But Tian argued that hallucination detection is a different class of problem, with GPTZero's tool checking verifiable facts -- searching the open web and academic databases to confirm whether a cited paper actually exists. The company says the tool is more than 99% accurate, and for the NeurIPS analysis, every flagged citation was also reviewed by a human expert on GPTZero's machine-learning team.

Alex Cui, Tian's cofounder and chief technology officer, said that GPTZero's hallucination checker tool ingests a paper and then searches across the open web and academic databases to verify each citation -- its authors, title, publication venue, and link. If a reference can't be found, or if it only partially matches a real paper, the system flags it. That's how it catches cases where an AI model starts from a real paper but adds authors who don't exist, alters the title, or invents a publication. "Sometimes, even when there is a match, you'll find that like they added like five authors who don't exist to a real paper, so these are mistakes that no human would reasonably make," he explained. For the NeurIPS investigation, after the automated scan, a member of GPTZero's machine-learning team manually verified every flagged citation, ensuring the findings aren't themselves false positives.

Sheer volume of papers makes deep scrutiny difficult

A big part of the challenge is sheer scale. In 2025, the main NeurIPS research track received 21,575 valid submissions -- up from 15,671 in 2024 and 12,343 in 2023. Even with thousands of volunteer reviewers, that volume makes deep scrutiny of every paper and its references increasingly difficult.

But while AI has a part in that by making it dramatically easier to churn out conference submissions, Tian said, a flawed paper still carries real reputational risk -- for the authors, for the conference that accepted it, and for the companies that hire researchers based on those credentials. That's particularly true for citations, he said, because in modern AI research, citations are a core part of how the field tries to solve issues of reproducibility. "AI results are notoriously hard to reproduce, so citations are important," he said, to "draw the line between whether that result was reproducible or not," by letting other researchers trace a result back to something concrete and testable. A hallucinated citation, on the other hand, sends readers to something that doesn't exist.
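The partial-match case Cui describes, a real paper whose author list has been quietly padded, can be illustrated with a short sketch. Assume the database record for the matched title has already been retrieved (for example, from a bibliographic index); the code below simply compares the surnames claimed in the citation against those on record and flags any that do not appear. This is a generic illustration of the idea under those assumptions, not GPTZero's implementation.

```python
# Illustrative sketch (not GPTZero's tool): given a citation's claimed authors
# and the author list of the best-matching database record, flag claimed
# surnames that are absent from the record.


def surname(full_name: str) -> str:
    """Crudely take the last token as the family name."""
    return full_name.strip().split()[-1].lower()


def phantom_authors(claimed: list[str], on_record: list[str]) -> list[str]:
    """Return claimed authors whose surnames do not appear on the record."""
    recorded = {surname(name) for name in on_record}
    return [name for name in claimed if surname(name) not in recorded]


# Example: a real paper's (truncated) author record versus a citation that
# invents two coauthors, echoing the "John Smith"/"Jane Doe" cases above.
record = ["Ashish Vaswani", "Noam Shazeer", "Niki Parmar"]
citation = ["Ashish Vaswani", "Noam Shazeer", "John Smith", "Jane Doe"]

extras = phantom_authors(citation, record)
if extras:
    print("Flag for human review; authors not on record:", extras)
```

A real checker would also have to handle initials, transliterations, and name-order differences, which is why flagged entries still go to a human reviewer.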
GPTZero scanned 4,841 papers from NeurIPS, one of AI's most prestigious conferences, and found 100 hallucinated citations across 51 accepted papers. The discovery highlights how AI-generated references are infiltrating scientific papers despite rigorous peer review, raising concerns about research integrity as submission volumes have more than doubled since 2020.
GPTZero, an AI detection startup, has uncovered a troubling pattern at the heart of AI research itself. After scanning all 4,841 papers accepted by the Conference on Neural Information Processing Systems (NeurIPS) in December, the company identified 100 hallucinated citations across 51 scientific papers that slipped past multiple peer reviewers [1]. These fabricated citations included nonexistent authors, made-up paper titles, fake journals, and URLs leading nowhere [4]. The findings expose how AI-generated references are contaminating academic publishing at one of the world's most selective AI research venues, where acceptance rates hover around 24.52% [3].
NeurIPS prides itself on rigorous scholarly work, making the discovery particularly ironic. Edward Tian, cofounder and CEO of GPTZero, told Fortune this represents "the first documented cases of hallucinated citations entering the official record of the top machine learning conference" [4]. The detection follows GPTZero's earlier discovery of 50 hallucinated citations in papers under review for ICLR, another major AI conference [2].

The problem stems from researchers using large language models (LLMs) to handle citation tasks. These AI systems can sound confident while inventing details they never verified. In some cases, an LLM blended elements from multiple real papers, creating believable-sounding titles and author lists [4]. Other instances showed subtle changes: expanding author initials into guessed first names, dropping coauthors, or paraphrasing titles [4]. Some citations plainly listed "John Smith" and "Jane Doe" as authors [4].
Prediction-driven writing rewards plausibility, so LLM-generated content can appear credible while containing fundamental errors [3]. Earlier studies found that 55% of AI-generated references from older ChatGPT models were fabricated, though newer versions reduced this to 18% [3]. Around half the papers with hallucinated citations showed signs of extensive AI use [4].

The scale of the problem reflects broader pressures on academic publishing. Between 2020 and 2025, submissions to NeurIPS more than doubled, from 9,467 to 21,575 papers [2]. This submission tsunami has strained the peer review process to the breaking point, forcing organizers to recruit ever-larger numbers of peer reviewers [2]. When reviewers juggle research, teaching, and tight deadlines, reference lists become easy to skim [3].

NeurIPS instructed reviewers to flag AI hallucinations, yet the errors survived [4]. GPTZero senior machine-learning engineer Nazar Shmatko and colleagues argue that generative AI tools have fueled "a tsunami of AI slop" that creates issues of oversight, expertise alignment, and even fraud [2].
No one can fault peer reviewers given the sheer volume involved, but the findings raise questions about research integrity when verification fails [1].

Fabricated citations carry consequences beyond simple errors. In AI research, citations function as career currency: metrics that demonstrate how influential a researcher's work is among peers [1]. Citation metrics often sit alongside recommendation letters during hiring decisions, signaling attention that translates into funding, jobs, and collaboration invitations [3]. When AI makes them up, it waters down their value [1].

The NeurIPS board emphasized that "even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated" [1]. While this protects valid findings, it leaves readers with extra verification work when tracking evidence [3].

The citation problem coincides with increasing substantive errors in scientific papers. A December 2025 pre-print from researchers at Together AI, NEC Labs America, Rutgers University, and Stanford University examined AI papers from ICLR, NeurIPS, and TMLR [2]. They found the average number of mistakes per paper increased 55.3% at NeurIPS, from 3.8 errors in 2021 to 5.9 in 2025 [2]. These mistakes include incorrect formulas, miscalculations, and errant figures beyond citation issues [2].

Academic communication reached 5.7 million articles in 2024, up from 3.9 million five years earlier, according to the International Association of Scientific, Technical & Medical Publishers [2]. Adam Marcus, co-founder of Retraction Watch, noted that "publishers have made themselves vulnerable to these assaults by adopting a business model that has prioritized volume over quality" [2].
GPTZero argues its Hallucination Check software should become part of publishers' arsenal of AI detection tools [2]. Unlike text-based AI detection, which is prone to false positives, hallucination detection verifies facts by searching academic databases and the open web to confirm whether cited papers exist [4]. The company claims accuracy above 99%, with every flagged citation reviewed by human experts [4]. ICLR has hired GPTZero to check future submissions during peer review [4].

Yet countermeasures exist. Tools like the Claude Code skill "Humanizer" claim to remove signs of AI-generated writing, making detection harder [2]. This creates an arms race in which defenders may struggle to withstand the siege [2].

The discovery raises a pointed question: if leading AI experts with their reputations at stake cannot ensure AI accuracy in their own work, what does that mean for wider adoption [1]? The legal community has flagged more than 800 errant citations attributed to AI models in court filings, often with consequences for attorneys and judges [2]. Academic rigor demands the same fact-checking standards, yet publishing practices have not adapted to the reality of LLM-generated content [2].

Reform proposals include letting authors rate review quality and giving peer reviewers formal credit for effort, creating feedback loops that discourage rushed work [3]. Reference managers that pull details from databases can reduce typing errors and maintain consistency [3]. When AI systems help draft text, verifying each referenced title adds minutes but spares readers from chasing dead ends [3].
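As one illustration of that habit, the sketch below checks whether the DOIs listed in a draft's reference list actually resolve in the Crossref index before submission; an unresolvable DOI is a common symptom of a fabricated or mistyped reference. The file name and the use of Crossref are assumptions made for the example, not a prescribed workflow.

```python
# Illustrative pre-submission check: confirm that each DOI in a reference list
# resolves in the Crossref index. A 404 response is a strong hint the entry
# was fabricated or mistyped. "refs.txt" (one DOI per line) is hypothetical.
import requests


def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    return resp.status_code == 200


with open("refs.txt", encoding="utf-8") as fh:
    for line in fh:
        doi = line.strip()
        if doi and not doi_exists(doi):
            print(f"Check manually, no Crossref record found: {doi}")
```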
As data integrity concerns mount, the question becomes whether academic publishing can maintain trust while navigating the flood of AI-assisted writing and ever-growing submissions.

Summarized by Navi