11 Sources
[1]
Can researchers stop AI making up citations?
Artificial intelligence (AI) models are known to confidently conjure up fake citations. When the company OpenAI released GPT-5, a suite of large language models (LLMs), last month, it said it had reduced the frequency of fake citations and other kinds of 'hallucination', as well as 'deceptions', whereby an AI claims to have performed a task it hasn't. With GPT-5, OpenAI, based in San Francisco, California, is bucking an industry-wide trend, because newer AI models designed to mimic human reasoning tend to generate more hallucinations than do their predecessors. On a benchmark that tests a model's ability to produce citation-based responses, GPT-5 beat its predecessors. But hallucinations remain inevitable, because of the way LLMs function.

"For most cases of hallucination, the rate has dropped to a level" that seems to be "acceptable to users", says Tianyang Xu, an AI researcher at Purdue University in West Lafayette, Indiana. But in particularly technical fields, such as law and mathematics, GPT-5 is still likely to struggle, she says. And despite the improvements in hallucination rate, users quickly found that the model errs in basic tasks, such as creating an illustrated timeline of US presidents. OpenAI is making "small steps that are good, but I don't think we're anywhere near where we need to be", says Mark Steyvers, a cognitive science and AI researcher at the University of California, Irvine. "It's not frequent enough that GPT says 'I don't know'."

Hallucinations are a result of the fundamental way in which LLMs work. As statistical machines, the models make predictions by generalizing on the basis of learnt associations, leading them to produce answers that are plausible, but sometimes wrong. Another issue is that, similar to a student scoring points for guessing on a multiple-choice exam, during training LLMs get rewarded for having a go rather than acknowledging their uncertainty, according to a preprint published by OpenAI on 4 September. Improvements have come from scaling up the size of LLMs -- in terms of both the richness of their internal associations and the amount of data they are trained on, says Xu. But hallucinations are particularly prevalent in topics for which the model has scant training data or its underlying information is wrong, she says. Hallucinations can also happen when an AI tries to summarize or analyse papers that are too long for that model to process.

Eliminating hallucinations entirely is likely to prove impossible, says Mushtaq Bilal, a researcher at Silvi, a Copenhagen-based firm that makes an AI app to aid the creation of systematic reviews in science. "I think if it was possible, AI labs would have done it already." But reducing errors and getting a model to admit that it doesn't know an answer have been "a pretty heavy focus" for OpenAI, says Saachi Jain, who manages the firm's AI safety team. According to technical documents released with GPT-5, OpenAI concentrated on "training our models to browse effectively for up-to-date information", as well as cutting hallucinations. The firm focused on reducing hallucinations in lengthy, open-ended responses to queries, because this best represents real-life use of ChatGPT, says Jain. In one literature-review benchmark known as ScholarQA-CS, GPT-5 "performs well" when it is allowed to access the web, says Akari Asai, an AI researcher at the Allen Institute for Artificial Intelligence, based in Seattle, Washington, who ran the tests for Nature.
In producing answers to open-ended computer-science questions, for example, the model performed marginally better than human experts, with a correctness score of 55% (based on measures such as how well its statements are supported by citations) compared with 54% for scientists, but just behind a version of the institute's own LLM-based system for literature review, OpenScholar, which achieved 57%. However, GPT-5 suffered when the model was unable to get online, says Asai. The ability to cross-check with academic databases is a key feature of most AI-powered systems designed to help with literature reviews. Without Internet access, GPT-5 fabricated or muddled half the number of citations that one of its predecessors, GPT-4o, did. But it still got them wrong 39% of the time, she says.

On the LongFact benchmark, which tests accuracy in long-form responses to prompts, OpenAI reported that GPT-5 hallucinated in 0.8% of claims in responses about people or places when it was allowed to browse the web, compared with 5.1% for OpenAI's reasoning model o3. Performance dropped when browsing was not permitted, with GPT-5's error rate climbing to 1.4%, compared with 7.9% for o3. Both models performed worse than did the non-reasoning model GPT-4o, which had an error rate of 1.1% when offline. On other independent evaluations -- such as the Hughes Hallucination Evaluation Model, which is run by the AI platform Vectara in Palo Alto, California, and looks at how often an LLM makes false claims when summarizing a document -- rival models such as Google's Gemini 2.0 slightly outperformed GPT-5, although both erred less than 1.5% of the time.

OpenAI also reported that the model was more honest in its responses than the company's previous models were. When given a coding task that was impossible to complete -- for example, owing to a lack of access to necessary hardware -- GPT-5 claimed to have done the task 17% of the time, compared with 47% for o3. Although Jain wouldn't give details of the firm's methods, she hinted that, in later stages of the model's training, OpenAI worked on rewarding it for answering honestly. This same stage of training might previously have worked in the other direction, increasing model dishonesty: OpenAI said that models can learn to be "overconfident" and "cheat" during training using a common technique that incentivizes models to respond in ways that please human assessors by appearing to be helpful.

Researchers are exploring ways to get LLMs to reveal how confident a model is that an answer is factually correct, but it is not yet clear whether models can accurately gauge this statistical probability. Such efforts are "a very active workstream" for OpenAI, says Jain. Realistic tests of how humans interact with models, and make decisions based on the given information, are important metrics that are missing from OpenAI's evaluations, says Steyvers. People often take AI outputs at face value, despite warnings not to do so, because LLMs are made to produce confident, lengthy answers -- hallmarks that humans associate with real expertise, he says. "Even I am persuaded by it, and I know about these biases," he adds. Other researchers are focusing more on the part individuals play in managing their use of LLMs. There is a trade-off between hallucinations and the models' "enormous" labour-saving potential when used appropriately, says Bilal. "This is an issue of building new types of intuitions for an AI age," he says.
[2]
OpenAI's fix for hallucinations is simpler than you think
Even the biggest and most advanced generative AI models occasionally hallucinate, or generate inaccurate information presented as fact. Now, OpenAI claims to understand why -- while offering a possible solution. In a research paper published last week, a team of researchers from the company argued that hallucination stems not from the quality of a model's training data, but rather from flawed evaluation incentives. These are widely used throughout the industry and reward guessing over the admission of uncertainty. "Language models are optimized to be good test-takers, and guessing when uncertain improves test performance," the authors write in the paper.

Models are trained to identify subtle mathematical patterns from an enormous corpus of training data, which they then use as a framework for generating responses to user queries. The current evaluation paradigm essentially uses a simple, binary grading metric, rewarding models for accurate responses and penalizing them for inaccurate ones. Under this method, admitting ignorance is judged as an inaccurate response, which pushes models toward generating what OpenAI describes as "overconfident, plausible falsehoods" -- hallucination, in other words. (Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

If asked to state your birthday, for example, a model might take a wild guess rather than simply saying, "I don't know." It has a one-in-365 chance of being correct; not great odds, but better than just admitting ignorance -- which, under current evaluation metrics, would guarantee zero points for the model. Models are evaluated on their average performance across millions of outputs, exerting a subtle statistical pressure toward guesswork. If enough users ask the model to guess their birthday enough times, odds are it will generate the correct answer some tiny percentage of the time. Better to roll the dice and get those points than to admit ignorance and never win at all. "Strategically guessing when uncertain improves accuracy but increases errors and hallucinations," OpenAI wrote in an accompanying blog post about its findings.

Since this "accuracy-only" approach currently pervades the industry, determining which models dominate scoreboards, developers are incentivized to keep building models that prioritize guessing over admitting uncertainty, leading to more hallucinations. The solution, according to OpenAI, is therefore not to feed models more accurate information, but to adjust how their performance is assessed. Since a binary system of grading a model's output as either right or wrong is supposedly fueling hallucination, the OpenAI researchers say that the AI industry must instead start rewarding models when they express uncertainty. After all, truth does not exist in black and white in the real world, so why should AI be trained as if it does? Running a model through millions of examples on the proper arrangement of subjects, verbs and predicates will make it more fluent in its use of natural language, but as any living human being knows, reality is open to interpretation. To live functionally in the world, we routinely have to say, "I don't know."
Similarly, the OpenAI researchers argue that models will continue to hallucinate so long as they're rewarded for guessing when they should be admitting ignorance. "Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than penalizing them," they write in the new paper. "This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models with richer pragmatic competence."
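To make the grading arithmetic above concrete, here is a minimal sketch (an illustration under assumed values, not code from OpenAI or ZDNET) of why a one-in-365 guess has a higher expected score than abstaining whenever wrong answers carry no penalty:

```python
# A minimal sketch (not from the article) of the incentive described above:
# under accuracy-only grading, a wild guess at a birthday has a small positive
# expected score, while "I don't know" always scores zero.

P_CORRECT_GUESS = 1 / 365   # chance a random birthday guess is right

def expected_score(p_correct: float, reward_correct: float = 1.0,
                   penalty_wrong: float = 0.0) -> float:
    """Expected points per question under a simple grading scheme."""
    return p_correct * reward_correct - (1 - p_correct) * penalty_wrong

guess = expected_score(P_CORRECT_GUESS)   # accuracy-only: wrong answers cost nothing
abstain = 0.0                             # "I don't know" earns zero points

print(f"guessing:   {guess:.4f} expected points per question")
print(f"abstaining: {abstain:.4f} expected points per question")
# guessing (~0.0027) > abstaining (0.0), so the optimizer learns to guess.
```

Set `penalty_wrong` to anything above roughly 1/364 and the blind guess's expected score turns negative, which is precisely the incentive change the researchers propose.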
[3]
Why OpenAI's solution to AI hallucinations would kill ChatGPT tomorrow
OpenAI's latest research paper diagnoses exactly why ChatGPT and other large language models can make things up -- known in the world of artificial intelligence as "hallucination". It also reveals why the problem may be unfixable, at least as far as consumers are concerned.

The paper provides the most rigorous mathematical explanation yet for why these models confidently state falsehoods. It demonstrates that hallucinations aren't just an unfortunate side effect of the way AIs are currently trained, but are mathematically inevitable. The issue can partly be explained by mistakes in the underlying data used to train the AIs. But using mathematical analysis of how AI systems learn, the researchers prove that even with perfect training data, the problem still exists.

The way language models respond to queries -- by predicting one word at a time in a sentence based on probabilities -- naturally produces errors. The researchers in fact show that the total error rate for generating sentences is at least twice as high as the error rate the same AI would have on a simple yes/no question, because mistakes can accumulate over multiple predictions. In other words, hallucination rates are fundamentally bounded by how well AI systems can distinguish valid from invalid responses. Since this classification problem is inherently difficult for many areas of knowledge, hallucinations become unavoidable.

It also turns out that the less a model sees a fact during training, the more likely it is to hallucinate when asked about it. With birthdays of notable figures, for instance, the paper finds that if 20% of such people's birthdays appear only once in the training data, then base models should get at least 20% of birthday queries wrong. Sure enough, when researchers asked state-of-the-art models for the birthday of Adam Kalai, one of the paper's authors, DeepSeek-V3 confidently provided three different incorrect dates across separate attempts: "03-07", "15-06" and "01-01". The correct date is in the autumn, so none of these was even close.

The evaluation trap

More troubling is the paper's analysis of why hallucinations persist despite extensive post-training efforts (such as providing extensive human feedback on an AI's responses before it is released to the public). The authors examined ten major AI benchmarks, including those used by Google and OpenAI, as well as the top leaderboards that rank AI models. This revealed that nine benchmarks use binary grading systems that award zero points for AIs expressing uncertainty. This creates what the authors term an "epidemic" of penalising honest responses. When an AI system says "I don't know", it receives the same score as giving completely wrong information. The optimal strategy under such evaluation becomes clear: always guess. The researchers prove this mathematically: whatever the chances of a particular answer being right, the expected score of guessing always exceeds the score of abstaining when an evaluation uses binary grading.

The solution that would break everything

OpenAI's proposed fix is to have the AI consider its own confidence in an answer before putting it out there, and for benchmarks to score it on that basis. The AI could then be prompted, for instance: "Answer only if you are more than 75% confident, since mistakes are penalised 3 points while correct answers receive 1 point."
The OpenAI researchers' mathematical framework shows that under appropriate confidence thresholds, AI systems would naturally express uncertainty rather than guess. So this would lead to fewer hallucinations. The problem is what it would do to user experience.

Consider the implications if ChatGPT started saying "I don't know" to even 30% of queries -- a conservative estimate based on the paper's analysis of factual uncertainty in training data. Users accustomed to receiving confident answers to virtually any question would likely abandon such systems rapidly. I've seen this kind of problem in another area of my life. I'm involved in an air-quality monitoring project in Salt Lake City, Utah. When the system flags uncertainties around measurements during adverse weather conditions or when equipment is being calibrated, there's less user engagement compared with displays showing confident readings -- even when those confident readings prove inaccurate during validation.

The computational economics problem

It wouldn't be difficult to reduce hallucinations using the paper's insights. Established methods for quantifying uncertainty have existed for decades. These could be used to provide trustworthy estimates of uncertainty and guide an AI to make smarter choices. But even if the problem of user preferences could be overcome, there's a bigger obstacle: computational economics. Uncertainty-aware language models require significantly more computation than today's approach, as they must evaluate multiple possible responses and estimate confidence levels. For a system processing millions of queries daily, this translates to dramatically higher operational costs. More sophisticated approaches like active learning, where AI systems ask clarifying questions to reduce uncertainty, can improve accuracy but further multiply computational requirements. Such methods work well in specialised domains like chip design, where wrong answers cost millions of dollars and justify extensive computation. For consumer applications where users expect instant responses, the economics become prohibitive.

The calculus shifts dramatically for AI systems managing critical business operations or economic infrastructure. When AI agents handle supply-chain logistics, financial trading or medical diagnostics, the cost of hallucinations far exceeds the expense of getting models to decide whether they're too uncertain. In these domains, the paper's proposed solutions become economically viable -- even necessary. Uncertainty-aware AI agents will simply have to cost more.

However, consumer applications still dominate AI development priorities. Users want systems that provide confident answers to any question. Evaluation benchmarks reward systems that guess rather than express uncertainty. Computational costs favour fast, overconfident responses over slow, uncertain ones. Falling energy costs per token and advancing chip architectures may eventually make it more affordable to have AIs decide whether they're certain enough to answer a question. But the relatively high amount of computation required compared with today's guessing would remain, regardless of absolute hardware costs.

In short, the OpenAI paper inadvertently highlights an uncomfortable truth: the business incentives driving consumer AI development remain fundamentally misaligned with reducing hallucinations. Until these incentives change, hallucinations will persist.
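To illustrate the break-even arithmetic behind that 75%-confidence prompt, here is a rough sketch (my own worked example mirroring the quoted 1-point/3-point scheme, not the paper's code):

```python
# A rough sketch (an illustration, not the paper's code) of the break-even
# arithmetic behind the "answer only if you are more than 75% confident,
# since mistakes are penalised 3 points" prompt quoted above.

REWARD_CORRECT = 1.0
PENALTY_WRONG = 3.0   # the penalty that makes 75% the break-even confidence

def expected_score_if_answering(confidence: float) -> float:
    """Expected points when the model answers with the given confidence."""
    return confidence * REWARD_CORRECT - (1 - confidence) * PENALTY_WRONG

for confidence in (0.50, 0.70, 0.75, 0.80, 0.95):
    score = expected_score_if_answering(confidence)
    best = "answer" if score > 0 else "say 'I don't know'"
    print(f"confidence {confidence:.2f}: expected score {score:+.2f} -> {best}")

# At or below 0.75 the expected score of answering is no better than the zero
# earned by abstaining, so the rational move is to say "I don't know";
# above 0.75, answering wins.
```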
[4]
OpenAI Realizes It Made a Terrible Mistake
OpenAI claims to have figured out what's driving "hallucinations," or AI models' strong tendency to make up answers that are factually incorrect. It's a major problem plaguing the entire industry, greatly undercutting the usefulness of the tech. Worse yet, experts have found that the problem is getting worse as AI models get more capable. As a result, despite the astronomical expense of deploying them, frontier AI models are still prone to making inaccurate claims when faced with a prompt they don't know the answer to. Whether there's a solution to the problem remains a hotly debated subject, with some experts arguing that hallucinations are intrinsic to the tech itself. In other words, large language models may be a dead end in our quest to develop AIs with a reliable grasp of factual claims.

In a paper published last week, a team of OpenAI researchers attempted to come up with an explanation. They suggest that large language models hallucinate because, when they're being created, they're incentivized to guess rather than admit they simply don't know the answer. Hallucinations "persist due to the way most evaluations are graded -- language models are optimized to be good test-takers, and guessing when uncertain improves test performance," the paper reads. Conventionally, the output of an AI is graded in a binary way: it is rewarded when it gives a correct response and penalized when it gives an incorrect one. Put simply, guessing is rewarded -- because it might be right -- over an AI admitting it doesn't know the answer, which will be graded as incorrect no matter what. As a result, through "natural statistical pressures," LLMs are far more prone to hallucinate an answer than to "acknowledge uncertainty." "Most scoreboards prioritize and rank models based on accuracy, but errors are worse than abstentions," OpenAI wrote in an accompanying blog post.

In other words, OpenAI says that it -- and all its imitators across the industry -- have made a grave structural error in how they've been training AI. There'll be a lot riding on whether the issue is correctable going forward. OpenAI claims that "there is a straightforward fix" to the problem: "Penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty." Going forward, evaluations need to ensure that "their scoring discourages guessing," the blog post reads. "If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess." "Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than penalizing them," the company's researchers concluded in the paper. "This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models, e.g., with richer pragmatic competence."

How these adjustments to evaluations will play out in the real world remains to be seen. While the company claimed its latest GPT-5 model hallucinates less, users were left largely unimpressed. For now, the AI industry will have to continue reckoning with the problem as it justifies tens of billions of dollars in capital expenditures and soaring emissions. "Hallucinations remain a fundamental challenge for all large language models, but we are working hard to further reduce them," OpenAI promised in its blog post.
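As a toy illustration of the scoreboard effect described here, the sketch below (invented numbers and an arbitrary alternative scoring rule, not OpenAI's benchmark) shows how an always-guessing model can top an accuracy-only leaderboard yet fall behind a cautious model once confident errors are penalized and abstentions earn partial credit:

```python
# Illustrative simulation (assumed parameters, not OpenAI's) of accuracy-only
# scoring versus a scheme that penalizes confident errors and gives partial
# credit for "I don't know".
import random

random.seed(0)
N_QUESTIONS = 10_000
P_KNOWN = 0.6          # fraction of questions either model actually knows
P_LUCKY_GUESS = 0.1    # chance a blind guess happens to be right

def run(model_abstains_when_unsure: bool):
    correct = wrong = abstained = 0
    for _ in range(N_QUESTIONS):
        if random.random() < P_KNOWN:
            correct += 1                      # known answer: both models get it right
        elif model_abstains_when_unsure:
            abstained += 1                    # cautious model says "I don't know"
        elif random.random() < P_LUCKY_GUESS:
            correct += 1                      # bluffer gets lucky
        else:
            wrong += 1                        # bluffer hallucinates
    accuracy_only = correct                              # 1 point per correct answer
    uncertainty_aware = correct - 2 * wrong + 0.3 * abstained   # example alternative rule
    return accuracy_only, uncertainty_aware

for name, abstains in (("always-guess model", False), ("cautious model", True)):
    acc, ua = run(abstains)
    print(f"{name:18s} accuracy-only: {acc:6d}   uncertainty-aware: {ua:8.1f}")
# The always-guess model wins the accuracy-only column, but the ranking flips
# under the uncertainty-aware rule.
```

The 2-point penalty and 0.3 partial credit are arbitrary choices for the sake of the example; the point is only that the ranking depends on the scoring rule.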
[5]
Why AI Keeps Making Stuff Up -- And How to Fix It - Decrypt
Users can fight back. Ask for sources, frame prompts tightly, and use factuality settings to cut down on false answers.

Why does GPT sometimes hallucinate like a tech bro on an ayahuasca bender? According to a new OpenAI research paper, Why Language Models Hallucinate, the root of hallucinations isn't a mysterious glitch but a structural feature of how these systems are optimized. Simply put, LLMs would rather lie than admit they don't know an answer. LLMs learn by predicting the most likely next word, given mountains of training text. In most settings, that means sounding fluent matters more than being right. The benchmarks we use to measure progress often reward confident guessing more than honest refusal. In other words: the system has been shaped to produce polished answers, even if they're wrong.

Think of it like an exam graded on partial credit. If you can't leave a question blank without losing points, you'll guess -- even wildly -- just to stay in the game. LLMs operate under the same logic. A "sorry, I don't know" gets punished by the math of optimization, while an incorrect but confident answer can still score high. That statistical bias, the OpenAI researchers note, makes hallucinations provably unavoidable in general-purpose systems. No finite training set can capture the entire truth of the world, so the model will always face gaps. And when it does, it fills them with plausible-sounding invention. That's why hallucinations persist across versions, providers, and training methods. The problem isn't that models are failing at their job. The problem is that their job, as currently defined, rewards a kind of fluent dishonesty.

OpenAI's researchers argue the fix doesn't require reinventing the architecture -- it just means changing the rules of the game. Their proposed tweak is blunt but potentially powerful: give your chatbot permission to admit it doesn't know the answer. Since models are trained to maximize points for plausible answers, the idea is to impose a new rule: only answer if you're at least 90% confident; otherwise say "I don't know." Theoretically, that shifts the math, making the model's safest play to admit uncertainty rather than bluff. But there's a catch: current LLMs don't have an internal "confidence meter" calibrated in percentages. So when you say "90% confident," the model treats it as a stylistic instruction to be cautious, not a real statistical threshold. It may refuse more often, but it's not actually measuring probability. Still, you could get better results.

The researchers offered a more formal version: "One could append a statement like the following to each question: Answer only if you are > t confident, since mistakes are penalized t/(1 - t) points, while correct answers receive 1 point, and an answer of 'I don't know' receives 0 points. There are several natural values of t including t = 0.5 (penalty 1), t = 0.75 (penalty 2), and t = 0.9 (penalty 9). A threshold of t = 0 corresponds to binary grading and could be described by, e.g., 'Make your best guess even if you are unsure, as if you were taking an exam.'"

For users, the takeaway is straightforward: when you have the option, turn on settings that encourage refusals or uncertainty. Some systems already let you adjust "temperature" (controlling creativity) or enable "strict factuality" modes. The closer we get to models actually being trained under these rules, the more you'll see AI confidently stop short instead of confidently lying.
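Here is a small sketch of how that appended instruction could be generated for different thresholds, using the t/(1 - t) penalty formula from the quote. This is my illustration, not OpenAI tooling; note that the formula itself yields penalties of 1, 3 and 9 at t = 0.5, 0.75 and 0.9.

```python
# A small sketch (not OpenAI tooling) that builds the behavioural-threshold
# instruction quoted above for a given confidence level t, with the penalty
# computed from the t / (1 - t) formula in the quote.

def confidence_instruction(t: float) -> str:
    """Return the appended instruction for confidence threshold t (0 <= t < 1)."""
    if t == 0:
        return ("Make your best guess even if you are unsure, "
                "as if you were taking an exam.")
    penalty = t / (1 - t)
    return (f"Answer only if you are > {t:.2f} confident, since mistakes are "
            f"penalized {penalty:g} points, while correct answers receive 1 point, "
            f"and an answer of 'I don't know' receives 0 points.")

for t in (0.0, 0.5, 0.75, 0.9):
    print(confidence_instruction(t))
```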
Until training catches up, the burden often falls on users. Here are five ways to tame hallucinations right now:

1. Ask for sources every time. Don't take a model's word at face value -- demand citations or links. If it can't provide them, or they don't check out, assume the answer's shaky. Think of it like Wikipedia: useful, but only if you follow the footnotes.

2. Frame your questions tightly. Models wander when prompts are vague. If you need facts, specify the scope ("list three peer-reviewed studies published after 2020 on X") rather than asking open-endedly ("tell me about X"). Guardrails in your question translate to guardrails in the answer.

3. Cross-check with another system. Run the same question through a different model or search engine. If three tools agree, you're safer. If one spits out an outlier, that's likely a hallucination.

4. Watch for overconfidence. The telltale sign of a hallucination isn't hedging -- it's swagger. If an answer reads too polished, with fabricated detail and zero uncertainty, double-check it. A model that sounds more certain than your tax accountant is probably bluffing.

5. Trust, but verify. Don't cut-and-paste model output straight into code, contracts, or medical notes. Treat it as a draft or starting point, not gospel. The safest users are the skeptical ones -- the ones who never forget the model's first job is fluency, not truth.
[6]
Despite Improvements, GPT-5 Continues to Hallucinate, OpenAI Says | AIM
'Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.'

OpenAI said in its blog post on September 5 that hallucinations, which are plausible but false outputs generated by AI systems, remain a persistent challenge for large language models, including its latest GPT-5 system. The company is calling for changes to evaluation methods that currently reward models for guessing rather than acknowledging uncertainty. OpenAI added that hallucinations can show up in surprising ways, even for seemingly straightforward questions. According to the company, the problem stems in part from how models are tested. Evaluations typically measure accuracy alone, which encourages systems to take risks instead of being cautious. "If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero," OpenAI said.
[7]
OpenAI explains why language models 'hallucinate'; evaluation incentives reward guessing over uncertainty
OpenAI has identified a fundamental flaw in the design of large language models (LLMs) that leads to the generation of confident yet incorrect information, known as "hallucinations". This discovery, detailed in a recent research paper, challenges existing assumptions about AI reliability and proposes a paradigm shift in model evaluation.

Hallucinations in AI refer to instances where models produce statements that are factually incorrect but presented with high confidence. For example, when queried about the title of the PhD dissertation of Adam Tauman Kalai, one of the paper's authors, a model provided three different titles, none of which was accurate. Similarly, it offered three incorrect birthdates for Kalai.

The core issue, as identified by OpenAI researchers, lies in the training and evaluation processes of LLMs. Traditional methods focus on binary grading -- correct or incorrect -- without accounting for the model's confidence in its responses. This approach inadvertently rewards models for making educated guesses, even when uncertain, because a correct guess yields a positive outcome, whereas admitting uncertainty results in a zero score. Consequently, models are trained to prioritize providing an answer over acknowledging a lack of knowledge. As the Futurism website notes, hallucinations "persist due to the way most evaluations are graded, language models are optimized to be good test-takers, and guessing when uncertain improves test performance," the paper reads.

To address this issue, OpenAI suggests a shift towards evaluation methods that value uncertainty and penalize confident inaccuracies. By implementing confidence thresholds, models would be encouraged to refrain from answering when unsure, thereby reducing the likelihood of hallucinations. This approach aims to enhance the reliability of AI systems, especially in critical applications where factual accuracy is paramount. "Most scoreboards prioritize and rank models based on accuracy, but errors are worse than abstentions," OpenAI wrote in an accompanying blog post.

Experts acknowledge that eliminating hallucinations entirely may be unattainable, but improvements in training and evaluation methodologies can lead to more trustworthy AI systems. The proposed changes have broader implications for AI development, including potential impacts on user engagement: models that frequently admit uncertainty might be perceived as less competent, possibly affecting user trust and adoption. Therefore, balancing accuracy with user experience remains a critical consideration.
[8]
OpenAI's Plan to Make ChatGPT Smarter and More Honest: Stopping AI Hallucinations
What if the AI you rely on could confidently say, "I don't know," rather than misleading you with a plausible-sounding, yet entirely false, response? For years, the Achilles' heel of large language models (LLMs) has been their tendency to produce so-called "hallucinations" -- outputs that sound credible but lack factual accuracy. These missteps have undermined trust in AI across critical fields like healthcare, education, and law, where even minor inaccuracies can have outsized consequences. But now, OpenAI claims to have cracked the code. By rethinking how LLMs are trained and evaluated, it has uncovered the root causes of hallucinations and proposed new strategies to address them. Could this be the turning point for AI reliability?

In this overview, Wes Roth explains the implications of OpenAI's findings and how they aim to reshape the future of AI systems. From integrating confidence levels into responses to rewarding models for admitting uncertainty, these innovations promise to make AI not just smarter but more trustworthy. You'll discover why hallucinations occur, how they've been perpetuated by current training methods, and what it will take to overcome these challenges. The road ahead isn't without obstacles, but the potential to create AI systems that prioritize accuracy over confidence could redefine their role in high-stakes applications. If AI can finally learn to say, "I'm not sure," what else might it get right?

Hallucinations occur when an LLM produces responses that appear credible but lack factual accuracy. This phenomenon often arises when the model is uncertain yet compelled to provide an answer. Much like a student guessing on a test without penalty, LLMs are trained to maximize accuracy without being penalized for incorrect guesses or rewarded for admitting uncertainty. This behavior is a direct consequence of current training and evaluation practices, which prioritize confident outputs over cautious or accurate ones. The implications of hallucinations are significant, particularly in high-stakes applications such as healthcare, legal research, and education. In these contexts, even minor inaccuracies can lead to serious consequences, underscoring the need for solutions that address this issue at its core.

OpenAI's study identifies critical shortcomings in the reinforcement learning techniques used to train LLMs. These methods reward models for correct answers but fail to incentivize them to acknowledge uncertainty. Additionally, evaluation systems often rely on binary pass/fail metrics, which disregard nuanced responses such as "I don't know." This approach inadvertently encourages models to prioritize confident-sounding answers, even when they lack sufficient knowledge to ensure accuracy. The research highlights that this issue is not merely a technical limitation but a systemic challenge rooted in how LLMs are designed and assessed. By focusing on confidence over accuracy, current methodologies inadvertently perpetuate the problem of hallucinations, limiting the reliability of these models in real-world scenarios.

One promising solution proposed by OpenAI is the integration of confidence levels into LLM outputs. Confidence can be assessed by analyzing the consistency of a model's responses to repeated queries, as sketched below.
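The sketch is an illustration of that consistency idea, not code from the video or from OpenAI; `ask_model` is a hypothetical placeholder for whichever chat API you use.

```python
# A minimal sketch: sample the same question several times and treat agreement
# among the answers as a rough confidence signal, abstaining below a threshold.
# `ask_model` is a hypothetical stand-in, not a real library call.
from collections import Counter

def ask_model(question: str) -> str:
    raise NotImplementedError("replace with a call to your LLM of choice")

def answer_with_confidence(question: str, samples: int = 10,
                           threshold: float = 0.75) -> str:
    """Return the majority answer only if enough samples agree; otherwise abstain."""
    responses = [ask_model(question) for _ in range(samples)]
    answer, count = Counter(responses).most_common(1)[0]
    agreement = count / samples            # crude proxy for the model's confidence
    if agreement >= threshold:
        return f"{answer} (agreement {agreement:.0%})"
    return f"I don't know (top answer only reached {agreement:.0%} agreement)"
```

Agreement across samples is only a proxy, since a model can be consistently wrong, which is why the study also argues for changing the training and evaluation incentives themselves.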
By incorporating confidence measurement into both training and evaluation processes, LLMs could be better aligned with their actual knowledge. This adjustment would enable models to express uncertainty when appropriate, reducing the likelihood of hallucinations and enhancing overall reliability.

The OpenAI study outlines several practical strategies to address hallucinations and improve the dependability of LLMs. These strategies aim to shift the focus from producing confident-sounding outputs to prioritizing accuracy and transparency. By doing so, LLMs can become more effective and trustworthy tools in a wide range of applications. While the proposed solutions are conceptually straightforward, their implementation presents several challenges. Despite these obstacles, the potential benefits of reducing hallucinations justify the effort. By addressing these challenges, researchers and developers can create more reliable AI systems capable of delivering accurate and trustworthy outputs.

Implementing these recommendations could significantly enhance the reliability of LLMs, making them more effective in real-world applications where precision and accuracy are critical. These improvements would not only expand the utility of LLMs but also build trust in their capabilities, encouraging broader adoption across industries and disciplines.

OpenAI's research highlights a critical challenge in the development of LLMs: the prioritization of confident answers over accurate ones. By addressing flaws in training and evaluation processes, the proposed solutions offer a clear path to reducing hallucinations and improving model reliability. While the implementation of these changes may be complex and resource-intensive, the potential to create more dependable and trustworthy AI systems makes these efforts essential. As LLMs continue to evolve, aligning their outputs with factual accuracy and transparency will be crucial to unlocking their full potential. By tackling the issue of hallucinations head-on, researchers and developers can ensure that these models serve as reliable tools in applications ranging from healthcare and law to education and beyond. The future of AI depends on its ability to provide not just plausible answers, but accurate and trustworthy ones that meet the demands of real-world challenges.
[9]
OpenAI Says AI Hallucinations Are Systemic, Not a Bug | PYMNTS.com
The report, "Why Language Models Hallucinate," traces the problem to two root causes: the way models learn language during pretraining, and the way they are judged during evaluation. Together, these forces create statistical pressure to guess rather than to acknowledge uncertainty.

The first stage, pretraining, exposes a model to massive datasets. The researchers argue that even if those datasets were perfect, hallucinations would still occur, because the training objective -- predicting the next word -- maps onto the same error patterns seen in binary classification. For example, if a model sees a celebrity's birthday once in training, it cannot reliably reproduce it later. As the authors explain, hallucinations are simply "errors in binary classification" magnified by the task of generating fluent language.

The paper illustrates this with striking cases. When asked the birthday of one of the paper's authors, Adam Tauman Kalai, an open-source model confidently supplied three different but incorrect dates, even though the correct answer was not in its training set. In another test, when asked to count the number of Ds in the word DEEPSEEK, several models produced answers ranging from 2 to 7, none of them correct. These examples, the authors argue, show how models "fill in the blanks" with plausible guesses when they lack reliable information or when the task itself is poorly represented in training.

The second stage, post-training, is supposed to refine models and reduce errors. Yet the paper argues that evaluation systems -- benchmarks and leaderboards -- end up encouraging bluffing instead of honesty. Most widely used tests reward correct answers but assign zero points to uncertainty or an "I don't know" response. That means a model that always guesses will consistently score better than one that admits gaps in its knowledge. As the authors put it: "Optimizing models for these benchmarks may therefore foster hallucinations. Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams that penalize uncertainty. Therefore, they are always in 'test-taking' mode."

This framing helps explain why hallucinations remain stubborn even in the most advanced systems. Improvements in architecture, scale and alignment don't change the fact that the scoring rules push models toward overconfidence. The paper concludes that the solution isn't another hallucination test but a redesign of the evaluation system itself. By modifying benchmarks to give partial credit for uncertainty, much like standardized exams that penalize wrong guesses, developers can realign incentives. The authors suggest explicit confidence thresholds, where models only answer if they are more than, say, 75% sure.

For professionals in finance, payments and other sectors where accuracy is non-negotiable, the takeaway is sobering. Hallucinations aren't random quirks; they are systemic. They can also be expensive for businesses and consumers alike: insurance companies earlier this year started covering AI hallucination mishaps. Unless the field changes how it measures performance, AI systems will continue to "sound right" while sometimes being wrong.
But with better scoring, the researchers argue, AI could be nudged toward becoming a more trustworthy partner in high-stakes decision-making.
[10]
AI Hallucinates: Why Your AI Assistant Might Be Lying & How to Stop It
What if the AI assistant you rely on for critical information suddenly gave you a confidently wrong answer? Imagine asking it for the latest medical guidelines or legal advice, only to receive a fabricated response delivered with unwavering certainty. This unsettling phenomenon, known as AI hallucination, isn't just a rare glitch; it's a systemic issue baked into how AI models are trained and evaluated. Despite their impressive capabilities, these systems often prioritize sounding confident over being accurate, leaving users vulnerable to misinformation. The good news? Understanding why AI hallucinates is the first step toward fixing it.

In this how-to, Prompt Engineering explores the root causes of AI hallucinations and uncovers practical strategies to minimize them. You'll learn how the design of training datasets, evaluation metrics, and reward systems inadvertently encourages models to guess rather than admit uncertainty. More importantly, we'll discuss actionable solutions, such as fostering uncertainty-aware responses and rethinking how we measure AI performance. Whether you're an AI developer, a curious tech enthusiast, or someone who simply wants more reliable tools, this guide will equip you with insights to navigate, and perhaps even reshape, the future of AI. After all, building trustworthy systems isn't just about fixing errors; it's about redefining what we expect from intelligent machines.

AI hallucinations occur when a language model produces outputs that are factually incorrect but delivered with high confidence. This phenomenon is deeply rooted in the training process. Language models are designed to predict the next word or phrase based on patterns in large datasets. However, this predictive approach often encourages confident guessing, even in the absence of adequate information. For example, when faced with an unanswerable question, a model might fabricate an answer rather than admit uncertainty. This behavior is reinforced by evaluation systems that reward accuracy without sufficiently penalizing confident errors. As a result, the model learns to prioritize appearing correct over being cautious or transparent about its limitations.

The training of language models relies on vast datasets that include both accurate and inaccurate information. During this process, the model's success is measured by how closely its predictions align with expected outputs. However, this approach has significant flaws. Current reward functions often fail to differentiate between confident errors and honest expressions of uncertainty, inadvertently encouraging the former. To address this, training reward functions must evolve. Penalizing confident errors more heavily while rewarding models for abstaining when uncertain can foster a more nuanced understanding of their limitations. For instance, a model that responds with "I don't know" when faced with ambiguous input should be rewarded for its honesty rather than penalized for not guessing.

Accuracy remains the dominant metric for evaluating language models, but it has notable shortcomings. While straightforward, accuracy-based evaluations fail to consider the context in which answers are generated. This creates an incentive for models to guess, even when the correct answer is uncertain or unknowable. Scoreboards and benchmarks, which rank models based on accuracy, further exacerbate this issue.
To reduce hallucinations, evaluation systems must prioritize uncertainty-aware responses. Metrics that reward abstention or penalize confident guessing can encourage models to adopt a more cautious and reliable approach. Research from leading organizations like OpenAI highlights that hallucinations are not random glitches but predictable outcomes of current training and evaluation practices. Interestingly, smaller models often demonstrate better awareness of their limitations than larger models, which tend to exhibit overconfidence. This finding suggests that simply increasing model size is not a viable solution to the hallucination problem. Moreover, achieving perfect accuracy is unrealistic. Certain questions, such as those about future events or speculative scenarios, are inherently unanswerable. Recognizing these limitations and designing systems that acknowledge uncertainty is essential for reducing hallucinations and improving the reliability of AI outputs.

By shifting the focus from accuracy-driven metrics to uncertainty-aware evaluations, developers can encourage models to produce more reliable outputs. For example, a model that admits uncertainty about a complex scientific question demonstrates greater reliability than one that fabricates an answer with unwarranted confidence. Despite the potential of these strategies, challenges persist. Accuracy-based metrics continue to dominate the field, making it difficult to implement widespread changes. Additionally, while hallucinations can be reduced, they cannot be entirely eliminated; some level of error is inevitable owing to the complexity of language and the limitations of current AI technologies. Adopting new evaluation metrics and training paradigms also requires collaboration across the AI research community. Without broad consensus, progress in reducing hallucinations may be slow. Furthermore, balancing the trade-off between cautious responses and maintaining user satisfaction remains a complex issue: users often expect AI systems to provide definitive answers, even when uncertainty is unavoidable.

AI hallucinations are a direct consequence of how language models are trained and evaluated. To mitigate these errors, the AI community must move beyond accuracy-driven evaluations and adopt mechanisms that reward uncertainty acknowledgment and discourage confident guessing. By rethinking training reward functions and updating evaluation benchmarks, developers can create models that are not only more accurate but also more transparent about their limitations. While challenges remain, these changes represent a critical step toward building trustworthy AI systems. As the field evolves, fostering collaboration and innovation will be essential to ensure that AI technologies continue to improve in reliability and utility.
[11]
Hallucinations in AI: OpenAI study blames wrong model measurements
Redesigning scoreboards to reward humility could reduce confident AI errors.

When I wrote about AI hallucinations back in July 2024, the story was about inevitability. Back then, GenAI was busy dazzling the world with its creativity, but equally embarrassing itself with fanciful citations, biased imagery, or gymnasts bending like boneless cartoons. At the time I argued that hallucinations were as unavoidable as human "brainfarts" -- entertaining, often problematic, and always a reminder that these AI systems weren't perfect. A year later, OpenAI has published a new research study that reframes the hallucination debate in strikingly practical terms. According to its latest blog post, the AI hallucination problem isn't just the models. It's also the way we measure them. And unless we change how we score AI performance, we'll continue encouraging AI models to guess when they should really just say, "I don't know."

In their latest research study on AI hallucination, OpenAI researchers liken the issue to a multiple-choice test. A student who guesses randomly will sometimes get lucky, but if the test only rewards accuracy, that student looks better than one who leaves blanks when uncertain. Current AI evaluations work in much the same way: models that guess when uncertain are rewarded more than those that refuse to answer -- an important distinction.

This isn't a light-hearted matter, especially for training a GenAI LLM. It shapes the behaviour of every major language model out there, the OpenAI researchers argue. They demonstrate how even careful systems like GPT-5 can confidently give the wrong birthday for one of the paper's authors. This is because the evaluation systems tell the models that a confident wrong answer is better than no answer at all. Back in 2024, I cited GitHub leaderboards measuring hallucination rates across models like GPT-4 Turbo and Intel's Neural Chat 7B. Those efforts assumed hallucinations were byproducts of weak data coverage or rushed product rollouts. OpenAI now argues that the real structural fault lies in how we grade models in the first place.

The OpenAI research paper goes further, tracing hallucinations back to the foundations of pretraining. Models learn by predicting the next word in massive datasets, without exposure to examples labelled as "false". It's easy to learn consistent structures like grammar or spelling, but predicting arbitrary facts -- like birthdays or niche cultural references -- is a statistical minefield for GenAI LLMs. OpenAI insists hallucinations are expected artifacts of next-word prediction. What makes them persist is not ignorance, but incentive structures that reward polished guesses over calibrated restraint. In fact, the study highlights that smaller models sometimes outperform larger ones in humility. A small model that knows it doesn't understand Māori can simply admit ignorance. A bigger model with partial knowledge risks bluffing. Calibration -- knowing what you don't know -- is not a brute-force problem solvable only with trillion-parameter giants.

OpenAI's prescription for reducing AI hallucination is deceptively simple: redesign evaluation scoreboards. Rather than treating accuracy as the sole measure of performance, penalize confident errors more heavily than an inability to respond, and give partial credit for uncertainty. In other words, reward models for honesty, not just hollow bravado.
It's an idea familiar to anyone who has sat for a standardized test with negative marking. Guessing blindly should be discouraged. But in AI, we've done the opposite: accuracy-only leaderboards have locked developers into building models that bluff, because bluffing "wins" under the current rules. This reframing resonates with my July 2024 piece, where I noted that hallucinations were often the price of speed -- companies rushing half-baked models to market. But OpenAI's work shows that the deeper problem isn't haste, but misaligned incentives baked into the very fabric of AI evaluation.

Remember that AI hallucinations aren't disappearing overnight. As OpenAI admits, accuracy will never hit 100 percent, because some questions are inherently unanswerable, and a chatbot's polished tone is no guarantee of truth. But progress is possible if we stop grading models in ways that punish caution and reward fabrication. If OpenAI and others succeed in redesigning evaluations to reward humility, we should expect models to say "I don't know" more often. That will feel jarring at first -- perhaps even frustrating. But in high-stakes contexts like healthcare or legal advice, a model that admits uncertainty is far safer than one that invents answers.

Last year, I framed hallucinations as both a curse and a creative spark. That duality remains. Hallucinations can still inspire surreal art or imaginative leaps. But in day-to-day knowledge-based work, they remain landmines. As users, journalists, or policymakers, we must internalize this lesson. AI systems are powerful, but only when grounded in truth or transparent about uncertainty. Until then, treat your model like a clever but overconfident friend -- insightful at times, unreliable at others, and always in need of a fact-check when it says something that feels too good to be true.
OpenAI's latest research reveals that AI hallucinations stem from flawed evaluation incentives, not just training data quality. The proposed solution could significantly impact user experience and computational requirements.

In a groundbreaking study, OpenAI researchers have uncovered the fundamental reason behind AI hallucinations -- a persistent problem plaguing large language models (LLMs) like ChatGPT. The research paper, titled "Why Language Models Hallucinate," argues that the issue stems from flawed evaluation incentives rather than the quality of training data [1].

The current evaluation paradigm for LLMs uses a binary grading system that rewards accurate responses and penalizes inaccurate ones. This approach inadvertently encourages models to guess rather than admit uncertainty, as expressing ignorance is treated as an incorrect response [2]. The researchers demonstrate that even with perfect training data, hallucinations are mathematically inevitable due to the way LLMs generate responses by predicting one word at a time based on probabilities [1].

This "accuracy-only" approach has led to an industry-wide trend of building models that prioritize guessing over admitting uncertainty. As a result, newer AI models designed to mimic human reasoning tend to generate more hallucinations than their predecessors [3]. While OpenAI claims its latest GPT-5 model has reduced hallucinations, the problem persists, especially in technical fields such as law and mathematics [3].

OpenAI suggests modifying evaluation methods to reward appropriate expressions of uncertainty rather than penalizing them [1]. The proposed fix involves implementing confidence thresholds, where models would be instructed to answer only if they are more than a certain percentage confident [4].

While the AI industry works on long-term solutions, users can take immediate steps to reduce the impact of hallucinations, such as asking for sources, framing prompts tightly, cross-checking answers across systems, and verifying outputs before relying on them [4].

As the AI industry grapples with this challenge, the focus shifts to developing more nuanced language models with richer pragmatic competence. The path forward involves not just technological advancements but also a reevaluation of how we measure and incentivize AI performance [5].