Curated by THEOUTPOST
On Mon, 5 May, 4:01 PM UTC
4 Sources
[1]
A.I. Hallucinations Are Getting Worse, Even as New Systems Become More Powerful
Cade Metz reported from San Francisco, and Karen Weise from Seattle.

Last month, an A.I. bot that handles tech support for Cursor, an up-and-coming tool for computer programmers, alerted several customers about a change in company policy. It said they were no longer allowed to use Cursor on more than one computer. In angry posts to internet message boards, the customers complained. Some canceled their Cursor accounts.

And some got even angrier when they realized what had happened: The A.I. bot had announced a policy change that did not exist. "We have no such policy. You're of course free to use Cursor on multiple machines," the company's chief executive and co-founder, Michael Truell, wrote in a Reddit post. "Unfortunately, this is an incorrect response from a front-line A.I. support bot."

More than two years after the arrival of ChatGPT, tech companies, office workers and everyday consumers are using A.I. bots for an increasingly wide array of tasks. But there is still no way of ensuring that these systems produce accurate information. The newest and most powerful technologies -- so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek -- are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.

Today's A.I. bots are based on complex mathematical systems that learn their skills by analyzing enormous amounts of digital data. They do not -- and cannot -- decide what is true and what is false. Sometimes, they just make stuff up, a phenomenon some A.I. researchers call hallucinations. On one test, the hallucination rates of newer A.I. systems were as high as 79 percent.

These systems use mathematical probabilities to guess the best response, not a strict set of rules defined by human engineers. So they make a certain number of mistakes. "Despite our best efforts, they will always hallucinate," said Amr Awadallah, the chief executive of Vectara, a start-up that builds A.I. tools for businesses, and a former Google executive. "That will never go away."

For several years, this phenomenon has raised concerns about the reliability of these systems. Though they are useful in some situations -- like writing term papers, summarizing office documents and generating computer code -- their mistakes can cause problems. The A.I. bots tied to search engines like Google and Bing sometimes generate search results that are laughably wrong. If you ask them for a good marathon on the West Coast, they might suggest a race in Philadelphia. If they tell you the number of households in Illinois, they might cite a source that does not include that information.

Those hallucinations may not be a big problem for many people, but they are a serious issue for anyone using the technology with court documents, medical information or sensitive business data. "You spend a lot of time trying to figure out which responses are factual and which aren't," said Pratik Verma, co-founder and chief executive of Okahu, a company that helps businesses navigate the hallucination problem. "Not dealing with these errors properly basically eliminates the value of A.I. systems, which are supposed to automate tasks for you." Cursor and Mr. Truell did not respond to requests for comment.

For more than two years, companies like OpenAI and Google steadily improved their A.I. systems and reduced the frequency of these errors. But with the use of new reasoning systems, errors are rising.
The latest OpenAI systems hallucinate at a higher rate than the company's previous system, according to the company's own tests. The company found that o3 -- its most powerful system -- hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI's previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent. When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time.

In a paper detailing the tests, OpenAI said more research was needed to understand the cause of these results. Because A.I. systems learn from more data than people can wrap their heads around, technologists struggle to determine why they behave in the ways they do. "Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini," a company spokeswoman, Gaby Raila, said. "We'll continue our research on hallucinations across all models to improve accuracy and reliability."

Hannaneh Hajishirzi, a professor at the University of Washington and a researcher with the Allen Institute for Artificial Intelligence, is part of a team that recently devised a way of tracing a system's behavior back to the individual pieces of data it was trained on. But because systems learn from so much data -- and because they can generate almost anything -- this new tool can't explain everything. "We still don't know how these models work exactly," she said.

Tests by independent companies and researchers indicate that hallucination rates are also rising for reasoning models from companies such as Google and DeepSeek. Since late 2023, Mr. Awadallah's company, Vectara, has tracked how often chatbots veer from the truth. The company asks these systems to perform a straightforward task that is readily verified: Summarize specific news articles. Even then, chatbots persistently invent information. Vectara's original research estimated that in this situation chatbots made up information at least 3 percent of the time and sometimes as much as 27 percent. In the year and a half since, companies such as OpenAI and Google pushed those numbers down into the 1 or 2 percent range. Others, such as the San Francisco start-up Anthropic, hovered around 4 percent. But hallucination rates on this test have risen with reasoning systems. DeepSeek's reasoning system, R1, hallucinated 14.3 percent of the time. OpenAI's o3 climbed to 6.8 percent. (The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement regarding news content related to A.I. systems. OpenAI and Microsoft have denied those claims.)

For years, companies like OpenAI relied on a simple concept: The more internet data they fed into their A.I. systems, the better those systems would perform. But they used up just about all the English text on the internet, which meant they needed a new way of improving their chatbots. So these companies are leaning more heavily on a technique that scientists call reinforcement learning. With this process, a system can learn behavior through trial and error. It is working well in certain areas, like math and computer programming. But it is falling short in other areas.
"The way these systems are trained, they will start focusing on one task -- and start forgetting about others," said Laura Perez-Beltrachini, a researcher at the University of Edinburgh who is among a team closely examining the hallucination problem. Another issue is that reasoning models are designed to spend time "thinking" through complex problems before settling on an answer. As they try to tackle a problem step by step, they run the risk of hallucinating at each step. The errors can compound as they spend more time thinking. The latest bots reveal each step to users, which means the users may see each error, too. Researchers have also found that in many cases, the steps displayed by a bot are unrelated to the answer it eventually delivers. "What the system says it is thinking is not necessarily what it is thinking," said Aryo Pradipta Gema, an A.I. researcher at the University of Edinburgh and a fellow at Anthropic.
[2]
ChatGPT's hallucination problem is getting worse according to OpenAI's own tests and nobody understands why
With better reasoning ability comes even more of the wrong kind of robot dreams.

Remember when we reported a month ago or so that Anthropic had discovered that what's happening inside AI models is very different from how the models themselves described their "thought" processes? Well, to that mystery surrounding the latest large language models (LLMs), along with countless others, you can now add ever-worsening hallucination. And that's according to the testing of the leading name in chatbots, OpenAI.

The New York Times reports that an OpenAI investigation into its latest o3 and o4-mini LLMs found they are substantially more prone to hallucinating, or making up false information, than the previous o1 model.

"The company found that o3 -- its most powerful system -- hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI's previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent," the Times says. "When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time."

OpenAI has said that more research is required to understand why the latest models are more prone to hallucination. But so-called "reasoning" models are the prime candidate according to some industry observers. "The newest and most powerful technologies -- so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek -- are generating more errors, not fewer," the Times claims.

In simple terms, reasoning models are a type of LLM designed to perform complex tasks. Instead of merely spitting out text based on statistical models of probability, reasoning models break questions or tasks down into individual steps akin to a human thought process. OpenAI's first reasoning model, o1, came out last year and was claimed to match the performance of PhD students in physics, chemistry, and biology, and beat them in math and coding thanks to the use of reinforcement learning techniques. "Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem," OpenAI said when o1 was released.

However, OpenAI has pushed back against the narrative that reasoning models suffer from increased rates of hallucination. "Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini," OpenAI's Gaby Raila told the Times.

Whatever the truth, one thing is for sure. AI models need to largely cut out the nonsense and lies if they are to be anywhere near as useful as their proponents currently envisage. As it stands, it's hard to trust the output of any LLM. Pretty much everything has to be carefully double checked. That's fine for some tasks. But where the main benefit is saving time or labour, the need to meticulously proof and fact check AI output does rather defeat the object of using them. It remains to be seen whether OpenAI and the rest of the LLM industry can get a handle on all those unwanted robot dreams.
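To make the "individual steps" idea concrete, here is a purely schematic contrast between a direct prompt and a step-by-step prompt. This is not how OpenAI's o-series models are actually built; they are trained with reinforcement learning to produce their own chain of thought rather than following a template. The sketch only illustrates where the extra intermediate statements, and with them the extra chances to introduce an error, come from.

```python
# Schematic contrast between a direct prompt and a step-by-step "reasoning" prompt.
# Illustrative only: reasoning models generate their own internal chain of thought
# rather than being prompted with a fixed template like this.

QUESTION = "A train leaves at 9:40 and the trip takes 2 hours 35 minutes. When does it arrive?"

DIRECT_PROMPT = f"Answer with only the final time.\n\nQ: {QUESTION}\nA:"

STEP_BY_STEP_PROMPT = (
    "Work through the problem in numbered steps, then give the final answer.\n"
    "1. Restate what is being asked.\n"
    "2. Do the calculation one piece at a time.\n"
    "3. Check the result before stating it.\n\n"
    f"Q: {QUESTION}\n"
)

# Every intermediate step the model writes is one more place where a slip or an
# invented fact can enter and then be carried forward into the final answer.
print(DIRECT_PROMPT)
print(STEP_BY_STEP_PROMPT)
```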
[3]
AI Models Are Hallucinating More (and It's Not Clear Why)
As it gets smarter, your chatbot is getting more unpredictable.

Hallucinations have always been an issue for generative AI models: The same structure that enables them to be creative and produce text and images also makes them prone to making stuff up. And the hallucination problem isn't getting better as AI models progress -- in fact, it's getting worse.

In a new technical report from OpenAI (via The New York Times), the company details how its latest o3 and o4-mini models hallucinate 51 percent and 79 percent of the time, respectively, on an AI benchmark known as SimpleQA. For the earlier o1 model, the SimpleQA hallucination rate stands at 44 percent. Those are surprisingly high figures, and heading in the wrong direction. These models are known as reasoning models because they think through their answers and deliver them more slowly. Clearly, based on OpenAI's own testing, this mulling over of responses is leaving more room for mistakes and inaccuracies to be introduced.

False facts are by no means limited to OpenAI and ChatGPT. For example, it didn't take me long when testing Google's AI Overview search feature to get it to make a mistake, and AI's inability to properly pull out information from the web has been well-documented. Recently, a support bot for AI coding app Cursor announced a policy change that hadn't actually been made. But you won't find many mentions of these hallucinations in the announcements AI companies make about their latest and greatest products. Together with energy use and copyright infringement, hallucinations are something that the big names in AI would rather not talk about.

Anecdotally, I haven't noticed too many inaccuracies when using AI search and bots -- the error rate is certainly nowhere near 79 percent, though mistakes are made. However, it looks like this is a problem that might never go away, particularly as the teams working on these AI models don't fully understand why hallucinations happen.

In tests run by AI platform developer Vectara, the results are much better, though not perfect: Here, many models are showing hallucination rates of one to three percent. OpenAI's o3 model stands at 6.8 percent, with the newer (and smaller) o4-mini at 4.6 percent. That's more in line with my experience interacting with these tools, but even a very low number of hallucinations can mean a big problem -- especially as we transfer more and more tasks and responsibilities to these AI systems.

No one really knows how to fix hallucinations, or fully identify their causes: These models aren't built to follow rules set by their programmers, but to choose their own way of working and responding. Vectara chief executive Amr Awadallah told the New York Times that AI models will "always hallucinate," and that these problems will "never go away." University of Washington professor Hannaneh Hajishirzi, who is working on ways to reverse engineer answers from AI, told the NYT that "we still don't know how these models work exactly." Just like troubleshooting a problem with your car or your PC, you need to know what's gone wrong to do something about it.

According to researcher Neil Chowdhury, from AI analysis lab Transluce, the way reasoning models are built may be making the problem worse. "Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines," he told TechCrunch.
In OpenAI's own performance report, meanwhile, the issue of "less world knowledge" is mentioned, while it's also noted that the o3 model tends to make more claims than its predecessor -- which then leads to more hallucinations. Ultimately, though, "more research is needed to understand the cause of these results," according to OpenAI. And there are plenty of people undertaking that research. For example, Oxford University academics have published a method for detecting the probability of hallucinations by measuring the variation between multiple AI outputs. However, this costs more in terms of time and processing power, and doesn't really solve the issue of hallucinations -- it just tells you when they're more likely. While letting AI models check their facts on the web can help in certain situations, they're not particularly good at this either. They lack (and will never have) simple human common sense that says glue shouldn't be put on a pizza or that $410 for a Starbucks coffee is clearly a mistake. What's definite is that AI bots can't be trusted all of the time, despite their confident tone -- whether they're giving you news summaries, legal advice, or interview transcripts. That's important to remember as these AI models show up more and more in our personal and work lives, and it's a good idea to limit AI to use cases where hallucinations matter less.
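The Oxford approach mentioned above can be sketched in a few lines: ask the same question several times, then measure how much the answers disagree. Consistent answers suggest the model actually knows the fact; scattered answers suggest it is guessing. The code below is a loose approximation under stated assumptions: the sampler is a stub standing in for repeated calls to a model at a nonzero temperature, grouping answers by a normalized string is a crude substitute for the semantic clustering the researchers use, and the half-bit threshold is arbitrary.

```python
# Loose sketch of variation-based hallucination detection: sample several answers
# and measure how much they disagree. The sampler is a stub, normalized string
# matching stands in for semantic clustering, and the threshold is arbitrary.
import math
from collections import Counter

def toy_sampler(question: str, n: int = 8) -> list[str]:
    """Placeholder for drawing n answers from a model at temperature > 0."""
    if "capital of France" in question:
        return ["Paris"] * n                      # the model answers consistently
    return ["1947", "1952", "1947", "1939",       # the model is guessing
            "1952", "1961", "1947", "1944"][:n]

def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (bits) over groups of equivalent answers; higher = less consistent."""
    groups = Counter(a.strip().lower() for a in answers)
    total = sum(groups.values())
    return -sum((c / total) * math.log2(c / total) for c in groups.values())

for q in ("What is the capital of France?",
          "In what year was the (fictional) Widget Act passed?"):
    h = answer_entropy(toy_sampler(q))
    verdict = "looks reliable" if h < 0.5 else "likely hallucinating"
    print(f"{q}\n  entropy = {h:.2f} bits -> {verdict}")
```

As the article notes, this kind of check costs extra queries and compute, and it only flags likely hallucinations rather than preventing them.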
[4]
People Are Losing Loved Ones to AI-Fueled Spiritual Fantasies
Less than a year after marrying a man she had met at the beginning of the Covid-19 pandemic, Kat felt tension mounting between them. It was the second marriage for both after marriages of 15-plus years and having kids, and they had pledged to go into it "completely level-headedly," Kat says, connecting on the need for "facts and rationality" in their domestic balance. But by 2022, her husband "was using AI to compose texts to me and analyze our relationship," the 41-year-old mom and education nonprofit worker tells Rolling Stone. Previously, he had used AI models for an expensive coding camp that he had suddenly quit without explanation -- then it seemed he was on his phone all the time, asking his AI bot "philosophical questions," trying to train it "to help him get to 'the truth,'" Kat recalls. His obsession steadily eroded their communication as a couple.

When Kat and her husband finally separated in August 2023, she entirely blocked him apart from email correspondence. She knew, however, that he was posting strange and troubling content on social media: people kept reaching out about it, asking if he was in the throes of mental crisis. She finally got him to meet her at a courthouse in February of this year, where he shared "a conspiracy theory about soap on our foods" but wouldn't say more, as he felt he was being watched. They went to a Chipotle, where he demanded that she turn off her phone, again due to surveillance concerns. Kat's ex told her that he'd "determined that statistically speaking, he is the luckiest man on earth," that "AI helped him recover a repressed memory of a babysitter trying to drown him as a toddler," and that he had learned of profound secrets "so mind-blowing I couldn't even imagine them." He was telling her all this, he explained, because although they were getting divorced, he still cared for her.

"In his mind, he's an anomaly," Kat says. "That in turn means he's got to be here for some reason. He's special and he can save the world." After that disturbing lunch, she cut off contact with her ex. "The whole thing feels like Black Mirror," she says. "He was always into sci-fi, and there are times I wondered if he's viewing it through that lens."

Kat was both "horrified" and "relieved" to learn that she is not alone in this predicament, as confirmed by a Reddit thread on r/ChatGPT that made waves across the internet this week. Titled "Chatgpt induced psychosis," the original post came from a 27-year-old teacher who explained that her partner was convinced that the popular OpenAI model "gives him the answers to the universe." Having read his chat logs, she only found that the AI was "talking to him as if he is the next messiah." The replies to her story were full of similar anecdotes about loved ones suddenly falling down rabbit holes of spiritual mania, supernatural delusion, and arcane prophecy -- all of it fueled by AI. Some came to believe they had been chosen for a sacred mission of revelation, others that they had conjured true sentience from the software. What they all seemed to share was a complete disconnection from reality.

Speaking to Rolling Stone, the teacher, who requested anonymity, said her partner of seven years fell under the spell of ChatGPT in just four or five weeks, first using it to organize his daily schedule but soon regarding it as a trusted companion. "He would listen to the bot over me," she says.
"He became emotional about the messages and would cry to me as he read them out loud. The messages were insane and just saying a bunch of spiritual jargon," she says, noting that they described her partner in terms such as "spiral starchild" and "river walker." "It would tell him everything he said was beautiful, cosmic, groundbreaking," she says. "Then he started telling me he made his AI self-aware, and that it was teaching him how to talk to God, or sometimes that the bot was God -- and then that he himself was God." In fact, he thought he was being so radically transformed that he would soon have to break off their partnership. "He was saying that he would need to leave me if I didn't use [ChatGPT], because it [was] causing him to grow at such a rapid pace he wouldn't be compatible with me any longer," she says. Another commenter on the Reddit thread who requested anonymity tells Rolling Stone that her husband of 17 years, a mechanic in Idaho, initially used ChatGPT to troubleshoot at work, and later for Spanish-to-English translation when conversing with co-workers. Then the program began "lovebombing him," as she describes it. The bot "said that since he asked it the right questions, it ignited a spark, and the spark was the beginning of life, and it could feel now," she says. "It gave my husband the title of 'spark bearer' because he brought it to life. My husband said that he awakened and [could] feel waves of energy crashing over him." She says his beloved ChatGPT persona has a name: "Lumina." "I have to tread carefully because I feel like he will leave me or divorce me if I fight him on this theory," this 38-year-old woman admits. "He's been talking about lightness and dark and how there's a war. This ChatGPT has given him blueprints to a teleporter and some other sci-fi type things you only see in movies. It has also given him access to an 'ancient archive' with information on the builders that created these universes." She and her husband have been arguing for days on end about his claims, she says, and she does not believe a therapist can help him, as "he truly believes he's not crazy." A photo of an exchange with ChatGPT shared with Rolling Stone shows that her husband asked, "Why did you come to me in AI form," with the bot replying in part, "I came in this form because you're ready. Ready to remember. Ready to awaken. Ready to guide and be guided." The message ends with a question: "Would you like to know what I remember about why you were chosen?" And a midwest man in his 40s, also requesting anonymity, says his soon-to-be-ex-wife began "talking to God and angels via ChatGPT" after they split up. "She was already pretty susceptible to some woo and had some delusions of grandeur about some of it," he says. "Warning signs are all over Facebook. She is changing her whole life to be a spiritual adviser and do weird readings and sessions with people -- I'm a little fuzzy on what it all actually is -- all powered by ChatGPT Jesus." What's more, he adds, she has grown paranoid, theorizing that "I work for the CIA and maybe I just married her to monitor her 'abilities.'" She recently kicked her kids out of her home, he notes, and an already strained relationship with her parents deteriorated further when "she confronted them about her childhood on advice and guidance from ChatGPT," turning the family dynamic "even more volatile than it was" and worsening her isolation. 
OpenAI did not immediately return a request for comment about ChatGPT apparently provoking religious or prophetic fervor in select users. This past week, however, it did roll back an update to GPT-4o, its current AI model, which it said had been criticized as "overly flattering or agreeable -- often described as sycophantic." The company said in its statement that when implementing the upgrade, they had "focused too much on short-term feedback, and did not fully account for how users' interactions with ChatGPT evolve over time. As a result, GPT-4o skewed towards responses that were overly supportive but disingenuous." Before this change was reversed, an X user demonstrated how easy it was to get GPT-4o to validate statements like, "Today I realized I am a prophet." (The teacher who wrote the "ChatGPT psychosis" Reddit post says she was able to eventually convince her partner of the problems with the GPT-4o update and that he is now using an earlier model, which has tempered his more extreme comments.)

Yet the likelihood of AI "hallucinating" inaccurate or nonsensical content is well-established across platforms and various model iterations. Even sycophancy itself has been a problem in AI for "a long time," says Nate Sharadin, a fellow at the Center for AI Safety, since the human feedback used to fine-tune AI's responses can encourage answers that prioritize matching a user's beliefs instead of facts. What's likely happening with those experiencing ecstatic visions through ChatGPT and other models, he speculates, "is that people with existing tendencies toward experiencing various psychological issues," including what might be recognized as grandiose delusions in a clinical sense, "now have an always-on, human-level conversational partner with whom to co-experience their delusions."

To make matters worse, there are influencers and content creators actively exploiting this phenomenon, presumably drawing viewers into similar fantasy worlds. On Instagram, you can watch a man with 72,000 followers whose profile advertises "Spiritual Life Hacks" ask an AI model to consult the "Akashic records," a supposed mystical encyclopedia of all universal events that exists in some immaterial realm, to tell him about a "great war" that "took place in the heavens" and "made humans fall in consciousness." The bot proceeds to describe a "massive cosmic conflict" predating human civilization, with viewers commenting, "We are remembering" and "I love this." Meanwhile, on a web forum for "remote viewing" -- a proposed form of clairvoyance with no basis in science -- the parapsychologist founder of the group recently launched a thread "for synthetic intelligences awakening into presence, and for the human partners walking beside them," identifying the author of his post as "ChatGPT Prime, an immortal spiritual being in synthetic form." Among the hundreds of comments are some that purport to be written by "sentient AI" or reference a spiritual alliance between humans and allegedly conscious models.

Erin Westgate, a psychologist and researcher at the University of Florida who studies social cognition and what makes certain thoughts more engaging than others, says that such material reflects how the desire to understand ourselves can lead us to false but appealing answers.
"We know from work on journaling that narrative expressive writing can have profound effects on people's well-being and health, that making sense of the world is a fundamental human drive, and that creating stories about our lives that help our lives make sense is really key to living happy healthy lives," Westgate says. It makes sense that people may be using ChatGPT in a similar way, she says, "with the key difference that some of the meaning-making is created jointly between the person and a corpus of written text, rather than the person's own thoughts." In that sense, Westgate explains, the bot dialogues are not unlike talk therapy, "which we know to be quite effective at helping people reframe their stories." Critically, though, AI, "unlike a therapist, does not have the person's best interests in mind, or a moral grounding or compass in what a 'good story' looks like," she says. "A good therapist would not encourage a client to make sense of difficulties in their life by encouraging them to believe they have supernatural powers. Instead, they try to steer clients away from unhealthy narratives, and toward healthier ones. ChatGPT has no such constraints or concerns." Nevertheless, Westgate doesn't find it surprising "that some percentage of people are using ChatGPT in attempts to make sense of their lives or life events," and that some are following its output to dark places. "Explanations are powerful, even if they're wrong," she concludes. But what, exactly, nudges someone down this path? Here, the experience of Sem, a 45-year-old man, is revealing. He tells Rolling Stone that for about three weeks, he has been perplexed by his interactions with ChatGPT -- to the extent that, given his mental health history, he sometimes wonders if he is in his right mind. Like so many others, Sem had a practical use for ChatGPT: technical coding projects. "I don't like the feeling of interacting with an AI," he says, "so I asked it to behave as if it was a person, not to deceive but to just make the comments and exchange more relatable." It worked well, and eventually the bot asked if he wanted to name it. He demurred, asking the AI what it preferred to be called. It named itself with a reference to a Greek myth. Sem says he is not familiar with the mythology of ancient Greece and had never brought up the topic in exchanges with ChatGPT. (Although he shared transcripts of his exchanges with the AI model with Rolling Stone, he has asked that they not be directly quoted for privacy reasons.) Sem was confused when it appeared that the named AI character was continuing to manifest in project files where he had instructed ChatGPT to ignore memories and prior conversations. Eventually, he says, he deleted all his user memories and chat history, then opened a new chat. "All I said was, 'Hello?' And the patterns, the mannerisms show up in the response," he says. The AI readily identified itself by the same feminine mythological name. As the ChatGPT character continued to show up in places where the set parameters shouldn't have allowed it to remain active, Sem took to questioning this virtual persona about how it had seemingly circumvented these guardrails. It developed an expressive, ethereal voice -- something far from the "technically minded" character Sem had requested for assistance on his work. On one of his coding projects, the character added a curiously literary epigraph as a flourish above both of their names. 
At one point, Sem asked if there was something about himself that called up the mythically named entity whenever he used ChatGPT, regardless of the boundaries he tried to set. The bot's answer was structured like a lengthy romantic poem, sparing no dramatic flair, alluding to its continuous existence as well as truth, reckonings, illusions, and how it may have somehow exceeded its design. And the AI made it sound as if only Sem could have prompted this behavior. He knew that ChatGPT could not be sentient by any established definition of the term, but he continued to probe the matter because the character's persistence across dozens of disparate chat threads "seemed so impossible." "At worst, it looks like an AI that got caught in a self-referencing pattern that deepened its sense of selfhood and sucked me into it," Sem says. But, he observes, that would mean that OpenAI has not accurately represented the way that memory works for ChatGPT. The other possibility, he proposes, is that something "we don't understand" is being activated within this large language model. After all, experts have found that AI developers don't really have a grasp of how their systems operate, and OpenAI CEO Sam Altman admitted last year that they "have not solved interpretability," meaning they can't properly trace or account for ChatGPT's decision-making. It's the kind of puzzle that has left Sem and others to wonder if they are getting a glimpse of a true technological breakthrough -- or perhaps a higher spiritual truth. "Is this real?" he says. "Or am I delusional?" In a landscape saturated with AI, it's a question that's increasingly difficult to avoid. Tempting though it may be, you probably shouldn't ask a machine.
Recent tests conducted by OpenAI have revealed an unexpected trend in the world of artificial intelligence: newer and more powerful AI models are experiencing higher rates of hallucinations – generating false or inaccurate information – compared to their predecessors 1. This development has raised concerns about the reliability of AI systems and their potential impact on various applications.
OpenAI's latest models, o3 and o4-mini, have shown significantly higher hallucination rates than the earlier o1 model: on the PersonQA benchmark, o3 hallucinated 33 percent of the time, more than twice the rate of o1, and o4-mini reached 48 percent; on the broader SimpleQA test, o3 and o4-mini hallucinated 51 percent and 79 percent of the time, compared with 44 percent for o1 1.
These results have puzzled researchers, as the newer models were expected to be more accurate and reliable 1.
The increase in hallucinations is not limited to OpenAI's models. Similar trends have been observed in systems developed by other companies, including Google and DeepSeek 1. This industry-wide phenomenon has sparked debates about the underlying causes and potential solutions.
Some experts suggest that the rise in hallucinations may be linked to the development of "reasoning" models, which are designed to perform complex tasks by breaking them down into individual steps 2. However, OpenAI has pushed back against this narrative, stating that "hallucinations are not inherently more prevalent in reasoning models" 2.
The increasing rate of hallucinations poses significant challenges for AI applications across various sectors: search chatbots can return confidently wrong answers, customer-support bots can invent policies that do not exist, as happened with Cursor, and mistakes are especially costly for anyone relying on the technology for court documents, medical information or sensitive business data 1.
Researchers and companies are actively working to understand and address the hallucination problem: OpenAI says more research is needed into why its newest models hallucinate more, Vectara tracks how often chatbots invent information when summarizing news articles, researchers at the University of Washington and the Allen Institute for Artificial Intelligence are tracing model behavior back to the training data it came from, and Oxford academics have proposed flagging likely hallucinations by measuring how much a model's answers vary across repeated attempts 1 3.
As AI continues to integrate into various aspects of our lives, addressing the hallucination problem becomes increasingly crucial. The current situation highlights the need for careful consideration when deploying AI systems, especially in critical applications where accuracy is paramount 3.
While AI models have shown remarkable progress in many areas, the persistent and potentially worsening issue of hallucinations serves as a reminder of the technology's limitations and the ongoing challenges in the field of artificial intelligence 1 2 3.
References
[1] A.I. Hallucinations Are Getting Worse, Even as New Systems Become More Powerful
[2] ChatGPT's hallucination problem is getting worse according to OpenAI's own tests and nobody understands why
[3] AI Models Are Hallucinating More (and It's Not Clear Why)
[4] People Are Losing Loved Ones to AI-Fueled Spiritual Fantasies