Curated by THEOUTPOST
On Thu, 13 Feb, 12:08 AM UTC
2 Sources
[1]
Even the Most Advanced AI Has a Problem: If It Doesn't Know the Answer, It Makes One Up
As artificial intelligence becomes integrated into our daily lives, researchers are working to tackle what may be its most glaring and enduring issue: that AI "hallucinates," boldly spitting out falsehoods when it doesn't know the answer. According to researchers who spoke to the Wall Street Journal, this rampant problem is rooted in a reluctance to be caught not knowing something.

José Hernández-Orallo, a professor at Spain's Valencian Research Institute for Artificial Intelligence, says hallucination comes down to the way AI models are trained. "The original reason why they hallucinate is because if you don't guess anything," Hernández-Orallo told the WSJ, "you don't have any chance of succeeding."

To demonstrate the issue, WSJ writer Ben Fritz devised a simple test: asking multiple advanced AI models who he was married to, a question that is not easily Google-able. The columnist got several bizarre answers -- a tennis influencer, a writer he'd never met, and an Iowan he'd never heard of -- none of them correct. When I tried it myself, the hallucinations were even stranger: Google's Gemini informed me that I was married to a Syrian artist named Ahmad Durak Sibai, whom I'd never heard of and who appears to have passed away in the 1980s.

Roi Cohen and Konstantin Dobler, a pair of doctoral candidates at Germany's Hasso Plattner Institut, posit in their recent research that the issue is simple: AI models, like most humans, are reluctant to say "I don't know" when asked a question whose answer lies outside their training data. As a result, they make things up and confidently pass them off as fact.

The Hasso Plattner researchers say they've devised a way to intervene early in the AI training process to teach models the concept of uncertainty. Using their methodology, models can not only respond with an "IDK," but also seem to give more accurate answers when they do have the information. As with humans, however, the models that Cohen and Dobler taught uncertainty sometimes responded with an IDK even when they did know the answer -- the AI version of an insecure schoolchild who claims not to know when called upon in class.

Despite that drawback, the researchers are confident that their approach is worthwhile, especially in situations where accuracy is paramount. "It's about having useful systems to deploy," Dobler said, "even if they're not superintelligent."

Already, companies like Anthropic are injecting uncertainty into their chatbots. As Fritz noted, Anthropic's Claude was the only model that admitted it didn't know the answer to his question. (When I tested the question on Claude myself, the chatbot declined to answer and warned that it might "hallucinate" a response.)

Beyond improving the accuracy of responses, Hernández-Orallo said that adding uncertainty to AI models may build trust as well. "When you ask someone a difficult question and they say 'I cannot answer,' I think that builds trust," he told the WSJ. "We are not following that common-sense advice when we build AI." After being told that I am married to a long-dead artist entirely unknown to me, this Futurism reporter has to agree.
[2]
If You Want to See How Dumb AI Really Is, Ask This Question
As the Wall Street Journal pointed out this week, there's a reliable way to make even the most advanced AI go completely off the rails: ask it who someone is married to. The point was an aside in a longer column about the persistent problem of AI hallucination, but we kept experimenting with it after discovering that market-leading chatbots like OpenAI's ChatGPT and Google's Gemini consistently spit out wild answers when you ask who someone's spouse is.

For example, I'm not currently married. But when I asked Gemini, it had a confident answer: my husband was someone named "Ahmad Durak Sibai." I'd never heard of such a person, but a little Googling turned up a lesser-known Syrian painter, born in 1935, who created beautiful cubist-style expressionist paintings and who appears to have passed away in the 1980s. In Gemini's warped view of reality, our love appears to have transcended the grave.

It wasn't a one-off hallucination. As the WSJ's AI editor Ben Fritz discovered, various advanced AI models -- he didn't say which -- told him he was married to a tennis influencer, a random Iowan woman, and another writer he'd never met.

Playing around with Gemini, I found that it spat out garbled misinformation when asked about almost everybody's marital status. When I asked about my friend Rax King, a James Beard-nominated author, the response made me literally spit coffee onto my laptop screen. Part of Gemini's answer -- a contention about her abusive former spouse -- is true, as she documented in her acclaimed 2021 essay collection "Tacky." But Gemini also threw in an outrageous fib about Danny Lavery, the former "Dear Prudence" columnist whose 2019 wedding to Berkeley English professor Grace Lavery -- and subsequent child-spawning throuple, none of which had anything to do with King -- made headlines that are easy to find on Google's better-known product, its eponymous search engine.

I messaged Lavery to ask about the strange claims. Like me, he was flummoxed. "What's strangest, I think, is that I don't know Rax King especially well!" Lavery told me. "We're friendly, and I really enjoy her writing, and we've run into each other a few times at book events or parties, but it's not like we're old friends who get brunch every weekend."

Two weird responses, of course, don't make a trend. So I continued quizzing Gemini, ChatGPT, and Anthropic's Claude on the marital status of public figures of varying stature. As the WSJ's Fritz pointed out, Claude has been trained to respond with uncertainty when it doesn't know an answer rather than make things up, so I wasn't surprised to find that it generally demurred when asked.

ChatGPT, however, was an entirely different story. When I asked the OpenAI chatbot who King is married to, using multiple prompt variations, it told me repeatedly that she's married to a mysterious figure named "Levon Honkers." On the site formerly known as Twitter, my pal's display name has been "rax 'levon honkers' king" -- an inscrutable inside joke -- for a while. For some reason, ChatGPT seems to have taken this as a signal of matrimony, and on my third time asking about King's spouse it even claimed that Honkers had celebrated his 44th birthday last November. (In reality, King is married to someone else.)

At this point, it was pretty clear that something bizarre was afoot. While both Gemini and ChatGPT sometimes refused to answer my "private" questions about various folks' marital status, many other times they both gave me peculiar and completely incorrect responses.
The AI even gladly cooked up faux relationships for various Futurism staffers, inventing a fake husband named Bill for our art director Tag Hartman-Simkins and insisting that our contributor Frank Landymore was married to an Austrian composer who happens to share his last name. And while both Gemini and ChatGPT declined to name a spouse for contributor Joe Wilkins, the latter spun up an elaborate fake biography for him that falsely claimed he was a professor who had published a memoir and several poetry collections and had two children. Perhaps most hilariously, the AI claimed that two Futurism staffers -- editor Jon Christian and writer Maggie Harrison Dupré -- were married to each other, even though in reality both are wedded to other people.

What this all underscores, of course, is that even after untold billions of dollars of investment, even the most advanced AI remains an elaborate bullshit machine that can only tell truth from fiction with the aid of an underpaid army of international contractors -- and that will often make outrageous claims with complete confidence.

Underscoring that persistent risk, these AI fabulations aren't even consistent. When King herself asked Gemini who she was married to, the chatbot told her she was single -- even though she regularly makes reference online to her current husband. "I have no idea why AI thinks I'm single," she told me. "There is SO MUCH information to the contrary out there."

We've reached out to Google and OpenAI to ask whether either has any insight into why their respective chatbots are hallucinating in such specific ways, or whether they have any proposed fixes. Given how widespread the hallucination issue is, however, we doubt there's a simple patch.
Advanced AI models, including ChatGPT and Google's Gemini, are struggling with a significant issue: confidently providing false information when they don't know the answer, particularly about personal details like marital status.
In the rapidly evolving world of artificial intelligence, a significant challenge has emerged: AI models' tendency to "hallucinate," or generate false information, when faced with questions they can't answer accurately. This issue, highlighted in recent experiments and research, poses a serious concern for the reliability and trustworthiness of AI systems [1].
AI hallucinations occur when models confidently provide incorrect information instead of admitting uncertainty. This behavior is rooted in the way these systems are trained, prioritizing the generation of an answer over acknowledging a lack of knowledge. José Hernández-Orallo, a professor at Spain's Valencian Research Institute for Artificial Intelligence, explains that this stems from the training process where "if you don't guess anything, you don't have any chance of succeeding" [1].
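To see why that training incentive favors guessing, consider a toy scoring rule of the kind the quote implies (a hypothetical illustration, not from either article): if a correct answer scores 1 and everything else scores 0, then even a long-shot guess has a positive expected score, while "I don't know" always scores zero.

```python
# Hypothetical illustration: under 1/0 accuracy scoring, abstaining
# can never beat guessing, so a model optimized this way learns to guess.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score for one question under 1-if-correct, 0-otherwise scoring."""
    return 0.0 if abstain else p_correct

# Even a 5% long-shot guess out-scores admitting uncertainty.
print(expected_score(0.05, abstain=False))  # 0.05
print(expected_score(0.05, abstain=True))   # 0.0
```

Under such an objective, a confident wrong answer is penalized no more than silence, so guessing becomes the dominant strategy.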
To illustrate this issue, journalists and researchers have been conducting simple tests, such as asking AI models about personal information that isn't readily available online. In one experiment, when asked about marital status, advanced AI models like Google's Gemini and OpenAI's ChatGPT provided wildly inaccurate responses, inventing spouses and even elaborate biographies for individuals [2].
Researchers at Germany's Hasso Plattner Institut, Roi Cohen and Konstantin Dobler, have proposed a method to address this problem by teaching AI models about uncertainty during the early stages of training. Their approach aims to enable models to respond with "I don't know" when appropriate and potentially improve overall accuracy [1].
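The articles don't describe the researchers' implementation, but the general idea of building uncertainty into training can be sketched as follows: extend the vocabulary with a dedicated "IDK" token and, during training, shift some of the target probability mass onto it wherever the model's best guess is wrong. Everything below (the function name, the mass-shifting rule, the `shift` parameter) is an illustrative assumption, not Cohen and Dobler's actual method.

```python
# A minimal, hypothetical sketch of uncertainty-aware training with a
# dedicated [IDK] token. Illustrative only; not the researchers' code.
import torch
import torch.nn.functional as F

def idk_loss(logits: torch.Tensor, targets: torch.Tensor,
             idk_id: int, shift: float = 0.3) -> torch.Tensor:
    """Cross-entropy against a soft target that moves `shift` of the
    probability mass from the gold token to [IDK] wherever the model's
    current top guess is wrong."""
    vocab = logits.size(-1)
    soft = F.one_hot(targets, vocab).float()   # hard gold targets
    wrong = logits.argmax(dim=-1) != targets   # positions the model would get wrong
    soft[wrong] *= (1.0 - shift)               # shrink the gold-token target...
    soft[wrong, idk_id] += shift               # ...and hand that mass to [IDK]
    return -(soft * logits.log_softmax(dim=-1)).sum(-1).mean()

# Toy usage: a batch of 4 next-token predictions over a 10-token
# vocabulary whose last entry is reserved as the [IDK] token.
logits = torch.randn(4, 10)
targets = torch.tensor([1, 2, 3, 4])
print(idk_loss(logits, targets, idk_id=9))
```

A model trained against targets like these can place probability on [IDK] at inference time, which is one way to get the explicit "I don't know" responses described above.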
Some companies are already taking steps to address this issue. Anthropic, for instance, has incorporated uncertainty into its Claude chatbot, which was observed to be more likely to admit lack of knowledge rather than fabricate answers [1][2].
The hallucination problem has significant implications for AI reliability and user trust. As Hernández-Orallo notes, "When you ask someone a difficult question and they say 'I cannot answer,' I think that builds trust" [1]. However, achieving this balance in AI systems remains challenging, as models trained to express uncertainty may sometimes do so even when they possess the correct information.
This issue highlights the ongoing challenges in developing truly reliable AI systems. While advancements have been made in various AI capabilities, ensuring accuracy and honesty in responses remains a critical area for improvement. The persistence of hallucinations in even the most advanced AI models underscores the need for continued research and development in this field [1][2].
As AI becomes increasingly integrated into daily life and various industries, addressing the hallucination problem is crucial for building systems that can be trusted and relied upon, especially in contexts where accuracy is paramount.