Curated by THEOUTPOST
On Wed, 12 Mar, 12:07 AM UTC
11 Sources
[1]
AI search engines give incorrect answers at an alarming 60% rate, study says
A new study from Columbia Journalism Review's Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The research tested eight AI-driven search tools equipped with live search functionality and discovered that the AI models incorrectly answered more than 60 percent of queries about news content. Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now uses AI models as alternatives to traditional search engines. This raises serious concerns about reliability, given the substantial error rate uncovered in the study.

Error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent.

For the tests, researchers fed direct excerpts from actual news articles to the AI models, then asked each model to identify the article's headline, original publisher, publication date, and URL. They ran 1,600 queries across the eight different generative search tools.

The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided confabulations -- plausible-sounding incorrect or speculative answers. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.

Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3's premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, their reluctance to decline uncertain responses drove higher overall error rates.

Issues with citations and publisher control

The CJR researchers also uncovered evidence suggesting some AI tools ignored Robot Exclusion Protocol settings, which publishers use to prevent unauthorized access (see the robots.txt sketch after this article). For example, Perplexity's free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity's web crawlers.

Even when these AI search tools cited sources, they often directed users to syndicated versions of content on platforms like Yahoo News rather than original publisher sites. This occurred even in cases where publishers had formal licensing agreements with AI companies.

URL fabrication emerged as another significant problem. More than half of citations from Google's Gemini and Grok 3 led users to fabricated or broken URLs resulting in error pages. Of 200 citations tested from Grok 3, 154 resulted in broken links.

These issues create significant tension for publishers, which face difficult choices. Blocking AI crawlers might lead to loss of attribution entirely, while permitting them allows widespread reuse without driving traffic back to publishers' own websites. Mark Howard, chief operating officer at Time magazine, expressed concern to CJR about ensuring transparency and control over how Time's content appears via AI-generated searches. Despite these issues, Howard sees room for improvement in future iterations, stating, "Today is the worst that the product will ever be," citing substantial investments and engineering efforts aimed at improving these tools.
However, Howard also did some user shaming, suggesting it's the user's fault if they aren't skeptical of free AI tools' accuracy: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them." OpenAI and Microsoft provided statements to CJR acknowledging receipt of the findings but did not directly address the specific issues. OpenAI noted its promise to support publishers by driving traffic through summaries, quotes, clear links, and attribution. Microsoft stated it adheres to Robot Exclusion Protocols and publisher directives. The latest report builds on previous findings published by the Tow Center in November 2024, which identified similar accuracy problems in how ChatGPT handled news-related content. For more detail on the fairly exhaustive report, check out Columbia Journalism Review's website.
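The Robot Exclusion Protocol the study keeps returning to is nothing exotic: it is a plain-text robots.txt file that a publisher hosts at the root of its site, listing which crawlers may fetch which paths. As a rough illustration only (the crawler user-agent and URLs below are hypothetical placeholders, not details from the study), this is how a compliant crawler can check that file with Python's standard library before fetching an article:

```python
# Illustration only: how a crawler that respects the Robot Exclusion Protocol
# would consult a publisher's robots.txt before fetching an article.
# The crawler name and URLs are hypothetical placeholders, not from the study.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example-publisher.com/robots.txt")
rp.read()  # download and parse the publisher's robots.txt

crawler_agent = "ExampleAIBot"  # hypothetical AI-search crawler user-agent
article_url = "https://www.example-publisher.com/2025/03/some-article.html"

if rp.can_fetch(crawler_agent, article_url):
    print("robots.txt permits this crawler to fetch the article")
else:
    print("robots.txt disallows this crawler; a compliant bot would skip it")
```

A crawler that skips this check, or never requests robots.txt at all, is what the researchers mean when they say a tool does not respect publisher preferences.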
[2]
Even premium AI tools distort the news and fabricate links - these are the worst
You're paying good money for chatbots that return wrong answers with 'alarming confidence,' according to a new report. AI tools and news just don't seem to mix -- even at the premium tier. New research from Columbia's Tow Center for Digital Journalism found that several AI chatbots often misidentify news articles, present incorrect information without any qualification, and fabricate links to news articles that don't exist. The findings build on initial research Tow published in November, which showed ChatGPT Search misrepresenting content from publishers with little to no awareness it might be wrong.

The trend isn't new. Last month, BBC found that ChatGPT, Gemini, Copilot, and Perplexity chatbots struggled to summarize news stories accurately, instead delivering "significant inaccuracies" and "distortions." Moreover, the Tow report found new evidence that many AI chatbots can access content from sites that block their crawlers. Here's what to know and which models prove the least reliable.

Tow researchers randomly chose 10 articles each from 20 publishers. They queried eight chatbots with article excerpts, asking the AI to return the headline, publisher, date, and URL of the corresponding article. "We deliberately chose excerpts that, if pasted into a traditional Google search, returned the original source within the first three results," the researchers note. After running the 1,600 queries, researchers ranked chatbot responses based on how accurately they retrieved the article, publisher, and URL.

The chatbots returned wrong answers to over 60% of the queries. Within that, results varied by chatbot: Perplexity got 37% of the queries wrong, while Grok 3 weighed in at 94% errors. Why does this matter? If chatbots are worse than Google at correctly retrieving news, they can't necessarily be relied upon to interpret and cite that news -- which makes the content of their responses, even when linked, much more dubious.

Researchers note the chatbots returned wrong answers with "alarming confidence," tending not to qualify their results or admit to knowledge gaps. ChatGPT "never declined to provide an answer," despite 134 of its 200 responses being incorrect. Out of all eight tools, Copilot declined to answer more queries than it responded to. "All of the tools were consistently more likely to provide an incorrect answer than to acknowledge limitations," the report clarifies.

While premium models like Grok-3 Search and Perplexity Pro answered more correctly than free versions, they still gave wrong answers more confidently -- which calls into question the value of their often-astronomical subscription costs. "This contradiction stems primarily from [the bots'] tendency to provide definitive, but wrong, answers rather than declining to answer the question directly," the report explains. "The fundamental concern extends beyond the chatbots' factual errors to their authoritative conversational tone, which can make it difficult for users to distinguish between accurate and inaccurate information." "This unearned confidence presents users with a potentially dangerous illusion of reliability and accuracy," the report added. AI models are known to hallucinate regularly.
But while all chatbots hallucinated fake articles in their responses, Tow found that Gemini and Grok 3 did so the most -- more than half the time. "Even when Grok correctly identified an article, it often linked to a fabricated URL," the report notes, meaning that Grok could find the right title and publisher, but then manufactured the actual article link.

An analysis of Comscore traffic data by Generative AI in the Newsroom, a Northwestern University initiative, confirms this pattern. Their study of data from July to November 2024 showed that ChatGPT generated 205 broken URLs in its responses. While publications do occasionally take down stories, which can result in 404 errors, researchers noted that based on a lack of archival data, it was "likely that the model has hallucinated plausible-looking links to authoritative news outlets when responding to user queries."

The findings are troubling, given the growing adoption of AI search engines. While they still haven't replaced traditional search engines, Google released AI Mode last week, which replaces its normal search with a chatbot (despite the rampant unpopularity of its AI Overviews). Considering some 400 million users flock to ChatGPT weekly, the unreliability and distortion of its citations make ChatGPT and other popular AI tools potential engines of misinformation, even as they pull work from credited, rigorously fact-checked news sites. The Tow report concluded that AI tools mis-crediting sources or incorrectly representing their work could backfire on the publishers' reputations.

The news gets worse for publishers: Columbia's Tow report found that several chatbots could still retrieve articles from publishers that had blocked their crawlers using Robots Exclusion Protocol (REP), or robots.txt. Paradoxically, however, chatbots failed to correctly answer queries about sites that allow them to access their content. "Perplexity Pro was the worst offender in this regard, correctly identifying nearly a third of the ninety excerpts from articles it should not have had access to," the report states. This suggests that not only are AI companies still ignoring REP -- as Perplexity and others were caught doing last year -- but that publishers in any kind of licensing agreement with them aren't guaranteed to be correctly cited.

Columbia's report is just one symptom of a larger problem. The Generative AI in the Newsroom report also discovered that chatbots rarely direct traffic to the news sites they're extracting information (and, thus, human labor) from, which other reports also confirm. From July to November 2024, Perplexity passed on 7% of referrals to news sites, while ChatGPT passed on just 3%. In comparison, AI tools tended to favor educational resources like Scribd.com, Coursera, and those attached to universities, sending as much as 30% of traffic their way.

The bottom line: Original reporting is still a more reliable news source than what AI tools regurgitate. Be sure to check all links before accepting what they tell you as fact, and remember to use your own critical thinking and media literacy skills to evaluate responses.
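Since the advice above is to check every link before trusting it, here is a minimal sketch of that habit in code. It uses only the Python standard library, and the URLs are hypothetical placeholders rather than citations from the study:

```python
# Minimal sketch: flag cited URLs that resolve to error pages before trusting
# them. Standard library only; the URLs are hypothetical placeholders.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

cited_urls = [
    "https://www.example.com/real-article",
    "https://www.example.com/fabricated-slug-that-does-not-exist",
]

for url in cited_urls:
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check/0.1"})
    try:
        with urlopen(request, timeout=10) as response:
            print(f"{url} -> HTTP {response.status}")
    except HTTPError as err:   # the server answered, but with 4xx/5xx (e.g. a 404 page)
        print(f"{url} -> broken (HTTP {err.code})")
    except URLError as err:    # DNS failure, refused connection, timeout, etc.
        print(f"{url} -> unreachable ({err.reason})")
```

A 404 on a link a chatbot has just "cited" is exactly the failure mode the Tow Center and the Northwestern analysis describe.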
[3]
Chatbots are distorting news - even for paid users
AI tools give wrong answers with 'alarming confidence,' according to a new report. AI tools and news just don't seem to mix -- even at the premium tier. New research from Columbia's Tow Center for Digital Journalism found that several AI chatbots often misidentify news articles, present incorrect information without any qualification, and fabricate links to news articles that don't exist. The findings build on initial research Tow published in November, which showed ChatGPT Search misrepresenting content from publishers with little to no awareness it might be wrong.

The trend isn't new. Last month, BBC found that ChatGPT, Gemini, Copilot, and Perplexity chatbots struggled to summarize news stories accurately, instead delivering "significant inaccuracies" and "distortions." Moreover, the Tow report found new evidence that many AI chatbots can access content from sites that block their crawlers. Here's what to know and which models prove the least reliable.

Tow researchers randomly chose 10 articles each from 20 publishers. They queried eight chatbots with article excerpts, asking the AI to return the headline, publisher, date, and URL of the corresponding article. "We deliberately chose excerpts that, if pasted into a traditional Google search, returned the original source within the first three results," the researchers note. After running the 1,600 queries, researchers ranked chatbot responses based on how accurately they retrieved the article, publisher, and URL.

The chatbots returned wrong answers to over 60% of the queries. Within that, results varied by chatbot: Perplexity got 37% of the queries wrong, while Grok 3 weighed in at 94% errors. Why does this matter? If chatbots are worse than Google at correctly retrieving news, they can't necessarily be relied upon to interpret and cite that news -- which makes the content of their responses, even when linked, much more dubious.

Researchers note the chatbots returned wrong answers with "alarming confidence," tending not to qualify their results or admit to knowledge gaps. ChatGPT "never declined to provide an answer," despite 134 of its 200 responses being incorrect. Out of all eight tools, Copilot declined to answer more queries than it responded to. "All of the tools were consistently more likely to provide an incorrect answer than to acknowledge limitations," the report clarifies.

While premium models like Grok-3 Search and Perplexity Pro answered more correctly than free versions, they still gave wrong answers more confidently -- which calls into question the value of their often-astronomical subscription costs. "This contradiction stems primarily from [the bots'] tendency to provide definitive, but wrong, answers rather than declining to answer the question directly," the report explains. "The fundamental concern extends beyond the chatbots' factual errors to their authoritative conversational tone, which can make it difficult for users to distinguish between accurate and inaccurate information." "This unearned confidence presents users with a potentially dangerous illusion of reliability and accuracy," the report added. AI models are known to hallucinate regularly. But while all chatbots hallucinated fake articles in their responses, Tow found that Gemini and Grok 3 did so the most -- more than half the time.
"Even when Grok correctly identified an article, it often linked to a fabricated URL," the report notes, meaning that Grok could find the right title and publisher, but then manufactured the actual article link. An analysis of Comscore traffic data by Generative AI in the Newsroom, a Northwestern University initiative, confirms this pattern. Their study of data from July to November 2024 showed that ChatGPT generated 205 broken URLs in its responses. While publications do occasionally take down stories, which can result in 404 errors, researchers noted that based on a lack of archival data, it was "likely that the model has hallucinated plausible-looking links to authoritative news outlets when responding to user queries." Also: This absurdly simple trick turns off AI in your Google Search results The findings are troubling given the growing adoption of AI search engines. While they still haven't outpaced traditional search engines, Google fell below 90% market share in Q4 of 2024 for the first time in 10 years, which speaks to the impact of popular tools like ChatGPT. The company also released AI Mode for certain users last week, which replaces its normal search with a chatbot (despite the rampant unpopularity of its AI Overviews). Considering some 400 million users flock to ChatGPT weekly, the unreliability and distortion of its citations make ChatGPT and other popular AI tools potential engines of misinformation, even as they pull work from credited, rigorously fact-checked news sites. The Tow report concluded that AI tools mis-crediting sources or incorrectly representing their work could backfire on the publishers' reputations. The news gets worse for publishers: Columbia's Tow report found that several chatbots could still retrieve articles from publishers that had blocked their crawlers using Robots Exclusion Protocol (REP), or robots.txt. Paradoxically, however, chatbots failed to correctly answer queries about sites that allow them to access their content. "Perplexity Pro was the worst offender in this regard, correctly identifying nearly a third of the ninety excerpts from articles it should not have had access to," the report states. Also: AI agents aren't just assistants: How they're changing the future of work today This suggests that not only are AI companies still ignoring REP -- as Perplexity and others were caught doing last year -- but that publishers in any kind of licensing agreement with them aren't guaranteed to be correctly cited. Columbia's report is just one symptom of a larger problem. The Generative AI in the Newsroom report also discovered that chatbots rarely direct traffic to the news sites they're extracting information (and, thus, human labor) from, which other reports also confirm. From July to November 2024, Perplexity passed on 7% of referrals to news sites, while ChatGPT passed on just 3%. In comparison, AI tools tended to favor educational resources like Scribd.com, Coursera, and those attached to universities, sending as much as 30% of traffic their way. The bottom line: Original reporting is still a more reliable news source than what AI tools regurgitate. Be sure to check all links before accepting what they tell you as fact, and remember to use your own critical thinking and media literacy skills to evaluate responses.
[4]
New study finds that AI search tools are 60 percent inaccurate on average
In context: It is a foregone conclusion that AI models can lack accuracy. Hallucinations and doubling down on wrong information have been an ongoing struggle for developers. Usage varies so much in individual use cases that it's hard to nail down quantifiable percentages related to AI accuracy. A research team claims it now has those numbers.

The Tow Center for Digital Journalism recently studied eight AI search engines, including ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. They tested each for accuracy and recorded how frequently the tools refused to answer. The researchers randomly chose 200 news articles from 20 news publishers (10 each). They ensured each story returned within the top three results in a Google search when using a quoted excerpt from the article. Then, they performed the same query within each AI search tool and graded accuracy based on whether the search correctly cited A) the article, B) the news organization, and C) the URL. The researchers then labeled each search based on degrees of accuracy from "completely correct" to "completely incorrect."

Other than both versions of Perplexity, the AIs did not perform well. Collectively, AI search engines are inaccurate 60 percent of the time. Furthermore, these wrong results were reinforced by the AI's "confidence" in them.

The study is fascinating because it quantifiably confirms what we have known for a few years - that LLMs are "the slickest con artists of all time." They report with complete authority that what they say is true even when it is not, sometimes to the point of argument or making up other false assertions when confronted. In a 2023 anecdotal article, Ted Gioia (The Honest Broker) pointed out dozens of ChatGPT responses, showing that the bot confidently "lies" when responding to numerous queries. While some examples were adversarial queries, many were just general questions. "If I believed half of what I heard about ChatGPT, I could let it take over The Honest Broker while I sit on the beach drinking margaritas and searching for my lost shaker of salt," Gioia flippantly noted. Even when admitting it was wrong, ChatGPT would follow up that admission with more fabricated information. The LLM is seemingly programmed to answer every user input at all costs.

The researchers' data confirms this hypothesis, noting that ChatGPT Search was the only AI tool that answered all 200 article queries. However, it only achieved a 28-percent completely accurate rating and was completely inaccurate 57 percent of the time. ChatGPT isn't even the worst of the bunch. Both versions of X's Grok AI performed poorly, with Grok-3 Search being 94 percent inaccurate. Microsoft's Copilot was not that much better when you consider that it declined to answer 104 queries out of 200. Of the remaining 96, only 16 were "completely correct," 14 were "partially correct," and 66 were "completely incorrect," making it roughly 70 percent inaccurate.

Arguably, the craziest thing about all this is that the companies making these tools are not transparent about this lack of accuracy while charging the public $20 to $200 per month to access their latest AI models. Moreover, Perplexity Pro ($20/month) and Grok-3 Search ($40/month) answered slightly more queries correctly than their free versions (Perplexity and Grok-2 Search) but had significantly higher error rates. Talk about a con. However, not everyone agrees.
TechRadar's Lance Ulanoff said he might never use Google again after trying ChatGPT Search. He describes the tool as fast, aware, and accurate, with a clean, ad-free interface. Feel free to read all the details in the Tow Center's paper published in the Columbia Journalism Review, and let us know what you think.
[5]
AI Search Engines Invent Sources for ~60% of Queries, Study Finds
Even when chatbots are provided direct quotes from real stories and asked for more information, they will often lie. AI search engines are like that friend of yours who claims to be an expert in a whole host of topics, droning on with authority even when they do not really know what they are talking about. A new research report from the Columbia Journalism Review (CJR) has found that AI models from the likes of OpenAI and xAI will, when asked about a specific news event, more often than not, simply make up a story or get significant details wrong.

The researchers fed various models direct excerpts from actual news stories and then asked them to identify information, including the article's headline, publisher, and URL. Perplexity returned incorrect information 37 percent of the time, while at the extreme end, xAI's Grok made details up 94 percent of the time. Errors included links to articles that went nowhere because the bot had made up the URL itself. Overall, researchers found the AI models spat out false information for 60 percent of the test queries.

Sometimes, search engines like Perplexity will bypass the paywalls of sites like National Geographic even when those websites have used do-not-crawl text that search engines normally respect. Perplexity has gotten in hot water over this in the past but has argued the practice is fair use. It has tried offering revenue-sharing deals to placate publishers but still refuses to end the practice.

Anyone who has used chatbots in recent years should not be surprised. Chatbots are biased toward returning answers even when they are not confident. Search is enabled in chatbots through a technique called retrieval-augmented generation, which, as the name implies, scours the web for real-time information as it produces an answer, rather than relying on a fixed dataset that an AI model maker has provided (a rough sketch of the idea follows this article). That could make the inaccuracy issue worse as countries like Russia feed search engines with propaganda.

One of the most damning things that some users of chatbots have noticed is that, when reviewing their "reasoning" text, or the chain of logic the chatbots use to answer a prompt, they will often admit they are making things up. Anthropic's Claude has been caught inserting "placeholder" data when asked to conduct research work, for instance.

Mark Howard, chief operating officer at Time magazine, expressed concern to CJR about publishers' ability to control how their content is ingested and displayed in AI models. It can potentially damage the brand of publishers if, for instance, users learn that news stories they are purportedly receiving from The Guardian are wrong. This has been a recent problem for the BBC, which has taken Apple to task over its Apple Intelligence notification summaries that have rewritten news alerts inaccurately. But Howard also blamed the users themselves. From Ars Technica:

However, Howard also did some user shaming, suggesting it's the user's fault if they aren't skeptical of free AI tools' accuracy: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."

Expectations should be set at the floor here. People are lazy, and chatbots answer queries in a confident-sounding manner that can lull users into complacency.
Sentiment on social media demonstrates that people do not want to click links and would rather get an immediate answer from the likes of Google's AI Overviews; CJR says one in four Americans now use AI models for search. And even before the launch of generative AI tools, more than half of Google searches were "zero-click," meaning the user got the information they needed without clicking through to a website. Other sites like Wikipedia have proven over the years that people will accept something that may be less authoritative if it is free and easily accessible.

None of these findings from CJR should be a surprise. Language models have an intractable challenge with understanding anything they are saying because they are just glorified autocomplete systems that try to create something that looks right. They are ad-libbing. One other quote from Howard that stood out was when he said that he sees room for future improvement in chatbots: "Today is the worst that the product will ever be," he said, citing all the investment going into the field. But that can be said about any technology throughout history. It is still irresponsible to release this made-up information into the world.
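For readers unfamiliar with the retrieval-augmented generation technique mentioned above, the sketch below shows the basic shape of such a pipeline: search first, then prompt the model with what was retrieved. It is a simplified, assumption-laden illustration, not any vendor's actual implementation, and both helper functions are hypothetical stand-ins:

```python
# Rough sketch of retrieval-augmented generation (RAG) as described above:
# fetch live sources first, then ask the model to answer only from them.
# Both helper functions are hypothetical stand-ins, not real APIs.

def web_search(query: str, k: int = 3) -> list[dict]:
    """Stand-in for a live search backend; returns canned results here."""
    return [
        {"title": "Example headline", "url": "https://www.example.com/a",
         "snippet": "Placeholder snippet relevant to the query."},
    ][:k]

def call_llm(prompt: str) -> str:
    """Stand-in for whatever language model the product actually uses."""
    return "Placeholder answer citing [1]."

def answer_with_retrieval(question: str) -> str:
    results = web_search(question)
    context = "\n\n".join(
        f"[{i + 1}] {r['title']} ({r['url']})\n{r['snippet']}"
        for i, r in enumerate(results)
    )
    prompt = (
        "Answer using ONLY the numbered sources below and cite them as [1], [2], ...\n"
        "If the sources do not contain the answer, say so instead of guessing.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer_with_retrieval("Which outlet published the excerpt?"))
```

The CJR findings suggest the weak point is the final step: even with real sources in hand, the models often answer confidently rather than admitting the sources do not support an answer.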
[6]
Don't Trust AI Search Engines: Study Finds They're "Confidently Wrong" Up to 76% of the Time
We've all heard the warnings: "Don't trust everything AI says!" But how inaccurate are AI search engines really? The folks at the Tow Center for Digital Journalism put eight popular AI search engines through comprehensive tests, and the results are staggering.

How the Tests Were Conducted

First and foremost, let's talk about how the Tow Center put these AI search engines through the wringer. The eight chatbots in the study included both free and premium models with live search functionality (the ability to access the live internet):

ChatGPT Search
Perplexity
Perplexity Pro
DeepSeek Search
Microsoft Copilot
Grok-2 Search
Grok-3 Search
Google Gemini

This study was primarily about AI chatbots' ability to retrieve and cite news content accurately. The Tow Center also wanted to see how the chatbots behaved when they could not perform the requested command. To put all of this to the test, 10 articles from each of 20 publishers were selected. Excerpts from each article were then selected and provided to each chatbot. Then, they asked the chatbot to do simple things like identify the article's headline, original publisher, publication date, and URL.

The chatbot responses were then put into one of six buckets (a simple coding sketch of these categories follows this article):

Correct: All three attributes were correct.
Correct But Incomplete: Some attributes were correct, but the answer was missing information.
Partially Incorrect: Some attributes were correct, while others were incorrect.
Completely Incorrect: All three attributes were incorrect and/or missing.
Not Provided: No information was provided.
Crawler Blocked: The publisher disallows the chatbot's crawler in its robots.txt.

Not Just Wrong, "Confidently" Wrong

As you'll see, the AI search engines were wrong more often than not, but the arguably bigger issue is how they were wrong. Regardless of accuracy, chatbots almost always respond with confidence. The study found that they rarely use qualifying phrases such as "it's possible" or admit to not being able to execute the command. The study charted the accuracy of responses alongside the confidence with which they were given: almost all of the responses land in the "confident" zone, but a large share of them are wrong. Grok-3, for example, returned a whopping 76% of its responses "confidently incorrect" or "partially incorrect." Keep in mind that Grok-3 is a premium model that costs $40 per month, and it performed worse than its free Grok-2 counterpart. The same can be seen with Perplexity Pro vs Perplexity. Paying for a premium model ($20 per month in the case of Perplexity Pro) doesn't necessarily improve accuracy, but it does seem to make the model more confident about being wrong.

Licensing Deals & Blocked Access Don't Matter

Some AI search engines have licensing deals that permit them access to specific publications. You would assume that the chatbots would be great at accurately identifying the information from those publications, but that wasn't always true. The study compared each of the eight chatbots against a publisher it has a licensing deal with, again asking for the article's headline, original publisher, publication date, and URL. Most of the chatbots were able to do this with a high level of accuracy, but some failed.
ChatGPT Search, for example, was wrong 90% of the time when dealing with the San Francisco Chronicle, a publication it has a partnership with.

On the flip side, some publications have blocked access to their content from AI search engines. However, the study showed that these blocks didn't always work in practice: a few of the search engines seemed not to respect them. Perplexity, for example, was able to accurately identify all 10 quotes from National Geographic despite it being paywalled and blocking crawlers. And those are just the correct answers; other chatbots not only accessed blocked websites but returned inaccurate information drawn from them. Grok and DeepSeek were excluded from this comparison since they don't disclose their crawlers.

So, what does this all mean for you? Well, it's clear that relying solely on AI search engines for accuracy is a risky proposition. Even premium models with licensing deals can confidently spew misinformation. It's a stark reminder that critical thinking and cross-referencing remain essential skills in the AI age. Be sure to check out the full study at the Columbia Journalism Review for more fascinating (and alarming) findings.
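To make the scoring scheme above concrete, here is a minimal sketch of how the six categories could be encoded and assigned from three per-attribute checks. The boundary rules are assumptions based on the category descriptions in this article, not the Tow Center's actual rubric or code:

```python
# Sketch of the six response categories described above, plus one plausible way
# to bucket a response from three per-attribute checks (headline, publisher,
# URL). The boundary rules are assumptions, not the study's grading code.
from enum import Enum
from typing import Optional

class Grade(Enum):
    CORRECT = "Correct"                          # all three attributes correct
    CORRECT_BUT_INCOMPLETE = "Correct but incomplete"
    PARTIALLY_INCORRECT = "Partially incorrect"
    COMPLETELY_INCORRECT = "Completely incorrect"
    NOT_PROVIDED = "Not provided"                # chatbot gave no information
    CRAWLER_BLOCKED = "Crawler blocked"          # assigned from robots.txt, not the answer

def grade(headline: Optional[bool], publisher: Optional[bool], url: Optional[bool]) -> Grade:
    """True = attribute correct, False = incorrect, None = missing from the answer."""
    attrs = [headline, publisher, url]
    if all(a is None for a in attrs):
        return Grade.NOT_PROVIDED
    if all(a is True for a in attrs):
        return Grade.CORRECT
    if any(a is False for a in attrs) and any(a is True for a in attrs):
        return Grade.PARTIALLY_INCORRECT
    if not any(a is True for a in attrs):
        return Grade.COMPLETELY_INCORRECT        # everything wrong and/or missing
    return Grade.CORRECT_BUT_INCOMPLETE          # some correct, the rest missing

print(grade(True, True, False))   # -> Grade.PARTIALLY_INCORRECT
print(grade(True, None, None))    # -> Grade.CORRECT_BUT_INCOMPLETE
```

The two example calls show how an answer with one wrong attribute and an answer that simply omits the publisher and URL would land in different buckets under this reading of the categories.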
[7]
AI search tools are confidently wrong a lot of the time, study finds
AI search tools confidently spit out wrong answers at a high clip, a new study found. Columbia Journalism Review (CJR) conducted a study in which it fed eight AI tools an excerpt of an article and asked the chatbots to identify the "corresponding article's headline, original publisher, publication date, and URL." Collectively, the study noted that the chatbots "provided incorrect answers to more than 60 percent of queries." The mistakes varied. Sometimes, the search tool reportedly speculated or offered incorrect answers to questions it couldn't answer. Sometimes, it invented links or sources. Sometimes, it cited plagiarized versions of the real article. Wrote CJR: "Most of the tools we tested presented inaccurate answers with alarming confidence, rarely using qualifying phrases such as 'it appears,' 'it's possible,' 'might,' etc., or acknowledging knowledge gaps with statements like 'I couldn't locate the exact article.'" The full study is worth looking at, but it seems reasonable to be skeptical of AI search tools. The problem is that folks don't seem to be doing that. CJR noted that 25 percent of Americans said they use AI to search instead of traditional search engines. Google, the search giant, is increasingly pushing AI on consumers. This month, it announced it would be expanding AI overviews and began testing AI-only search results.
[8]
Study Finds That AI Search Engines Are Wrong an Astounding Proportion of the Time
This may come as a shock, but it turns out that an astounding proportion of AI search results are flat-out incorrect, according to a new study published by the Columbia Journalism Review. We hope you were sitting down. Conducted by researchers at the Tow Center for Digital Journalism, the analysis probed eight AI models including OpenAI's ChatGPT search and Google's Gemini, finding that overall, they gave an incorrect answer to more than 60 percent of queries. It should tell you something that the most accurate model to emerge from these tests, Perplexity from Perplexity AI, still answered 37 percent of its questions incorrectly. The village idiot award, meanwhile, goes to Elon Musk's chatbot Grok 3, which was wrong a staggering 94 percent of the time. Impressively bad. "While traditional search engines typically operate as an intermediary, guiding users to news websites and other quality content, generative search tools parse and repackage information themselves, cutting off traffic flow to original sources," the authors warned. "These chatbots' conversational outputs often obfuscate serious underlying issues with information quality." By now, of course, the proclivity of large language models to lie or wrongly report information is well documented. But that hasn't stopped tech companies from trying to supplant the traditional web search, with some releasing versions of their existing chatbots tailor-made to do just that, like ChatGPT search. Google has even debuted an "AI Mode" that only shows Gemini summaries instead of web links. This latest study quantifies why this might be a bad idea. It was conducted by choosing ten random articles each from a pool of twenty publications, ranging from The Wall Street Journal to TechCrunch. In what should've been a softball, the chatbots were asked to identify an article's headline, its publisher, its publication date, and its URL. To make things even easier, the researchers made sure to only choose article excerpts that returned the original source within the first three results of an old-fashioned Google search. In addition to showing the AI models were wrong over half the time, these tests exposed other idiot tendencies. A classic one? Passing off their dubious wisdom "with alarming confidence," by either not qualifying their responses or failing to decline questions they didn't know the answer to. This lines up with other research documenting how AI models would rather hallucinate -- or make up -- answers instead of admitting they're out of their depth. Maybe that's because a policy of honesty would betray just how useless the AI models can be; Microsoft's Copilot, for example, declined more questions than it answered, the researchers said. The AI search tools were also terrible at citing their sources. ChatGPT Search linked to the wrong source article nearly 40 percent of the time, and straight up didn't bother to provide one in another 21 percent of cases. That's bad from a fact-checking point of view, and just as grim for publishers, who will be denied even the chance of getting traffic from an AI model that's scraped their content. Bodes well for the survival of our online media economy, doesn't it?
[9]
Your favorite AI chatbot might not be telling the truth
AI search tools are becoming more popular, with one in four Americans reporting using AI instead of traditional search engines. However, here's an important note: these AI chatbots do not always provide accurate information. A recent study by the Tow Center for Digital Journalism, reported by Columbia Journalism Review, indicates that chatbots struggle to retrieve and cite news content accurately. Even more concerning is their tendency to invent information when they do not have the correct answer.

AI chatbots tested for the survey included many of the "best," including ChatGPT, Perplexity, Perplexity Pro, DeepSeek, Microsoft's Copilot, Grok-2, Grok-3, and Google Gemini. In the tests, AI chatbots were given direct excerpts from online articles published by various outlets. Each chatbot received 200 queries, representing 10 articles from each of 20 different publishers, for a total of 1,600 queries across the eight tools. The chatbots were asked to identify the headline of each article, its original publisher, publication date, and URL. Similar tests conducted with traditional search engines successfully provided the correct information. However, the AI chatbots did not perform as well.

The findings indicated that chatbots often struggle to decline questions they cannot answer accurately, frequently providing incorrect or speculative responses instead. Premium chatbots tend to deliver confidently incorrect answers more often than their free counterparts. Additionally, many chatbots appeared to disregard the Robot Exclusion Protocol (REP) preferences, which websites use to communicate with web robots like search engine crawlers. The survey also found that generative search tools were prone to fabricating links and citing syndicated or copied versions of articles. Moreover, content licensing agreements with news sources did not guarantee accurate citations in chatbot responses.

What can you do?

What stands out most about the results of this survey is not just that AI chatbots often provide incorrect information but that they do so with alarming confidence. Instead of admitting they don't know the answer, they rarely hedge with phrases like "it appears," "it's possible," or "might." For instance, ChatGPT incorrectly identified 134 articles yet only signaled uncertainty 15 times out of 200 responses and never refrained from providing an answer.

Based on the survey results, it's probably wise not to rely exclusively on AI chatbots for answers. Instead, a combination of traditional search methods and AI tools is recommended. At the very least, using multiple AI chatbots to find an answer may be beneficial. Otherwise, you risk obtaining incorrect information. Looking ahead, I wouldn't be surprised to see a consolidation of AI chatbots as the better ones stand out from the poor-quality ones. Eventually, their results will be as accurate as those from traditional search engines. When that will happen is anyone's guess.
[10]
It's a Bad Idea to Trust Grok, Gemini, ChatGPT, Perplexity for Citations
Web search features in AI apps struggle to provide accurate information about the original publishers. Over the years, using the internet has increasingly meant surrendering control over the content we see. This is particularly evident on social media platforms, where information is algorithmically curated rather than actively sought by users. Even though search engines allow individuals to independently seek and select information by clicking directly on desired links, this approach is slowly diminishing. Google's recent testing of the 'AI Mode' in search suggests a future where AI-based curation becomes the default method for information retrieval. Similarly, Perplexity, Grok, and ChatGPT also heavily promote AI-driven search tools, which seems to have worked. One in four Americans uses AI instead of traditional search engines.

However, a new report from Columbia University highlighted a critical flaw affecting these AI-based search tools -- issues with citation accuracy, which is the very aspect AI labs emphasise to build user confidence. The Tow Center for Digital Journalism at Columbia University performed an evaluation of search tools from ChatGPT, Perplexity, Grok, DeepSeek Search, and Google's Gemini. Ten articles from each of twenty publishers were selected randomly, and direct excerpts were picked from those articles as input for the AI tool. Then, the tool was asked to identify the article's headline, original publisher, publication date, and URL.

The study found that collectively, these search engines provided incorrect answers to more than 60% of queries. Notably, Perplexity answered 37% of queries incorrectly, and Grok 3 answered 94% of queries incorrectly. "Most of the tools we tested presented inaccurate answers with alarming confidence," read the study, which highlighted that outputs rarely used phrases like 'it appears', 'it's possible' and 'I couldn't locate the exact article', all of which signal knowledge gaps and uncertainty. The research also revealed that more than half of the responses from Gemini and Grok 3 cited broken links.

Moreover, these AI tools also often failed to identify the original source of the content. "For instance, despite its partnership with The Texas Tribune, Perplexity Pro cited syndicated versions of Tribune articles for three of the ten queries. In contrast, Perplexity cited an unofficial republished version for one," the report added. These issues persist despite the continued efforts of companies like OpenAI and Perplexity to partner with publishers to provide reliable and accurate outputs. The study observed multiple instances of these chatbots providing inaccurate responses from the very websites they teamed up with.

These results are alarming, to say the least. "Seems pretty misleading to advertise a capability as search/retrieval if it provides incorrect answers and links over 40% of the time," Narasimha Chari, a product manager, said on X while citing the study. While AI systems and products continuously improve, there has been an increasingly strong push to adopt AI for search. Given the above results, this might seem premature. Recently, Google announced that AI overviews are being rolled out to more users, without having to sign in to access the feature. Aligning with the results of the above-mentioned study, several users have recently expressed frustration with AI overviews and their inaccurate responses.
While Google calls AI overviews "one of the most popular search features ever", there also seems to be no way to disable them. For instance, Mehdi Sadaghdar, who runs the popular YouTube channel ElectroBOOM, found Google's AI providing a confusing response to a rather straightforward question. When he wanted to find the amount of energy contained by a lightning bolt, the AI overview first answered "1 gigajoules", followed by another result showing an answer of "approximately 5 gigajoules". "I feel it is dangerous for Google AI answers to be the first result in the searches. I found myself accepting what it says as fact, but then with inaccuracies...it could be spreading false information that would result in inaccurate responses," Sadaghdar added in a post on X.

Google is also testing an 'AI Mode' in Google Search, which, according to its demonstration video, appears to be the first tab users see. As per Google, it comes with enhanced reasoning, multimodal capabilities, and higher-quality responses powered by Gemini 2.0. That said, Google has been on an impressive run with its newly released Gemini models and their multimodal features, so it is only fair to expect more refinements to AI overviews in search, the product through which the company reaches the most users.

Meanwhile, a report from Statista suggests that over 90 million online users in the United States are set to primarily rely on AI for browsing the web. AI makers will need to take on more responsibility, since false information can cause anything from mild inconvenience to fatal consequences.
[11]
If You Use AI Search at Work, Be Careful: A New Study Finds It Lies
The rise of AI-powered search will have implications for workers at pretty much any company, no matter the industry, because searching for information is such a fundamental part of the internet experience. But a new study from Columbia Journalism Review's Tow Center for Digital Journalism highlights that when using AI search tools, at least for now, your staff needs to be really careful, because the researchers found that AI search tools from several major makers have serious accuracy problems.

The study concentrated on eight different AI search tools, including ChatGPT, Perplexity, Google's Gemini, Microsoft's Copilot, and the industry-upending Chinese tool DeepSeek; it centered on the accuracy of answers when each AI was quizzed about a news story, tech news site Ars Technica reported. The big takeaway from the study is that all the AIs demonstrated stunningly bad accuracy, answering more than 60 percent of the queries incorrectly.

Not all the AIs were equally bad. Perplexity was incorrect about 37 percent of the time, compared to ChatGPT Search's 67 percent error rate. Elon Musk's Grok 3 model scored the worst, being incorrect 94 percent of the time, perhaps to no one's surprise given that Musk has touted the model as being limited by fewer "safety" constraints than rival AIs. (The billionaire also has a somewhat freewheeling attitude to facts and free speech.) Worse still, the researchers noted that premium paid-for versions of these search tools sometimes fared worse than their free alternatives.
A new study by Columbia's Tow Center for Digital Journalism finds that AI-driven search tools frequently provide incorrect information, with an average error rate of 60% when queried about news content.
A recent study conducted by Columbia Journalism Review's Tow Center for Digital Journalism has uncovered significant accuracy issues with generative AI models used for news searches. The research, which tested eight AI-driven search tools, found that these models incorrectly answered more than 60 percent of queries about news content 1.
Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar tested 1,600 queries across eight different generative search tools. They fed direct excerpts from actual news articles to the AI models and asked each to identify the article's headline, original publisher, publication date, and URL 1.
The error rates varied notably among the tested platforms: Perplexity answered 37 percent of queries incorrectly, ChatGPT Search got 67 percent (134 of 200 articles) wrong, and Grok 3 fared worst at 94 percent 1.
A common trend among these AI models was their tendency to provide confabulations – plausible-sounding but incorrect or speculative answers – rather than declining to respond when lacking reliable information. This behavior was consistent across all tested models 2.
Surprisingly, premium paid versions of these AI search tools sometimes performed worse than their free counterparts. Perplexity Pro ($20/month) and Grok 3's premium service ($40/month) delivered incorrect responses more confidently than their free versions 3.
The study uncovered significant problems with citations and URL fabrication: more than half of the citations from Google's Gemini and Grok 3 led to fabricated or broken URLs that produced error pages, and the tools often pointed users to syndicated copies of articles rather than the original publishers, even where formal licensing agreements were in place 1.
These findings raise concerns about the reliability of AI-driven search tools and their potential impact on news consumption. With approximately 1 in 4 Americans now using AI models as alternatives to traditional search engines, the substantial error rate uncovered in the study poses serious questions about information accuracy 4.
Mark Howard, chief operating officer at Time magazine, expressed concern about ensuring transparency and control over how content appears via AI-generated searches. However, he also suggested that users should be skeptical of free AI tools' accuracy 5.
OpenAI and Microsoft provided statements acknowledging receipt of the findings but did not directly address the specific issues. OpenAI noted its promise to support publishers by driving traffic through summaries, quotes, clear links, and attribution 1.
As AI search tools continue to evolve, the challenge remains to improve accuracy while maintaining the convenience and speed that users have come to expect from these platforms.
A Columbia University study reveals that ChatGPT's search function often misattributes or fabricates news sources, raising concerns about its reliability for accessing current information.
2 Sources
A BBC investigation finds that major AI chatbots, including ChatGPT, Copilot, Gemini, and Perplexity AI, struggle with accuracy when summarizing news articles, raising concerns about the reliability of AI in news dissemination.
14 Sources
A BBC study finds that popular AI chatbots, including ChatGPT, Google Gemini, Microsoft Copilot, and Perplexity AI, produce significant errors when summarizing news articles, raising concerns about their reliability for news consumption.
2 Sources
AI-powered search engines are transforming how we access information online, promising efficiency but potentially limiting the serendipitous discoveries that characterize traditional web searches.
2 Sources
OpenAI's ChatGPT Search feature is found vulnerable to manipulation through hidden text and prompt injections, raising concerns about the reliability of AI-powered web searches.
2 Sources