2 Sources
[1]
Google AI overviews might hallucinate tens of millions of times per hour
Cutting corners: Most search engines now present users with AI-generated overviews by default, sparking controversy over concerns about accuracy and lost click-through traffic. While testing suggests that Google's AI overviews are accurate most of the time, the enormous volume of queries the search engine processes each day likely still results in millions of incorrect responses.

According to The New York Times, testing suggests that approximately one in 10 Google AI search overviews contains false information. Given that the search engine processes roughly 5 trillion queries per year, users could be exposed to more than 57 million inaccurate answers each hour, nearly 1 million per minute.

The figures come from AI startup Oumi, which the Times asked to evaluate Gemini's accuracy using SimpleQA, a widely used generative AI benchmark. After analyzing 4,326 Google searches, Oumi found that Google's AI assistant, Gemini version 2, produced accurate overviews 85 percent of the time in October. By February, Gemini 3 had improved that figure to 91 percent.

However, Oumi can evaluate large volumes of results only by relying on AI tools, which may also introduce errors. In addition, Google sometimes generates different AI overviews for the same query, even when it is repeated seconds apart. A Google spokesperson called Oumi's testing flawed, arguing that it does not reflect real-world search behavior. The company's internal testing indicates that Gemini 3, when operating independently of Google Search, hallucinates 28 percent of the time.

Sourcing presents another challenge. Google attempts to support its AI overview results with relevant links, but those sources often do not substantiate Gemini's claims, whether accurate or not. In some cases, an incorrect AI overview is immediately followed by a link containing correct information; in others, an accurate overview cites a source with inaccurate information; and sometimes the linked pages contain no relevant information at all. Notably, discrepancies between AI overviews and their sources increased after the February update, rising from 37 percent of searches with Gemini 2 to 56 percent with Gemini 3.

Researchers also found that AI overviews are susceptible to manipulation. In one example, a BBC journalist published a blog post containing false information and later found that Google repeated those claims the following day.

Tellingly, Google and other AI companies acknowledge the technology's tenuous relationship with the truth in the fine print. Microsoft's terms of service describe its Copilot AI tool as intended for entertainment purposes, not for making important decisions. Google's AI overviews advise users to double-check responses, while xAI acknowledges that hallucinations can occur.
[2]
Study claims nearly 1 in 10 Google AI answers contain errors
Google's AI Overviews face scrutiny over accuracy, with reports indicating that nearly one in 10 responses contains false information. The New York Times highlights a significant potential impact on users: because Google processes approximately 5 trillion queries annually, this could amount to over 57 million inaccurate answers each hour.

The findings, released by AI startup Oumi, suggest that while Google's Gemini version 2 provided accurate results 85 percent of the time in October, this figure improved to 91 percent with the February release of Gemini 3. However, Oumi's testing methodology relies on AI tools, which may introduce their own errors. A Google spokesperson criticized Oumi's evaluation as flawed and unrepresentative of typical search behavior. Internal tests show that Gemini 3 produces false outputs, or "hallucinates," 28 percent of the time when used outside the framework of Google Search.

Sourcing challenges add to the concerns surrounding these AI Overviews. Google attempts to provide relevant links to support its responses, but these sources often fail to substantiate the claims made by Gemini. Discrepancies between responses and their cited sources increased from 37 percent of searches for Gemini 2 to 56 percent for Gemini 3 after the February update.

Researchers have highlighted the vulnerability of AI Overviews to manipulation. In one instance, a BBC journalist's inaccurate claims were echoed by Google the following day. Google and other AI firms, including Microsoft, have emphasized the need for users to verify information provided by AI systems, acknowledging that such tools are not fully reliable.
A study by AI startup Oumi found that nearly 1 in 10 Google AI Overviews contains false information. With Google processing roughly 5 trillion queries annually, this translates to over 57 million inaccurate AI answers each hour. The research also uncovered growing discrepancies between AI-generated search overviews and their cited sources, raising concerns about reliability and AI susceptibility to manipulation.
Google AI has come under intense scrutiny as research reveals that AI Overviews, the company's AI-generated search summary feature, may be delivering inaccurate answers at an alarming scale. According to analysis reported by The New York Times, approximately one in 10 Google AI answers contains errors, a figure that translates to more than 57 million inaccurate responses each hour given that Google processes roughly 5 trillion queries per year, or nearly 1 million flawed answers per minute [1].

Source: TechSpot
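The headline numbers are straightforward to sanity-check. The short Python sketch below reproduces the figures, under the assumptions the reporting implies: roughly 5 trillion queries per year, an AI Overview accompanying every query, and the error rates attributed to each model version.

```python
# Back-of-envelope check of the error-volume figures cited above.
# Assumptions (taken from the reporting, not independently verified):
#   - ~5 trillion Google queries per year
#   - an AI Overview accompanies every query
#   - error rates of 15% (Gemini 2), 9% (Gemini 3), 10% (headline "1 in 10")

QUERIES_PER_YEAR = 5e12
HOURS_PER_YEAR = 365 * 24  # 8,760

queries_per_hour = QUERIES_PER_YEAR / HOURS_PER_YEAR  # ~571 million

for label, error_rate in [
    ("Gemini 2 (15% errors)", 0.15),
    ("Gemini 3 (9% errors)", 0.09),
    ('Headline "1 in 10"', 0.10),
]:
    per_hour = queries_per_hour * error_rate
    per_minute = per_hour / 60
    print(f"{label}: ~{per_hour / 1e6:.0f} million errors/hour, "
          f"~{per_minute / 1e3:.0f} thousand errors/minute")
```

At the one-in-ten rate, this works out to roughly 57 million errors per hour and about 951,000 per minute, matching the article's "nearly 1 million per minute" figure.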
The study by AI startup Oumi evaluated Gemini's accuracy using SimpleQA, a widely used generative AI benchmark. After analyzing 4,326 Google searches, Oumi discovered that Google's AI assistant, Gemini 2, produced accurate overviews 85 percent of the time in October. By February, Gemini 3 had improved that figure to 91 percent [2].
While this represents progress, the sheer volume of user queries means that even a 9 percent error rate leaves millions of users encountering false information daily.

A Google spokesperson disputed Oumi's testing methodology, calling it flawed and arguing that it does not reflect real-world search behavior. However, internal Google tests paint an even more concerning picture: according to the company's own evaluation, Gemini 3 produces hallucinations 28 percent of the time when operating independently of Google Search [1].
That standalone hallucination rate is far higher than the error rate Oumi measured for search overviews, suggesting the underlying model is considerably less reliable without the grounding that search results provide.

Oumi's methodology relies on AI tools to evaluate large volumes of results, which may introduce their own errors. Additionally, researchers discovered that Google sometimes generates different AI Overviews for the same query, even when it is repeated seconds apart, making consistent evaluation challenging [1].
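Since the accuracy figures above come from exactly this kind of automated grading, a minimal sketch may help make the caveat concrete. Everything here, including the GradedOverview type and the counts, is an illustrative assumption; Oumi's actual pipeline has not been published.

```python
# Illustrative sketch of a SimpleQA-style accuracy tally (hypothetical).
# The key point: each overview is graded correct/incorrect, often by
# another AI model that is itself fallible, and "accuracy" is simply
# the fraction of verdicts marked correct.
from dataclasses import dataclass

@dataclass
class GradedOverview:
    query: str
    overview: str
    correct: bool  # verdict from the grader, which may itself err

def accuracy(results: list[GradedOverview]) -> float:
    """Fraction of overviews the grader judged correct."""
    return sum(r.correct for r in results) / len(results)

# With 4,326 graded searches, ~3,677 "correct" verdicts yields the
# reported 85 percent; ~3,937 yields 91 percent. If the grader
# mislabels even a few percent of cases, those headline rates shift.
```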
Beyond accuracy concerns, source attribution has emerged as a critical weakness. Google attempts to support its AI Overview results with relevant links, but those sources often fail to substantiate Gemini's claims, whether accurate or not. In some cases, an incorrect AI Overview is immediately followed by a link containing correct information. In others, an accurate overview cites a source with inaccurate information. Sometimes the linked pages contain no relevant information at all [1].
The problem has worsened over time. Discrepancies between AI Overviews and their cited sources increased significantly after the February update, rising from 37 percent of searches with Gemini 2 to 56 percent with Gemini 3 [2].
This means that more than half of all AI-generated responses now cite sources that do not properly support the claims being made.
Researchers also uncovered troubling evidence that AI Overviews can be manipulated. In one documented example, a BBC journalist published a blog post containing deliberately false information and found that Google repeated those claims in its AI Overviews the following day [1].
This vulnerability demonstrates how easily bad actors could exploit search engines to spread misinformation at scale.

The implications extend beyond Google. Microsoft acknowledges in its terms of service that its Copilot AI tool is intended for entertainment purposes, not for making important decisions. Google's AI Overviews advise users to double-check responses, while xAI acknowledges that hallucinations can occur [1]. These disclaimers signal that AI companies themselves recognize the tenuous relationship between their tools and factual accuracy, placing the burden of verification squarely on users who may not realize the technology's limitations.
Summarized by Navi