Google AI Overviews could deliver 57 million inaccurate answers per hour, study reveals

Reviewed by Nidhi Govil

A study by AI startup Oumi found that nearly 1 in 10 Google AI Overviews contains false information. With Google processing roughly 5 trillion queries annually, this translates to over 57 million inaccurate AI answers each hour. The research also uncovered growing discrepancies between AI-generated search overviews and their cited sources, raising concerns about reliability and AI susceptibility to manipulation.

Google AI Overviews face mounting accuracy concerns

Google has come under intense scrutiny as research reveals that AI Overviews, the company's AI-generated search summary feature, may be delivering inaccurate answers at an alarming scale. According to analysis reported by The New York Times, approximately one in 10 Google AI answers contains an error, a figure that translates to more than 57 million inaccurate responses each hour, or nearly 1 million flawed answers per minute, given that Google processes roughly 5 trillion queries per year [1].
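As a sanity check on those headline numbers, a back-of-envelope calculation (using the article's own figures of roughly 5 trillion queries per year and a roughly 10 percent error rate, not independently verified) reproduces the per-hour and per-minute estimates:

```python
# Back-of-envelope check of the article's figures (assumptions:
# ~5 trillion queries per year, "nearly 1 in 10" overviews inaccurate).
QUERIES_PER_YEAR = 5_000_000_000_000  # ~5 trillion searches annually
ERROR_RATE = 0.10                     # ~1 in 10 AI Overviews wrong

errors_per_year = QUERIES_PER_YEAR * ERROR_RATE
errors_per_hour = errors_per_year / (365 * 24)
errors_per_minute = errors_per_hour / 60

print(f"per hour:   {errors_per_hour:,.0f}")    # ~57,077,625 -> "over 57 million"
print(f"per minute: {errors_per_minute:,.0f}")  # ~951,294 -> "nearly 1 million"
```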

Source: TechSpot

The study by AI startup Oumi evaluated Gemini's accuracy using SimpleQA, a widely used generative AI benchmark. After analyzing 4,326 Google searches, Oumi found that Google's AI assistant, Gemini 2, produced accurate overviews 85 percent of the time in October; by February, Gemini 3's accuracy had improved to 91 percent [2]. While this represents progress, the sheer volume of user queries means that even a 9 percent error rate results in millions of users encountering false information daily.

Internal Google tests reveal higher hallucination rate

A Google spokesperson disputed Oumi's testing methodology, calling it flawed and arguing that it does not reflect real-world search behavior. However, internal Google tests paint an even more concerning picture. According to the company's own evaluation, Gemini 3 produces AI hallucinations 28 percent of the time when operating independently of Google Search [1]. This hallucination rate is significantly higher than what Oumi's external testing methodology detected, suggesting that the problem may be more severe than initially reported.

Oumi's testing methodology relies on AI tools to evaluate large volumes of results, which may introduce their own errors. Additionally, researchers discovered that Google sometimes generates different AI Overviews for the same query, even when repeated seconds apart, making consistent evaluation challenging [1].
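For illustration only, a minimal sketch of how an AI-graded evaluation loop of this kind might be structured is shown below; everything in it (the function shape, the toy data, the stand-in callables) is a hypothetical construction, not Oumi's actual SimpleQA tooling:

```python
from typing import Callable

def evaluate_overviews(
    questions: list[dict],
    fetch_overview: Callable[[str], str],
    grade: Callable[[str, str], bool],
) -> float:
    """Return the fraction of AI Overviews graded as accurate.

    fetch_overview: retrieves the AI Overview text for a query; repeated
        calls may return different overviews for the same question.
    grade: an AI judge comparing the overview to a reference answer;
        being a model itself, it can introduce grading errors of its own.
    """
    correct = sum(
        grade(fetch_overview(q["question"]), q["answer"]) for q in questions
    )
    return correct / len(questions)

# Toy usage with stand-in callables (a real run would query live search
# and an LLM judge over thousands of benchmark questions):
demo = [{"question": "What is the capital of France?", "answer": "Paris"}]
print(evaluate_overviews(demo, lambda q: "Paris", lambda o, a: a in o))  # 1.0
```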

Source attribution problems compound reliability issues

Beyond accuracy concerns, AI source attribution has emerged as a critical weakness. Google attempts to support its AI Overview results with relevant links, but those sources often fail to substantiate Gemini's claims, whether accurate or not. In some cases, an incorrect AI Overview is immediately followed by a link containing correct information. In others, an accurate overview cites a source with inaccurate information. Sometimes the linked pages contain no relevant information at all [1].

The problem has worsened over time. Discrepancies between AI Overviews and their sources increased significantly after the February update, rising from 37 percent of searches with Gemini 2 to 56 percent with Gemini 3 [2]. This means that more than half of all AI-generated responses now cite sources that don't properly support the claims being made.

AI susceptibility to manipulation raises trust concerns

Researchers also uncovered troubling evidence of AI manipulation. In one documented example, a BBC journalist published a blog post containing deliberately false information and found that Google repeated those claims in its AI Overviews the following day [1]. This vulnerability demonstrates how easily bad actors could exploit search engines to spread misinformation at scale.

The implications extend beyond Google. Microsoft acknowledges in its terms of service that its Copilot AI tool is intended for entertainment purposes, not for making important decisions. Google's AI Overviews advise users to double-check responses, while xAI acknowledges that hallucinations can occur [1]. These disclaimers signal that AI companies themselves recognize the tenuous relationship between their tools and factual accuracy, placing the burden of information verification squarely on users who may not realize the technology's limitations.
