Google's Controversial Policy Change for Gemini AI Evaluation Raises Accuracy Concerns

6 Sources

Google has instructed contractors evaluating Gemini AI responses to rate outputs even when the prompts fall outside their areas of expertise, potentially compromising the accuracy of AI-generated information on specialized topics.

Google's New Evaluation Policy for Gemini AI

Google has implemented a controversial change in its evaluation process for Gemini AI, raising concerns about the accuracy and reliability of the AI's responses. The tech giant has instructed contractors working with GlobalLogic, an outsourcing firm owned by Hitachi, to rate AI-generated responses even when the topics fall outside their areas of expertise [1][2].

Previous Evaluation Process

Prior to this change, contractors evaluating Gemini's outputs were allowed to skip prompts that required specialized knowledge beyond their expertise. The previous guidelines stated, "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task" [2]. This approach ensured that only qualified individuals assessed technical responses, potentially reducing instances of AI hallucinations and improving overall accuracy [3].

New Guidelines and Concerns

The new internal guidelines, as reported by TechCrunch, now instruct contractors: "You should not skip prompts that require specialized domain knowledge" [2]. Instead, they are asked to "rate the parts of the prompt you understand" and include a note acknowledging their lack of domain knowledge [3]. This change has sparked worries about the potential impact on Gemini's accuracy, especially for highly sensitive topics like healthcare [2].

Limited Exceptions

Under the new policy, contractors may skip prompts in only two scenarios (sketched in the code below):

  1. When the information is completely missing
  2. When the content is harmful and requires special consent forms for evaluation [2][4]
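
To make the reported change concrete, the before-and-after skip logic can be sketched as a simple decision rule. This is a purely illustrative reconstruction from the guidelines quoted above; the function and parameter names are hypothetical and do not reflect Google's or GlobalLogic's actual rater tooling.

    # Illustrative sketch only; names are invented, not actual
    # Google/GlobalLogic rater tooling.

    def may_skip_old_policy(prompt_domain, rater_expertise):
        # Old guideline: skip any prompt that requires critical expertise
        # (e.g. coding, math) the rater does not have.
        return prompt_domain not in rater_expertise

    def may_skip_new_policy(info_completely_missing, harmful_needs_consent):
        # New guideline: only two skip conditions remain; lacking domain
        # knowledge is no longer one of them.
        return info_completely_missing or harmful_needs_consent

Under the new rules, a rater without the relevant expertise is expected to rate the parts of the prompt they understand and attach a note flagging the gap, rather than passing the task to a better-qualified evaluator.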

Potential Implications

This policy shift has raised several concerns:

  1. Accuracy Issues: There are fears that Gemini could become more prone to providing inaccurate information on highly technical subjects [2].

  2. Quality of Evaluations: The change may lead to a drop in the quality of Gemini's responses, particularly for specialized topics [3].

  3. AI Development Goals: Questions have arisen about how this approach aligns with Google's AI development objectives, particularly in improving accuracy and reducing hallucinations [5].

Industry Reactions

The decision has generated controversy within the AI community. "I thought the point of skipping was to increase accuracy by giving it to someone better?" one contractor noted, highlighting the potential drawbacks of this new approach [2][4].

Broader Context

This development comes at a time when AI companies are under scrutiny for the accuracy and reliability of their systems. The use of human evaluators is a standard practice in AI development, aimed at grounding responses and reducing errors. However, Google's new policy appears to diverge from this established approach [3][5].

As of now, Google has not responded to requests for comment on this policy change [4]. The tech community and users alike will be closely watching how this new evaluation process impacts the performance and trustworthiness of Gemini AI in the coming months.

Explore today's top stories

Thinking Machines Lab Raises Record $2 Billion in Seed Funding, Valued at $12 Billion

Mira Murati's AI startup Thinking Machines Lab secures a historic $2 billion seed round, reaching a $12 billion valuation. The company plans to unveil its first product soon, focusing on collaborative general intelligence.

Google's AI Agent 'Big Sleep' Thwarts Cyberattack Before It Happens, Marking a Milestone in AI-Driven Cybersecurity

Google's AI agent 'Big Sleep' has made history by detecting and preventing a critical vulnerability in SQLite before it could be exploited, showcasing the potential of AI in proactive cybersecurity.

AI Researchers Urge Preservation of Chain-of-Thought Monitoring as Critical Safety Measure

Leading AI researchers from major tech companies and institutions have published a position paper calling for urgent action to preserve and enhance Chain-of-Thought (CoT) monitoring in AI systems, warning that this critical safety measure could soon be lost as AI technology advances.

Google's AI-Powered Cybersecurity Breakthroughs: Big Sleep Agent Foils Live Attack

Google announces major advancements in AI-driven cybersecurity, including the first-ever prevention of a live cyberattack by an AI agent, ahead of Black Hat USA and DEF CON 33 conferences.

Mistral Unveils Voxtral: Open-Source AI Audio Model Challenges Industry Giants

French AI startup Mistral releases Voxtral, an open-source speech recognition model family, aiming to provide affordable and accurate audio processing solutions for businesses while competing with established proprietary systems.
