Google's Controversial Policy Change for Gemini AI Evaluation Raises Accuracy Concerns

6 Sources

Google has instructed contractors evaluating Gemini AI responses to rate prompts even when they fall outside their expertise, potentially compromising the accuracy of the AI's answers on specialized topics.


Google's New Evaluation Policy for Gemini AI

Google has implemented a controversial change in its evaluation process for Gemini AI, raising concerns about the accuracy and reliability of the AI's responses. The tech giant has instructed contractors working with GlobalLogic, an outsourcing firm owned by Hitachi, to rate AI-generated responses even when the topics fall outside their areas of expertise [1][2].

Previous Evaluation Process

Prior to this change, contractors evaluating Gemini's outputs were allowed to skip prompts that required specialized knowledge beyond their expertise. The previous guidelines stated, "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task" [2]. This approach was meant to ensure that technical responses were assessed by qualified raters, which could reduce AI hallucinations and improve overall accuracy [3].

New Guidelines and Concerns

The new internal guidelines, as reported by TechCrunch, now instruct contractors: "You should not skip prompts that require specialized domain knowledge" [2]. Instead, they are asked to "rate the parts of the prompt you understand" and include a note acknowledging their lack of domain knowledge [3]. This change has sparked worries about the potential impact on Gemini's accuracy, especially for highly sensitive topics like healthcare [2].

Limited Exceptions

Under the new policy, contractors can only skip prompts in two scenarios:

  1. When the task is completely missing information, such as the full prompt or response
  2. When the content is harmful and requires special consent forms to evaluate [2][4]

Potential Implications

This policy shift has raised several concerns:

  1. Accuracy Issues: There are fears that Gemini could become more prone to providing inaccurate information on highly technical subjects [2].

  2. Quality of Evaluations: Ratings from contractors who lack the relevant expertise may be less reliable, which could in turn lower the quality of Gemini's responses on specialized topics [3].

  3. AI Development Goals: Questions have arisen about how this approach aligns with Google's AI development objectives, particularly improving accuracy and reducing hallucinations [5].

Industry Reactions

The decision has generated controversy within the AI community. One contractor asked, "I thought the point of skipping was to increase accuracy by giving it to someone better?" The question highlights the potential drawbacks of the new approach [2][4].

Broader Context

This development comes at a time when AI companies are under scrutiny for the accuracy and reliability of their systems. The use of human evaluators is a standard practice in AI development, aimed at grounding responses and reducing errors. However, Google's new policy appears to weaken this practice by assigning specialized prompts to raters who may lack the relevant expertise [3][5].

As of now, Google has not responded to requests for comment on this policy change [4]. The tech community and users alike will be closely watching how this new evaluation process affects the performance and trustworthiness of Gemini AI in the coming months.

TheOutpost.ai


© 2025 Triveous Technologies Private Limited