Google's Controversial Policy Change for Gemini AI Evaluation Raises Accuracy Concerns

Google has instructed contractors evaluating Gemini AI responses to rate prompts outside their expertise, potentially compromising the accuracy of AI-generated information on specialized topics.

Google's New Evaluation Policy for Gemini AI

Google has implemented a controversial change in its evaluation process for Gemini AI, raising concerns about the accuracy and reliability of the AI's responses. The tech giant has instructed contractors working with GlobalLogic, an outsourcing firm owned by Hitachi, to rate AI-generated responses even when the topics fall outside their areas of expertise.[1][2]

Previous Evaluation Process

Prior to this change, contractors evaluating Gemini's outputs were allowed to skip prompts that required specialized knowledge beyond their expertise. The previous guidelines stated, "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task."[2] This approach ensured that only qualified individuals assessed technical responses, potentially reducing instances of AI hallucinations and improving overall accuracy.[3]

New Guidelines and Concerns

The new internal guidelines, as reported by TechCrunch, now instruct contractors: "You should not skip prompts that require specialized domain knowledge."[2] Instead, they are asked to "rate the parts of the prompt you understand" and to include a note acknowledging their lack of domain knowledge.[3] This change has sparked worries about the potential impact on Gemini's accuracy, especially for highly sensitive topics like healthcare.[2]

Limited Exceptions

Under the new policy, contractors can only skip prompts in two scenarios:[2][4]

  1. When the prompt or response is completely missing information
  2. When the content is harmful and requires special consent forms to evaluate

Potential Implications

This policy shift has raised several concerns:

  1. Accuracy Issues: There are fears that Gemini could become more prone to providing inaccurate information on highly technical subjects.[2]

  2. Quality of Evaluations: The change may lead to a drop in the quality of Gemini's responses, particularly for specialized topics.[3]

  3. AI Development Goals: Questions have arisen about how this approach aligns with Google's AI development objectives, particularly in improving accuracy and reducing hallucinations.[5]

Industry Reactions

The decision has generated controversy within the AI community. One contractor noted, "I thought the point of skipping was to increase accuracy by giving it to someone better?", highlighting the potential drawbacks of this new approach.[2][4]

Broader Context

This development comes at a time when AI companies are under scrutiny for the accuracy and reliability of their systems. The use of human evaluators is a standard practice in AI development, aimed at grounding responses and reducing errors. However, Google's new policy appears to diverge from this established approach.[3][5]

As of now, Google has not responded to requests for comment on this policy change.[4] The tech community and users alike will be closely watching how this new evaluation process impacts the performance and trustworthiness of Gemini AI in the coming months.
