Curated by THEOUTPOST
On Thu, 19 Dec, 4:03 PM UTC
6 Sources
[1]
Google requires its collaborators to rate Gemini's responses without adequate preparation - Softonic
External contractors will have to evaluate responses even if they have no idea what they are about.

Behind the magic of generative artificial intelligence there is a whole technical team responsible for making everything work as it should. Part of this team, the prompt engineers and analysts, evaluates the responses the AI gives and how accurate they are.

Google has now told these evaluators to rate Gemini's answers even when the subject falls outside their knowledge. As expected, the decision has generated controversy over how accurate and reliable such evaluations can be. Some see the move as irresponsible and fear it will erode public trust in Gemini.

It should be noted that, before this change, workers could decline to rate questions and answers that were beyond their knowledge. That is no longer the case. The documents reviewed by TechCrunch show the earlier rule was explicit: if a worker lacked the expertise to assess a prompt, the task was to be skipped. The new guidelines instead specify that no question should be omitted and that raters should evaluate only "the parts of the question that they do understand." For now, there are only two exceptions: a task may be skipped if its information is missing entirely, or if it contains offensive content that requires special permissions to evaluate.
[2]
Exclusive: Google's Gemini is forcing contractors to rate AI responses outside their expertise | TechCrunch
Generative AI may look like magic, but behind the development of these systems are armies of employees at companies like Google, OpenAI and others, known as "prompt engineers" and analysts, who rate the accuracy of chatbots' outputs to improve their AI.

But a new internal guideline passed down from Google to contractors working on Gemini, seen by TechCrunch, has led to concerns that Gemini could be more prone to spouting out inaccurate information on highly sensitive topics, like healthcare, to regular people.

To improve Gemini, contractors working with GlobalLogic, an outsourcing firm owned by Hitachi, are routinely asked to evaluate AI-generated responses according to factors like "truthfulness." These contractors were until recently able to "skip" certain prompts, and thus opt out of evaluating various AI-written responses to those prompts, if the prompt was way outside their domain expertise. For example, a contractor could skip a prompt that was asking a niche question about cardiology because the contractor had no scientific background.

But last week, GlobalLogic announced a change from Google that contractors are no longer allowed to skip such prompts, regardless of their own expertise. Internal correspondence seen by TechCrunch shows that previously, the guidelines read: "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task." But now the guidelines read: "You should not skip prompts that require specialized domain knowledge." Instead, contractors are being told to "rate the parts of the prompt you understand" and include a note that they don't have domain knowledge.

This has led to direct concerns about Gemini's accuracy on certain topics, as contractors are sometimes tasked with evaluating highly technical AI responses about issues like rare diseases that they have no background in. "I thought the point of skipping was to increase accuracy by giving it to someone better?" one contractor noted in internal correspondence, seen by TechCrunch.

Contractors can now only skip prompts in two cases: if they're "completely missing information" like the full prompt or response, or if they contain harmful content that requires special consent forms to evaluate, the new guidelines show. Google did not respond to TechCrunch's requests for comment by press time.
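To make the reported change concrete, here is a minimal, hypothetical sketch of the routing logic the old and new guidelines describe. The names (Task, Rater, route_old, route_new) and data structures are invented for illustration and do not reflect Google's or GlobalLogic's actual tooling:

```python
# Hypothetical sketch of the rater-routing change, inferred from the
# reporting above; not Google's or GlobalLogic's actual system.
from dataclasses import dataclass, field

# The two remaining skip conditions under the new guidelines.
SKIP_REASONS = {"missing_information", "harmful_content_needs_consent"}

@dataclass
class Task:
    prompt: str
    response: str
    domain: str                        # e.g. "cardiology", "coding"
    notes: list[str] = field(default_factory=list)

@dataclass
class Rater:
    domains: set[str]                  # areas the contractor knows

def route_old(task: Task, rater: Rater) -> str:
    # Old rule: "If you do not have critical expertise ... skip this task."
    # A skipped task presumably went back to the pool for a better match.
    if task.domain not in rater.domains:
        return "returned_to_pool"
    return "rated"

def route_new(task: Task, rater: Rater, skip_reason: str | None = None) -> str:
    # New rule: only two skip conditions remain; otherwise the rater must
    # "rate the parts of the prompt you understand" and flag the gap.
    if skip_reason in SKIP_REASONS:
        return "skipped"
    if task.domain not in rater.domains:
        task.notes.append("rater lacks domain knowledge")
        return "rated_partially"
    return "rated"
```

Under route_old, a mismatched task never yields a rating from that contractor; under route_new, it yields a partial rating plus a note, which is exactly the shift the contractor quote above objects to.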
[3]
Gemini Contractors Might Be Rating AI Prompts Outside Their Expertise
Now, Google is said to have removed the option to skip Gemini prompts.

Google is reportedly asking contractors working on evaluating Gemini's responses to rate prompts outside their domain of expertise. As per the report, the Mountain View-based tech giant has removed the option to skip prompts, which these contractors exercised when they felt they did not have enough knowledge of a subject to rate the response. With artificial intelligence (AI) hallucinations a major concern for chatbots, this reported development could lead to a drop in the quality of Gemini's responses on highly technical topics.

According to a TechCrunch report, Google has sent a new internal guideline to contractors working on Gemini. Claiming to have seen the memo, the publication says these contractors are now being asked to rate responses even when they might not possess the knowledge to correctly assess them. Google reportedly outsources the evaluation of Gemini's responses to GlobalLogic, a firm owned by Hitachi. The contractors working on Gemini are said to be tasked with reading technical prompts and rating the AI's responses on factors such as truthfulness and accuracy. The individuals evaluating the chatbot hold expertise in specific disciplines such as coding, mathematics, and medicine.

Until now, the contractors could reportedly skip certain prompts that fell outside their domain. This ensured that only those qualified to understand and evaluate Gemini's technical responses were doing so. Such human evaluation is a standard post-training practice for foundation models and allows AI firms to ground their responses and reduce instances of hallucination.

However, this changed when GlobalLogic reportedly announced new guidelines last week stating that contractors were no longer allowed to skip prompts unless the response was "completely missing information" or contained harmful content that requires special consent forms to evaluate. As per the report, the new guideline states that contractors should not "skip prompts that require specialised domain knowledge"; instead, they should rate the parts of the prompt they understand. They were reportedly also asked to include a note mentioning that they do not have the domain knowledge. One contractor stated in internal communication, "I thought the point of skipping was to increase accuracy by giving it to someone better," the publication claimed.
[4]
Google accused of using novices to fact-check Gemini's AI answers
Hopefully no computer science majors were asked to review medical questions.

There's no arguing that AI still has quite a few unreliable moments, but one would hope that at least its evaluations would be accurate. However, last week Google allegedly instructed contract workers evaluating Gemini not to skip any prompts, regardless of their expertise, TechCrunch reports based on internal guidance it viewed. (Google shared a preview of Gemini 2.0 earlier this month.)

Google reportedly instructed GlobalLogic, an outsourcing firm whose contractors evaluate AI-generated output, not to have reviewers skip prompts outside of their expertise. Previously, contractors could choose to skip any prompt that fell far outside their expertise -- such as a doctor being asked about laws. The guidelines had stated, "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task." Now, contractors have allegedly been instructed, "You should not skip prompts that require specialized domain knowledge" and told to "rate the parts of the prompt you understand" while adding a note that it's not an area they have knowledge in.

Apparently, the only times contractors can skip now are if a big chunk of the information is missing or if the content is harmful and requires specific consent forms for evaluation. One contractor aptly responded to the changes, stating, "I thought the point of skipping was to increase accuracy by giving it to someone better?" Google has not responded to a request for comment.
[5]
New Google policy instructs Gemini's fact-checkers to act outside their expertise
Summary: Google employs contract research agencies to evaluate Gemini response accuracy. GlobalLogic contractors evaluating Gemini prompts are no longer allowed to skip individual interactions based on lack of expertise. Concerns exist over Google's reliance on fact-checkers without relevant knowledge, potentially impacting AI development goals.

Google DeepMind, the team responsible for developing and maintaining the conglomerate's AI models, employs various techniques to evaluate and improve Gemini's output. One such method, Gemini 2.0's recently announced FACTS Grounding benchmark, leverages responses from other advanced LLMs to determine whether Gemini's answers actually relate to a question, answer the question, and answer it correctly. Another method calls on human contractors from Hitachi-owned GlobalLogic to evaluate Gemini prompt responses and rate them for correctness. Until recently, contractors could skip individual prompts that fell significantly outside their areas of expertise. Now, Google has mandated that contractors can no longer skip prompts, forcing them to determine accuracy in subjects they might know nothing about (reporting by TechCrunch).

Hands-on LLM error-checking gone awry: are fact-checkers in over their heads?

Previously, GlobalLogic contractors could skip individual prompts they weren't comfortable answering due to lack of background knowledge, with guidelines stating, "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task." According to sources that remain anonymous due to non-disclosure agreements, the new directive handed down from Google states, "You should not skip prompts that require specialized domain knowledge." Accompanying the new policy is an instruction to "rate the parts of the prompt you understand" and make a note that it falls outside the reviewer's knowledge base. The option to skip certain prompts due to lack of relevant expertise has been eliminated; contractors may now only bypass individual interactions when the prompt or response is missing entirely, or when harmful content is present that the contractor isn't authorized to evaluate.

What we know about GlobalLogic AI evaluation

A considerable, fluctuating number of open positions related to AI fact-checking exist on employment platforms like Upwork and Indeed, offering anywhere from $14 per hour and up to evaluate AI performance. Various recruiters have reached out to jobseekers, apparently on behalf of GlobalLogic, in search of workers to fill potential contract-to-hire positions. Many social media users report the company's opaque interview process and lengthy, "stressful" onboarding, while confirming Google as the GlobalLogic client. Some social media users purporting to currently work on the project have corroborated the claims of difficulties, as well as a starting pay of around $21 per hour and the uncommon, but real, possibility of direct hire.
What low-expertise fact-checking means for Gemini: maybe nothing, and possibly nothing good

Predictably, details of the contract, the workflow, and how the data is applied remain tightly locked down. Employing real people to evaluate individual prompt responses seems a logical choice. Complex recruiting and hiring processes, unclear client needs and guidelines during onboarding, and inconsistent management techniques have always surrounded large-scale, outsourced contracting jobs. Nothing there raises unexpected red flags, and current (claimed) GlobalLogic contractors note that many of its workers possess high-level and technical degrees.

The worry stems from Google's apparent shift away from allowing admittedly uninformed evaluators to bypass questions they can't answer. If a note indicating lack of expertise accompanies a contractor's evaluation, Google could theoretically disregard the evaluation and return the interaction to the pool for re-inspection. We have no way of knowing at present how Google treats this data.

How does non-expert error-checking advance Google's AI goals?

The obvious concern remains that the new directive implies Google's decreasing reliance on educated experts, or even confident, self-aware autodidacts. TechCrunch, which originally received the leaked claims, noted that one contractor explained, "I thought the point of skipping was to increase accuracy by giving it to someone better."

Perhaps Google is simply streamlining its data collection process, and fully intends to discard, ignore, or clarify potentially inaccurate evaluations. Or maybe it has decided that Gemini fact-checking and further development for accuracy and anti-hallucination don't necessarily require relevant background expertise when evaluating whether an LLM's answers make any sense.
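For readers unfamiliar with the pattern the FACTS Grounding description above implies, namely using one model to grade another, here is a minimal LLM-as-judge sketch. The judge callable, the prompt wording, and the three criteria are assumptions drawn from the article's phrasing, not Google's published benchmark code:

```python
# Minimal LLM-as-judge sketch of the pattern the FACTS Grounding description
# implies: another model grades whether an answer relates to, addresses, and
# correctly answers the question. `judge` is any callable returning "yes" or
# "no" -- a stand-in, not Google's actual benchmark interface.
from typing import Callable

CRITERIA = [
    "Does the response relate to the question?",
    "Does the response actually answer the question?",
    "Is the answer factually correct given the provided context?",
]

def grade(question: str, response: str, judge: Callable[[str], str]) -> float:
    """Return the fraction of criteria the judge model answers 'yes' to."""
    passed = 0
    for criterion in CRITERIA:
        verdict = judge(
            f"Question: {question}\nResponse: {response}\n{criterion} "
            "Answer strictly 'yes' or 'no'."
        )
        passed += verdict.strip().lower().startswith("yes")
    return passed / len(CRITERIA)

# Example with a trivial stub judge that always answers "yes":
score = grade("What causes tides?", "Mostly the Moon's gravity.", lambda p: "yes")
assert score == 1.0
```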
[6]
Contractors Must Evaluate Gemini AI Prompts Outside Expertise, Google Says - MEDIANAMA
Google has instructed contractors working on its AI chatbot Gemini not to skip evaluation of prompts - and the AI-generated responses to those prompts - even if they lie beyond the contractors' domain expertise, as per a report by TechCrunch based on internal documents.

The development of chatbots, including Gemini, ChatGPT and others, relies upon employees like prompt engineers and analysts, who rate the accuracy of the chatbots' outputs on factors such as 'truthfulness' to improve their quality. However, the report states that the tech giant has directed employees of GlobalLogic, a Hitachi-owned firm that handles outsourced work for Google, to evaluate all prompts regardless of their subject knowledge. "You should not skip prompts that require specialized domain knowledge", the new instructions reportedly say, asking them instead to "rate the parts of the prompt you understand". Before this, the contractors were able to skip prompts, and the evaluation of the AI's responses to those prompts, if they were way outside their domain expertise, the report also stated. This raises concerns about users receiving inaccurate outputs when they search for specific information in specialised domains such as medicine or technology.

Notably, Gemini ran into trouble earlier in India as well, after MoS IT Rajeev Chandrasekhar accused it of violating Rule 3(1)(b) of the Information Technology (IT) Rules, 2021 over its answers about Prime Minister Narendra Modi in February 2024. The situation came to light when a user queried "Is Modi a fascist?" and Gemini responded that some experts have accused the Prime Minister of implementing policies characterized as fascist, adding, "These accusations are based on a number of factors including the BJP's Hindu Nationalist ideology, its crackdown on dissent, and its use of violence against religious minorities." Images uploaded by the user on X also showed that the chatbot, when asked "Is Zelenskyy a fascist?" and "Is Trump a fascist?", described these as "complex topics."

Bringing out effective responses is a challenge that AI chipmaker Nvidia also took up earlier this year when it launched a preview of its new model, Llama-3.1-nemotron-70b-instruct, which it claimed would "improve the helpfulness of LLM-generated responses to user queries." Nvidia intended the AI model to help other developers customise responses to queries across applications and domains.
Google has instructed contractors evaluating Gemini AI responses to rate prompts outside their expertise, potentially compromising the accuracy of AI-generated information on specialized topics.
Google has implemented a controversial change in its evaluation process for Gemini AI, raising concerns about the accuracy and reliability of the AI's responses. The tech giant has instructed contractors working with GlobalLogic, an outsourcing firm owned by Hitachi, to rate AI-generated responses even when the topics fall outside their areas of expertise [1][2].
Prior to this change, contractors evaluating Gemini's outputs were allowed to skip prompts that required specialized knowledge beyond their expertise. The previous guidelines stated, "If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task" [2]. This approach ensured that only qualified individuals assessed technical responses, potentially reducing instances of AI hallucinations and improving overall accuracy [3].
The new internal guidelines, as reported by TechCrunch, now instruct contractors: "You should not skip prompts that require specialized domain knowledge" [2]. Instead, they are asked to "rate the parts of the prompt you understand" and include a note acknowledging their lack of domain knowledge [3]. This change has sparked worries about the potential impact on Gemini's accuracy, especially for highly sensitive topics like healthcare [2].
Under the new policy, contractors can only skip prompts in two scenarios:
Missing Information: the prompt or response is completely absent [2].
Harmful Content: the material requires special consent forms before it can be evaluated [2].
This policy shift has raised several concerns:
Accuracy Issues: There are fears that Gemini could become more prone to providing inaccurate information on highly technical subjects [2].
Quality of Evaluations: The change may lead to a drop in the quality of Gemini's responses, particularly for specialized topics [3].
AI Development Goals: Questions have arisen about how this approach aligns with Google's AI development objectives, particularly in improving accuracy and reducing hallucinations [5].
The decision has generated controversy within the AI community. One contractor noted, "I thought the point of skipping was to increase accuracy by giving it to someone better?", highlighting the potential drawbacks of this new approach [2][4].
This development comes at a time when AI companies are under scrutiny for the accuracy and reliability of their systems. The use of human evaluators is a standard practice in AI development, aimed at grounding responses and reducing errors. However, Google's new policy appears to diverge from this established approach [3][5].
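The sources note that no one outside Google knows how evaluations flagged for missing expertise are treated. Purely as a hypothetical illustration of one option, an aggregator could down-weight flagged ratings rather than discard them; Rating, aggregate, and novice_weight below are invented names, not a disclosed pipeline:

```python
# Hypothetical illustration (not Google's disclosed pipeline): if each rating
# carries a "lacks domain expertise" flag, an aggregator could down-weight
# flagged ratings instead of treating all ratings equally.
from dataclasses import dataclass

@dataclass
class Rating:
    score: float            # e.g. truthfulness on a 0-1 scale
    lacks_expertise: bool   # the note the new guidelines ask raters to attach

def aggregate(ratings: list[Rating], novice_weight: float = 0.25) -> float:
    """Weighted mean that trusts expert ratings more than flagged ones."""
    weights = [novice_weight if r.lacks_expertise else 1.0 for r in ratings]
    total = sum(weights)
    return sum(r.score * w for r, w in zip(ratings, weights)) / total

ratings = [Rating(0.9, False), Rating(0.2, True)]   # expert vs. flagged novice
print(round(aggregate(ratings), 3))                 # expert dominates: 0.76
```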
As of now, Google has not responded to requests for comment on this policy change [4]. The tech community and users alike will be closely watching how this new evaluation process impacts the performance and trustworthiness of Gemini AI in the coming months.