Summary
- Google relies on outside contracting firms to evaluate the accuracy of Gemini’s responses.
- GlobalLogic contractors evaluating Gemini prompts are no longer allowed to skip individual interactions based on lack of expertise.
- The change raises concerns that Google is leaning on fact-checkers without relevant expertise, which could undermine its AI accuracy goals.
Google DeepMind, the team responsible for developing and maintaining the company’s AI models, employs various techniques to evaluate and improve Gemini’s output. One such method, the recently announced FACTS Grounding benchmark that accompanied Gemini 2.0, leverages responses from other advanced LLMs to determine whether Gemini’s answers actually relate to a question, address it, and get it right.
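For a rough sense of how that kind of LLM-as-judge grading works, here is a minimal sketch. It is not Google’s FACTS Grounding pipeline; the judge prompt, the verdict labels, and the call_judge_llm stand-in are all illustrative assumptions. The idea is simply to hand a separate model the source material, the question, and Gemini’s answer, then parse a one-word verdict.

```python
# Minimal sketch of LLM-as-judge grounding evaluation. Illustrative only: this is
# NOT Google's FACTS Grounding implementation, and call_judge_llm is a hypothetical
# stand-in for whatever judge model an evaluator wires up.

JUDGE_PROMPT = """You are grading an AI answer.

Source document:
{document}

User question:
{question}

Candidate answer:
{answer}

Reply with exactly one word: "grounded" if the answer addresses the question and
every claim in it is supported by the document, otherwise "ungrounded".
"""


def call_judge_llm(prompt: str) -> str:
    """Hypothetical judge-model call; swap in a real LLM client here."""
    raise NotImplementedError


def is_grounded(document: str, question: str, answer: str) -> bool:
    """Ask the judge model whether the candidate answer is grounded in the document."""
    verdict = call_judge_llm(
        JUDGE_PROMPT.format(document=document, question=question, answer=answer)
    )
    return verdict.strip().lower() == "grounded"
```

Per DeepMind’s announcement, the real benchmark aggregates verdicts from multiple frontier judge models rather than trusting any single one.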
Another method calls on human contractors from Hitachi-owned GlobalLogic to evaluate Gemini prompt responses and rate them for correctness. Until recently, contractors could skip individual prompts that fell significantly outside their areas of expertise. Now, Google has mandated that contractors no longer skip these prompts, forcing them to judge accuracy in subjects they might know nothing about, as reported by TechCrunch.
Hands-on LLM error-checking gone awry
Are fact-checkers in over their heads?
Previously, GlobalLogic contractors could skip individual prompts they weren’t comfortable answering due to a lack of background knowledge, with guidelines stating, “If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task.” According to sources who remain anonymous due to non-disclosure agreements, the new directive handed down from Google states, “You should not skip prompts that require specialized domain knowledge.”
Accompanying the new policy is an instruction to “rate the parts of the prompt you understand” and add a note that the subject falls outside the reviewer’s knowledge base. The option to skip prompts for lack of relevant expertise has been eliminated; contractors may now only bypass individual interactions when the prompt or response is missing entirely, or when they contain harmful content the contractor isn’t authorized to evaluate.
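Condensed into code form purely for illustration, the reported skip rules look roughly like the sketch below. The field and function names are invented, and this models the guidelines described above rather than any actual GlobalLogic tooling.

```python
# Sketch of the reported skip policy. Illustrative only: field and function names
# are invented, and this mirrors the reported guidelines, not real GlobalLogic tools.
from dataclasses import dataclass


@dataclass
class RatingTask:
    prompt: str
    response: str
    is_harmful: bool                # interaction flagged as harmful content
    rater_cleared_for_harm: bool    # rater authorized to evaluate harmful content
    outside_rater_expertise: bool   # subject falls outside the rater's knowledge


def may_skip(task: RatingTask) -> bool:
    """Skipping is now allowed only for missing content or unauthorized harmful content."""
    missing = not task.prompt or not task.response
    unauthorized_harm = task.is_harmful and not task.rater_cleared_for_harm
    return missing or unauthorized_harm


def handle(task: RatingTask) -> dict:
    if may_skip(task):
        return {"action": "skip"}
    result = {"action": "rate the parts you understand"}
    if task.outside_rater_expertise:
        # Lack of domain knowledge no longer permits a skip; it becomes a note instead.
        result["note"] = "prompt falls outside the rater's knowledge base"
    return result
```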
What we know about GlobalLogic AI evaluation
A considerable, fluctuating number of open positions related to AI fact-checking exist on employment platforms like Upwork and Indeed, offering $14 per hour and up to evaluate AI performance. Various recruiters have reached out to job seekers, apparently on behalf of GlobalLogic, looking to fill potential contract-to-hire positions.
Many social media users describe the company’s opaque interview process and lengthy, “stressful” onboarding, while confirming Google as the GlobalLogic client. Some who claim to currently work on the project have corroborated those difficulties, as well as a starting pay of around $21 per hour and the uncommon, but real, possibility of direct hire.
What low-expertise fact-checking means for Gemini
Maybe nothing, and possibly nothing good
Predictably, details about the contracts, workflows, and how the evaluation data is applied remain tightly locked down. Employing real people to evaluate individual prompt responses seems a logical choice. Complex recruiting and hiring processes, unclear client needs and guidelines during onboarding, and inconsistent management techniques have always surrounded large-scale, outsourced contracting jobs. Nothing there raises unexpected red flags, and current (claimed) GlobalLogic contractors note that many of the workers hold advanced and technical degrees.
The worry stems from Google’s apparent shift away from allowing admittedly uninformed evaluators to bypass questions they can’t answer. If a note indicating lack of expertise accompanies a contractor’s evaluation, Google could theoretically disregard the evaluation and return the interaction to the pool for re-inspection. We have no way of knowing at present how Google treats this data.
How does non-expert error-checking advance Google’s AI goals?
The obvious concern remains that the new directive implies Google’s decreasing reliance on educated experts, or even confident, self-aware autodidacts. TechCrunch, which originally received the leaked claims, noted that one contractor put it this way: “I thought the point of skipping was to increase accuracy by giving it to someone better.”
Perhaps Google is simply streamlining its data collection process and fully intends to discard, ignore, or clarify potentially inaccurate evaluations. Or maybe it has decided that fact-checking Gemini, and developing it further for accuracy and resistance to hallucinations, doesn’t necessarily require relevant background expertise to judge whether an LLM’s answers make any sense.