Google's AI Training Practices Raise Concerns Over Publisher Opt-Outs and Data Usage

Google's DeepMind VP reveals that the company's search organization can train AI on publisher content even after opt-outs, sparking debates on data usage and monopolistic practices.

Google's AI Training Practices Unveiled

In a recent antitrust trial, Google DeepMind Vice President Eli Collins revealed that the company's search organization can train its AI models on web content even when publishers have opted out of AI training [1]. This admission has sparked concerns about Google's data usage practices and its potential monopolistic behavior in the AI and search markets.

The Opt-Out Loophole

Collins confirmed that while publishers can opt out of AI training for DeepMind models, this doesn't extend to other parts of Google, including its search organization [2]. This means that Google's search-specific AI products, such as AI Overviews and the recently launched AI Mode, can still use content from publishers who have opted out of AI training [3].

Scale of Data Usage

An internal document from 2024 cited during the trial showed that Google had collected 160 billion tokens for AI training data. Half of these tokens were removed due to publisher opt-outs, but based on Collins' testimony, these 80 billion tokens may still be used to train AI within Google's search organization [2].

Impact on Publishers

This revelation has raised concerns about revenue loss for publishers. As Google summarizes answers to search queries using AI at the top of results, users may not click through to independent websites, potentially hurting publishers' ad revenue [4]. The irony is that Google is using data from these same sites to generate AI-powered answers.

Opt-Out Challenges

Google maintains that publishers can manage their content in Search via the robots.txt web standard [1]. However, opting out of being indexed for search is seen as a "death sentence" for websites, effectively leaving publishers with no real choice but to allow their content to be used for AI training [2].
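To make the robots.txt mechanics concrete, here is a minimal sketch in Python using the standard library's robots.txt parser. It assumes the AI-training opt-out is expressed with Google's documented `Google-Extended` product token, while Search crawling is governed by the separate `Googlebot` token; the article itself does not name these tokens, and the robots.txt content and URL below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of AI training
# (assumed here to mean the Google-Extended token) but stays in Search.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"  # placeholder URL
extended_ok = allowed(ROBOTS_TXT, "Google-Extended", url)
googlebot_ok = allowed(ROBOTS_TXT, "Googlebot", url)

print("Google-Extended allowed:", extended_ok)
print("Googlebot allowed:", googlebot_ok)
```

Under these assumptions the two rules are independent: blocking `Google-Extended` leaves `Googlebot` free to crawl the page for Search. The only robots.txt lever that would also keep content away from the search organization is blocking `Googlebot` itself, which is the step described above as a "death sentence" for a site's search visibility.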

Antitrust Implications

The ongoing antitrust case is in its remedies phase, following a 2024 ruling that Google illegally maintained a monopoly in online search. The U.S. Department of Justice is urging the court to impose measures such as forcing Google to sell its Chrome browser, share key search data, and restrict its ability to pay for default search engine status on devices and services [4].

Future of AI Training and Competition

The trial has also revealed Google's exploration of using its vast search data to improve AI models. A document shown in court indicated that Google DeepMind CEO Demis Hassabis had considered training an AI model with search data, including rankings, to assess how much it improved on models trained without such data [4].

Broader Implications for AI and Web Content

This case highlights the complex relationship between AI development, web content, and publisher rights. It raises questions about the future of AI training practices, the value of web content in the AI era, and the balance between technological advancement and fair competition in the digital landscape.

As the trial continues, the outcome could have significant implications for how tech giants like Google use web data for AI training and potentially reshape the landscape of search and AI technologies.

TheOutpost.ai

© 2025 Triveous Technologies Private Limited