Major publishers join Google AI lawsuit alleging massive copyright infringement in Gemini training

4 Sources

Share

Hachette Book Group and Cengage Group filed a motion to intervene in an existing class-action lawsuit against Google, accusing the tech giant of orchestrating historic copyright infringement to build its Gemini AI platform. The complaint alleges Google scraped books from piracy-linked websites rather than obtaining proper licenses, with the C4 training dataset containing content from at least 28 sites identified by the U.S. government as markets for piracy.

Publishers Join Lawsuit Against Google Over AI Training Practices

Hachette Book Group and Cengage Group filed a motion Thursday in California federal court seeking permission to intervene in an existing class-action lawsuit against Google, alleging the tech company "engaged in one of the most prolific infringements of copyrighted materials in history" to build its AI capabilities

1

. The publishers claim Google copied content from Hachette books and Cengage textbooks without permission to train its Gemini large language model, escalating a legal battle that could significantly increase potential damages at stake

3

.

Source: Decrypt

Source: Decrypt

Maria Pallante, CEO of the Association of American Publishers, stated, "We believe our participation will bolster the case, especially because publishers are uniquely positioned to address many of the legal, factual, and evidentiary questions before the Court"

1

. The motion to intervene in this Google AI lawsuit marks a critical expansion of copyright infringement claims originally filed in 2023 by visual artists and individual authors.

Evidence of Training AI with Copyrighted Books from Piracy Sites

The complaint alleges Google deliberately chose to steal content rather than obtain proper licenses, engaging in copyright infringement "at every stage" of development

2

. According to the publishers, Google downloaded books from pirate sites and repeatedly copied them during AI training—first into computer memory, then into formats AI systems could read, and again into training sets for each new model version.

Source: ET

Source: ET

Google's C4 training dataset allegedly contains copyrighted works scraped from Z-Library, a pirate collection from which authorities have seized more than 350 websites and web domains

2

. The dataset pulls from at least 28 piracy-linked websites identified by the U.S. government as markets for piracy and counterfeits. Books were copied from b-ok.org, a Z-Library domain now displaying a federal seizure notice, along with OceanofPDF and WeLib, "another prolific site with access to troves of unauthorized copyrighted content"

2

.

The copyright symbol (©) appears more than 200 million times in the C4 dataset, the complaint notes, while Google allegedly excluded "policy notices" and "terms of use" warnings but included "vast categories of copyrighted works, pirated works, and works taken from behind paywalls"

2

. The publishers also allege Google copied works from subscription-based libraries like Scribd.com, circumventing legitimate licensing agreements.

Specific Examples and Damages Sought in Historic Copyright Infringement Case

The publishers cited 10 examples of their textbooks and other books that Google allegedly misused from authors, including Scott Turow and N.K. Jemisin, to train its Gemini platform

1

. They seek statutory damages, injunctions to halt further infringement, and an order requiring Google to destroy all unauthorized copies of their works and disclose which books were used to train Gemini

2

. The publishers requested an unspecified amount of monetary damages on behalf of themselves and a larger class of authors and publishers

4

.

The lawsuit alleges Gemini now produces outputs that "substitute for copyrighted works," including verbatim reproductions, detailed summaries, and "knockoffs that copy creative elements of original works"

2

. U.S. District Judge Eumi Lee will decide whether to approve the publishers' request to join the case

3

.

Broader Implications for Intellectual Property Law and AI Development

This case represents one of many high-stakes lawsuits brought by artists, authors, music labels, and other copyright owners against tech companies over their AI training practices

1

. Anthropic settled a lawsuit for $1.5 billion last year with a group of authors over its use of their work to train its AI chatbot Claude, signaling the substantial financial exposure tech companies face in these intellectual property law disputes

3

.

Recent federal court rulings have delivered partial victories to Meta and Anthropic, with judges ruling that their use of copyrighted books to train models constituted fair use under copyright law, though courts criticized the companies for maintaining permanent libraries of pirated books

2

. The outcome of this expanded class-action lawsuit against Google could establish critical precedents for how AI companies must handle copyrighted material, potentially forcing fundamental changes to AI training methodologies and licensing practices across the industry. Google spokespeople did not immediately respond to requests for comment on the publishers' bid

1

.

Today's Top Stories

TheOutpost.ai

Your Daily Dose of Curated AI News

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

© 2026 Triveous Technologies Private Limited
Instagram logo
LinkedIn logo