4 Sources
[1]
Publishers seek to join lawsuit against Google over AI training
Jan 15 (Reuters) - Publishers Hachette Book Group and Cengage Group (CNGO.PK) asked a California federal court on Thursday for permission to intervene in a proposed class action lawsuit against Google (GOOGL.O) over the alleged misuse of copyrighted material used to train its artificial intelligence systems. The publishers said in their proposed complaint that the tech company "engaged in one of the most prolific infringements of copyrighted materials in history" to build its AI capabilities, copying content from Hachette books and Cengage textbooks without permission. Spokespeople for Google did not immediately respond to a request for comment on the publishers' bid, which could increase the potential damages at stake in the case. "We believe our participation will bolster the case, especially because publishers are uniquely positioned to address many of the legal, factual, and evidentiary questions before the Court," Maria Pallante, CEO of the publishers' trade group the Association of American Publishers, said in a statement. The lawsuit currently involves a group of visual artists who sued Google for allegedly misusing their work to train an AI-powered image generator. The case is one of many high-stakes lawsuits brought by artists, authors, music labels and other copyright owners against tech companies over their AI training. Anthropic settled a lawsuit for $1.5 billion last year with a group of authors suing over its use of their work to train its AI chatbot Claude. The publishers on Thursday cited 10 examples of their textbooks and other books that Google allegedly misused from authors, including Scott Turow and N.K. Jemisin to train its Gemini large language model. They asked the court for an unspecified amount of monetary damages on behalf of themselves and a larger class of authors and publishers. U.S. District Judge Eumi Lee will decide whether to approve the publishers' request to join the case. (Reporting by Blake Brittain in Washington; Editing by Rod Nickel)
[2]
Book Publishers Seek Entry Into Google AI Copyright Fight - Decrypt
Google's C4 training dataset allegedly pulls from at least 28 piracy-linked websites, with the copyright symbol appearing more than 200 million times. Major book publishers Hachette Book Group and Cengage Group filed a motion Thursday to intervene in an existing class action lawsuit filed last year against Google, accusing the tech giant of orchestrating "historic copyright infringement" to build its Gemini platform. The complaint filed in California federal court alleges Google "chose to steal a massive body of content from Plaintiffs and the Class to train its AI model" rather than obtain proper licenses, engaging in deliberate infringement "at every stage" of development. The consolidated case was originally filed in 2023 by individual authors as a proposed copyright class action accusing Google of copying books to train its generative AI models. The publishers claim Google downloaded books from pirate sites and then repeatedly copied them during the AI training process, first into computer memory, then into formats the AI systems could read, and again into training sets for each new model version. Google's C4 training dataset contains copyrighted works scraped from Z-Library, a pirate collection from which authorities have seized more than 350 websites and web domains, the lawsuit alleges. The publishers noted how books were copied from b-ok.org, a Z-Library domain now displaying a federal seizure notice, along with OceanofPDF and WeLib, "another prolific site with access to troves of unauthorized copyrighted content." The C4 dataset contains works from at least 28 sites identified by the U.S. government as markets for piracy and counterfeits, the complaint notes. "The copyright symbol (©) appears more than 200 million times in the C4 dataset," the complaint reads, noting Google allegedly excluded "policy notices" and "terms of use" warnings but included "vast categories of copyrighted works, pirated works, and works taken from behind paywalls." 
The publishers allege that Google copied works from subscription-based libraries like Scribd.com, circumventing legitimate licensing agreements. When confronted about this practice, nonprofit dataset provider Common Crawl allegedly responded with "a blame the victim mentality, proclaiming 'You shouldn't have put your content on the internet if you didn't want it to be on the internet.'" The lawsuit alleges Gemini now produces outputs that "substitute for copyrighted works," including verbatim reproductions, detailed summaries, and "knockoffs that copy creative elements of original works." Decrypt has reached out to Google and the publishers' counsel. Google is simultaneously defending against antitrust claims from Penske Media Corporation over its AI Overviews feature, with the tech giant claiming that displaying AI-generated summaries constitutes "lawful product improvement rather than anti-competitive behavior." The publishers seek statutory damages, injunctions to halt further infringement, and an order requiring Google to destroy all unauthorized copies of their works and disclose which books were used to train Gemini. The motion to intervene follows a series of copyright lawsuits that authors filed against AI companies in 2023, with federal judges delivering partial victories to Meta and Anthropic, ruling that their use of copyrighted books to train their models constituted fair use under copyright law, but criticized the companies for maintaining permanent libraries of pirated books.
[3]
Publishers seek to join lawsuit against Google over AI training - The Economic Times
Publishers Hachette Book Group and Cengage Group asked a California federal court on Thursday for permission to intervene in a proposed class action lawsuit against Google over the alleged misuse of copyrighted material used to train its artificial intelligence systems. The publishers said in their proposed complaint that the tech company "engaged in one of the most prolific infringements of copyrighted materials in history" to build its AI capabilities, copying content from Hachette books and Cengage textbooks without permission. Spokespeople for Google did not immediately respond to a request for comment on the publishers' bid, which could increase the potential damages at stake in the case. "We believe our participation will bolster the case, especially because publishers are uniquely positioned to address many of the legal, factual, and evidentiary questions before the Court," Maria Pallante, CEO of the publishers' trade group the Association of American Publishers, said in a statement. The lawsuit currently involves groups of visual artists and authors who sued Google for allegedly misusing their work to train its generative AI systems. The case is one of many high-stakes lawsuits brought by artists, authors, music labels and other copyright owners against tech companies over their AI training. Anthropic settled a lawsuit for $1.5 billion last year with a group of authors suing over its use of their work to train its AI chatbot Claude. The publishers on Thursday cited 10 examples of their textbooks and other books that Google allegedly misused from authors, including Scott Turow and N.K. Jemisin to train its Gemini large language model. 
They asked the court for an unspecified amount of monetary damages on behalf of themselves and a larger class of authors and publishers. U.S. District Judge Eumi Lee will decide whether to approve the publishers' request to join the case.
[4]
Publishers seek to join lawsuit against Google over AI training
Jan 15 (Reuters) - Publishers Hachette Book Group and Cengage Group asked a California federal court on Thursday for permission to intervene in a proposed class action lawsuit against Google over the alleged misuse of copyrighted material used to train its artificial intelligence systems. The publishers said in their proposed complaint that the tech company "engaged in one of the most prolific infringements of copyrighted materials in history" to build its AI capabilities, copying content from Hachette books and Cengage textbooks without permission. Spokespeople for Google did not immediately respond to a request for comment on the publishers' bid, which could increase the potential damages at stake in the case. "We believe our participation will bolster the case, especially because publishers are uniquely positioned to address many of the legal, factual, and evidentiary questions before the Court," Maria Pallante, CEO of the publishers' trade group the Association of American Publishers, said in a statement. The lawsuit currently involves a group of visual artists who sued Google for allegedly misusing their work to train an AI-powered image generator. The case is one of many high-stakes lawsuits brought by artists, authors, music labels and other copyright owners against tech companies over their AI training. Anthropic settled a lawsuit for $1.5 billion last year with a group of authors suing over its use of their work to train its AI chatbot Claude. The publishers on Thursday cited 10 examples of their textbooks and other books that Google allegedly misused from authors, including Scott Turow and N.K. Jemisin to train its Gemini large language model. They asked the court for an unspecified amount of monetary damages on behalf of themselves and a larger class of authors and publishers. U.S. District Judge Eumi Lee will decide whether to approve the publishers' request to join the case. (Reporting by Blake Brittain in Washington; Editing by Rod Nickel)
Hachette Book Group and Cengage Group filed a motion to intervene in an existing class-action lawsuit against Google, accusing the tech giant of orchestrating historic copyright infringement to build its Gemini AI platform. The complaint alleges Google scraped books from piracy-linked websites rather than obtaining proper licenses, with the C4 training dataset containing content from at least 28 sites identified by the U.S. government as markets for piracy.
Hachette Book Group and Cengage Group filed a motion Thursday in California federal court seeking permission to intervene in an existing class-action lawsuit against Google, alleging the tech company "engaged in one of the most prolific infringements of copyrighted materials in history" to build its AI capabilities [1]. The publishers claim Google copied content from Hachette books and Cengage textbooks without permission to train its Gemini large language model, escalating a legal battle that could significantly increase potential damages at stake [3].
Maria Pallante, CEO of the Association of American Publishers, stated, "We believe our participation will bolster the case, especially because publishers are uniquely positioned to address many of the legal, factual, and evidentiary questions before the Court" [1]. The motion to intervene in this Google AI lawsuit marks a critical expansion of copyright infringement claims originally filed in 2023 by visual artists and individual authors.
The complaint alleges Google deliberately chose to steal content rather than obtain proper licenses, engaging in copyright infringement "at every stage" of development [2]. According to the publishers, Google downloaded books from pirate sites and repeatedly copied them during AI training: first into computer memory, then into formats AI systems could read, and again into training sets for each new model version.
Google's C4 training dataset allegedly contains copyrighted works scraped from Z-Library, a pirate collection from which authorities have seized more than 350 websites and web domains [2]. According to the complaint, the dataset contains works from at least 28 sites identified by the U.S. government as markets for piracy and counterfeits. The publishers say books were copied from b-ok.org, a Z-Library domain now displaying a federal seizure notice, along with OceanofPDF and WeLib, "another prolific site with access to troves of unauthorized copyrighted content" [2].
The copyright symbol (©) appears more than 200 million times in the C4 dataset, the complaint notes, while Google allegedly excluded "policy notices" and "terms of use" warnings but included "vast categories of copyrighted works, pirated works, and works taken from behind paywalls" [2]. The publishers also allege Google copied works from subscription-based libraries like Scribd.com, circumventing legitimate licensing agreements.
The publishers cited 10 examples of their textbooks and other books, including works by authors Scott Turow and N.K. Jemisin, that Google allegedly misused to train its Gemini large language model [1]. They seek statutory damages, injunctions to halt further infringement, and an order requiring Google to destroy all unauthorized copies of their works and disclose which books were used to train Gemini [2]. The publishers requested an unspecified amount of monetary damages on behalf of themselves and a larger class of authors and publishers [4].
The lawsuit alleges Gemini now produces outputs that "substitute for copyrighted works," including verbatim reproductions, detailed summaries, and "knockoffs that copy creative elements of original works" [2]. U.S. District Judge Eumi Lee will decide whether to approve the publishers' request to join the case [3].
This case represents one of many high-stakes lawsuits brought by artists, authors, music labels, and other copyright owners against tech companies over their AI training practices [1]. Anthropic settled a lawsuit for $1.5 billion last year with a group of authors over its use of their work to train its AI chatbot Claude, signaling the substantial financial exposure tech companies face in these intellectual property law disputes [3].
Recent federal court rulings have delivered partial victories to Meta and Anthropic, with judges ruling that their use of copyrighted books to train models constituted fair use under copyright law, though courts criticized the companies for maintaining permanent libraries of pirated books [2]. The outcome of this expanded class-action lawsuit against Google could establish critical precedents for how AI companies must handle copyrighted material, potentially forcing fundamental changes to AI training methodologies and licensing practices across the industry. Google spokespeople did not immediately respond to requests for comment on the publishers' bid [1].
Summarized by Navi