Major Publishers Sue Meta and Zuckerberg Over Alleged Copyright Infringement in AI Training

Reviewed by Nidhi Govil

Five major publishers including Elsevier, Hachette, and Macmillan have filed a class-action lawsuit against Meta and CEO Mark Zuckerberg, alleging the company illegally used millions of copyrighted books and research papers to train its Llama AI models. The case marks the first AI copyright lawsuit brought by major scientific publishers and could reshape the legal landscape of AI development.

Publishers Sue Meta in Landmark Copyright Case

Meta and CEO Mark Zuckerberg face a class-action lawsuit filed on May 5 in the Southern District of New York by five major publishers and bestselling author Scott Turow [1]. The plaintiffs (Elsevier, Hachette, Macmillan, McGraw Hill, and Cengage) allege that Meta engaged in "one of the most massive infringements of copyrighted materials in history" by using their works without permission to train Llama AI models [2]. This marks the first time major scientific publishers have taken legal action against an AI company over copyright infringement, according to the Association of American Publishers [1].

Source: Nature

The lawsuit alleges Meta pirated millions of copyrighted works, ranging from scientific journal articles published in Cell and The Lancet to textbooks and novels including "The Fifth Season" by N.K. Jemisin [3]. According to the complaint, Meta obtained scraped research papers and other copyrighted material for AI training through multiple sources, including the Common Crawl data set and file-sharing sites like LibGen and Sci-Hub [1]. The evidence cited includes internal emails between Meta employees that were disclosed during a previous case, Kadrey v. Meta [1].

Zuckerberg Personally Named in Allegations

Source: The Hill

The lawsuit specifically names Mark Zuckerberg as a defendant, claiming he "personally authorized and actively encouraged" the alleged copyright infringement [2]. The filing alleges that Zuckerberg instructed the company to abandon licensing negotiations with publishers and deliberately stripped works of attribution data to conceal training sources [4]. The publishers argue that Meta's generative AI platform functions as "an infinite substitution machine," producing imitation versions of original works that displace human-authored content in the marketplace [4].

Scott Turow, a former Authors Guild president and a plaintiff in the case, expressed his frustration: "I find it distressing and infuriating that one of the top-10 richest corporations in the world knowingly used pirated sources of my books, and thousands of other authors, to train Llama" [2]. The plaintiffs seek unspecified monetary damages and aim to represent a broader class of copyright owners [3].

Meta Defends Fair Use Doctrine

Meta has vowed to "fight this lawsuit aggressively," arguing that training AI on copyrighted material qualifies as fair use under US copyright law [1]. The company maintains that "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use" [5]. This defense echoes a 2025 ruling in Meta's favor, in which a judge found insufficient evidence that Llama AI models would harm the market for human-created content [4].

Source: FT

However, the legal landscape of AI remains unsettled. While US courts have generally backed claims that large language model use of copyrighted material is "transformative," two landmark 2025 rulings warned that acquiring and storing pirated content can constitute infringement, particularly if copyright holders demonstrate substantial market harm [1]. US District Court Judge Vince Chhabria noted in a previous Meta ruling that "the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works" [2].

Implications for AI Development and Research Access

Academic texts are valuable training resources for AI models such as Llama because they contain high-quality, human-written information that can boost accuracy on scientific topics [1]. Repositories like PubMed are commonly used to build specialized training data sets for scientific domains [1]. Yet the proliferation of licensing deals between publishers and tech companies raises concerns about access for researchers building open-source models, potentially strengthening big AI firms while limiting broader innovation [1].

The case joins dozens of similar lawsuits against AI companies, with Anthropic recently settling author claims for $1.5 billion, or approximately $3,000 per pirated work [2]. The plaintiffs point to AI-generated books already flooding Amazon's marketplace as evidence of market displacement [4]. As this case unfolds, it will test whether courts continue to favor tech companies' fair use arguments or shift toward protecting creators' economic interests in the rapidly evolving AI ecosystem.

© 2026 TheOutpost.AI All rights reserved