Meta Sued for Copyright Infringement in AI Training

Publishers Sue Meta in Landmark Copyright Case

Meta and CEO Mark Zuckerberg face a class-action lawsuit filed on May 5 in the Southern District of New York by five major publishers and bestselling author Scott Turow [1]. The publisher plaintiffs (Elsevier, Hachette, Macmillan, McGraw Hill, and Cengage) allege that Meta engaged in "one of the most massive infringements of copyrighted materials in history" by using their works without permission to train its Llama AI models [2]. According to the Association of American Publishers, this is the first time major scientific publishers have taken legal action against an AI company over copyright infringement [1].

The lawsuit alleges that Meta pirated millions of copyrighted works, ranging from scientific journal articles published in Cell and The Lancet to textbooks and novels, including "The Fifth Season" by N.K. Jemisin [3]. According to the complaint, Meta accessed scraped research papers and other copyrighted material for AI training through multiple sources, including the Common Crawl data set and file-sharing sites such as LibGen and Sci-Hub [1]. The evidence includes internal emails between Meta employees that came to light during a previous case, Kadrey v. Meta [1].

Zuckerberg Personally Named in Allegations

The lawsuit specifically names Mark Zuckerberg as a defendant, claiming he "personally authorized and actively encouraged" the alleged copyright infringement [2]. The filing alleges that Zuckerberg instructed the company to abandon licensing negotiations with publishers and deliberately stripped works of attribution data to conceal training sources [4]. The publishers argue that Meta's generative AI platform functions as "an infinite substitution machine," producing imitation versions of original works that displace human-authored content in the marketplace [4].

Scott Turow, a former Authors Guild president and a plaintiff in the case, expressed his frustration: "I find it distressing and infuriating that one of the top-10 richest corporations in the world knowingly used pirated sources of my books, and thousands of other authors, to train Llama" [2]. The plaintiffs seek unspecified monetary damages and aim to represent a broader class of copyright owners [3].

Meta Defends Fair Use Doctrine

Meta has vowed to "fight this lawsuit aggressively," arguing that training AI on copyrighted material qualifies as fair use under US copyright law [1]. The company maintains that "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use" [5]. This defense echoes a 2025 ruling in Meta's favor, in which a judge found insufficient evidence that Llama AI models would harm the market for human-created content [4].

However, the legal landscape for AI remains unsettled. While US courts have generally backed claims that the use of copyrighted material by large language models is "transformative," two landmark 2025 rulings warned that acquiring and storing pirated content can constitute infringement, particularly if copyright holders demonstrate substantial market harm [1]. US District Court Judge Vince Chhabria noted in a previous Meta ruling that "the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works" [2].

Implications for AI Development and Research Access

Academic texts represent valuable training resources for Llama AI models because they contain high-quality, human-written information that can boost accuracy on scientific topics [1]. Repositories like PubMed are commonly used to build specialized training data sets for scientific domains [1]. Yet the proliferation of licensing deals between publishers and tech companies raises concerns about access for researchers building open-source models, potentially strengthening big AI firms while limiting broader innovation [1].

The case joins dozens of similar lawsuits against AI companies. Anthropic recently settled authors' claims for $1.5 billion, or approximately $3,000 per pirated work [2]. The plaintiffs point to AI-generated books already flooding Amazon's marketplace as evidence of market displacement [4]. As this case unfolds, it will test whether courts continue to favor tech companies' fair use arguments or shift toward protecting creators' economic interests in the rapidly evolving AI ecosystem.

References

1. Elsevier vs. Meta: first science publisher sues over scraped research papers

2. Even More Authors, Publishers Sue Meta Over Copyright in AI Training: What's Different Now

3. Major publishers sue Meta for copyright infringement over AI training

4. Meta and Zuckerberg sued by publishers over 'massive' copyright infringement

5. Book publishers accuse Meta and Mark Zuckerberg of copyright infringement (Engadget)
