Meta Faces Legal Challenges Over Alleged Use of Pirated Books for AI Training

Meta's AI Training Practices Under Scrutiny

Meta, the parent company of Facebook, is facing a class-action lawsuit over allegations that it used pirated books to train its artificial intelligence models, including the popular Llama series. Court filings and internal communications indicate that Meta allegedly downloaded and used vast amounts of copyrighted material without authorization [1].

Alleged Copyright Infringement

According to the lawsuit, Meta is accused of downloading nearly 82 terabytes of pirated books from shadow libraries such as Anna's Archive, Z-Library, and LibGen [2]. The plaintiffs, including bestselling authors Sarah Silverman and Ta-Nehisi Coates, allege that Meta infringed upon their copyrights and potentially harmed their livelihoods [3].

Internal Concerns and Concealment Attempts

Unsealed court documents reveal that some Meta employees raised ethical concerns as early as 2022. One researcher explicitly stated, "I don't think we should use pirated material," while another employee commented that "torrenting from a corporate laptop doesn't feel right" [4].

Despite these internal warnings, Meta allegedly took steps to conceal its activities. Employees discussed ways to prevent Meta's infrastructure from being directly linked to the downloads, including using servers outside of Facebook's main network in what was referred to as "stealth mode" [5].

Meta's Response and Legal Defense

Meta has defended its practices by invoking the "fair use" doctrine, asserting that using publicly available materials to train AI tools is legal in certain cases. The company argues that it uses text to statistically model language and generate original expression [3].

Broader Implications for AI Industry

This case is part of a larger trend of legal challenges against tech companies developing AI technologies. OpenAI and Nvidia have also faced similar accusations regarding their use of copyrighted materials for AI training [2].

Ongoing Legal Proceedings

U.S. District Judge Vince Chhabria has dismissed some claims but allowed the authors to amend their complaint to include new allegations, including those related to the removal of copyright management information [3]. The outcome of this lawsuit could have significant implications for the tech industry, particularly concerning the use of copyrighted materials in AI training.

References

Court filings show Meta paused efforts to license books for AI training | TechCrunch

Meta purportedly trained its AI on more than 80TB of pirated content and then open-sourced Llama for the greater good

Meta faces lawsuit for training AI with pirated books

Meta's Llama AI in hot water: Alleged copyright theft leads to class action lawsuit, accused of pirating 82TB of books for AI training

Meta used pirated books to train its AI models, and there are emails to prove it
