Curated by THEOUTPOST
On Tue, 11 Mar, 12:04 AM UTC
6 Sources
[1]
Meta mocked for raising "Bob Dylan defense" of torrenting in AI copyright fight
Authors think that Meta's admitted torrenting of a pirated books data set used to train its AI models is evidence enough to win their copyright fight -- which previously hinged on a court ruling that AI training on copyrighted works isn't fair use.

Moving for summary judgment on a direct copyright infringement claim on Monday in a US district court in California, the authors alleged that "whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful."

In their filing, the authors accused Meta of brazenly deciding to torrent terabytes of pirated book data after attempts to download pirated books one by one "posed an immense strain on Meta's networks and proceeded very slowly." Knowing that such activity has been deemed infringing for more than two decades, the authors alleged, Meta took a risk, seemingly hoping to evade detection while struggling to catch up in the AI race and needing speedier access to large chunks of data. To cover its tracks, the social media company allegedly deviated from usual practices and attempted to conceal the torrenting by using Amazon Web Services.

"In most cases, and in this case too, users who download via torrent also upload the same file they are downloading to reap the benefits of faster file sharing," the authors alleged. In February, authors argued that Meta's torrenting of the pirated books was infringing, even if Meta limited seeding when the downloads were completed, as the company claims it does. They explained that Meta's leeching during the download process (allowing other users to download partial files before the download was completed) is allegedly evidence enough that Meta shared pirated books with others.

"There is no genuine dispute that Meta made widely available and even reuploaded to other online pirates at least some quantity of the pirated data as part of the peer-to-peer (P2P) sharing process," the authors alleged. "Meta's response in this case seems to be that a powerful technology corporation should not be held to the same standard as everyone else for illegal conduct."

The authors mocked Meta for raising what they call "the Bob Dylan defense" of its torrenting, citing song lyrics from "Sweetheart Like You" that say, "Steal a little and they throw you in jail / Steal a lot and they make you king."

Meta opposes requests for leeching evidence

Meta does not want the court to weigh these leeching claims. Last week, Meta argued that authors should not be allowed to do more discovery on Meta's alleged leeching or introduce a new expert to potentially discuss why the leeching may have clinched the case for the authors.

While resisting the introduction of new evidence on leeching, Meta simultaneously argued that the authors' motion for summary judgment based on the leeching theory is inappropriate because Meta has not had a chance to defend against the claims. "They intend to move for summary judgment on torrenting issues, presumably in reliance on this new theory in a new expert report from a new expert, to which Meta has not had an opportunity to investigate or respond," Meta's letter said.

On May 1, Judge Vince Chhabria will weigh these arguments at a hearing where Meta will get a chance to respond to the leeching claims. Last week, Chhabria wrote in an order that consideration will be given to whether "it would be unfair to Meta" to rule on the summary judgment at this stage.
The authors, however, think that torrenting pirated works is so notoriously illegal that they now have an "open-and-shut case" of copyright infringement. "Meta's reproduction of Plaintiffs' Copyrighted Books without permission, including through peer-to-peer file sharing, is not fair use," the authors alleged, citing a major court ruling against Napster and insisting that "Meta infringed each of their copyrights, full stop."

Chhabria may be curious to learn more about leeching, though. Last month, he admitted at a hearing that the term was foreign to him, Meta's letter said in a footnote. "I don't remember hearing it before," Chhabria said.

The authors are hoping to make Meta pay after Meta allegedly spurned offers to license their data for a fee. "Meta plainly attributed significant value to the copyrighted works it took for free: a windfall to Meta, but not for authors, who were paid nothing," the authors alleged.

Further, "Whether another user actually downloaded the content that Meta made available" through torrenting "is irrelevant," the authors alleged. "Meta 'reproduced' the works as soon as it made them available to other peers."

Meta resists request to depose Zuckerberg

The authors want Chhabria to agree that Meta's alleged leeching is key to winning their case. Their filing even pointed out that Meta's pirating included copies of books written by at least 10 Supreme Court justices, seemingly hoping the judge will see that Meta's activity harms more than just authors.

To further their case, the authors had asked for additional discovery requiring Meta to provide written answers about its torrenting and leeching. They also sought to depose both Meta employees who previously testified, including Mark Zuckerberg, and employees whose roles in Meta's torrenting, they suggested, were only recently clarified in unsealed emails.

"That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval," the authors argued. "Their gamble should not pay off."

Meta said the authors' new discovery requests were "unnecessary, unwarranted, and infeasible." The company would only agree to allow six employees to be deposed ahead of the May hearing, including Nikolay Bashlykov, a software engineer who sent an internal message at Meta saying, "Torrenting from a corporate laptop doesn't feel right." However, the authors "have made no showing to justify additional deposition time with Meta's CEO Mr. Zuckerberg," Meta claimed, offering instead two senior-level employees "who can speak to executive decision-making."

Piracy can never be fair use, authors say

The authors claimed there are gaps in the court's understanding of Meta's torrenting, pointing out that Meta's expert failed to replicate the company's torrenting in her analysis, leaving it unclear "how much data Meta uploaded and/or seeded." Meta's expert also allegedly ignored that "BitTorrent's default configuration provides for continuous uploading during the 'leeching' phase -- simultaneous to downloading."

Although the authors expect their leeching theory may be a winning one, they noted that fair-use findings typically come from juries, not from judges at the summary judgment stage. They also acknowledged that the court may decide "that the fair use analysis applies to Meta's unmitigated piracy and use of torrenting."
But "it should nevertheless grant summary judgment under the four fair use factors regarding Meta's decision to make available to other P2P pirates millions of copyrighted books in exchange for faster download speed," they argued. Considering that Meta hasn't found a single case where a court determined downloading or uploading pirated works on P2P networks is fair use, the authors warned, "The use of piracy to further piracy can never be 'fair use.'"
[2]
Judge Allows Authors' AI Copyright Case Against Meta to Proceed
Meta's AI training practices are about to face legal scrutiny, as a judge has allowed a copyright infringement case against the company to proceed. The lawsuit, filed by authors Richard Kadrey and Christopher Golden and comedian Sarah Silverman in July 2023, accuses Meta of using material from their copyrighted books to train its Llama AI model. Other authors, including Ta-Nehisi Coates, joined the case a few months later.

The plaintiffs claim that some of Llama's responses were pulled directly from their work without consent, enriching Meta in the process. They additionally claim that Meta removed copyright management information (CMI), such as ISBNs, copyright symbols, and disclaimers, to hide the infringement.

As noted by TechCrunch, Meta has tried unsuccessfully to get the case dismissed. In his Friday ruling, Judge Vince Chhabria allowed the case to proceed, stating: "Copyright infringement is obviously a concrete injury sufficient for standing." He also said that there's a "reasonable, if not particularly strong, inference that Meta removed CMI to try to prevent Llama from outputting CMI and thus revealing that it was trained on copyrighted material." Judge Chhabria did dismiss one of the plaintiffs' claims, which cited the California Comprehensive Computer Data Access and Fraud Act (CDAFA), because the authors did not "allege that Meta accessed their computers or servers -- only their data."

The ruling comes a month after Thomson Reuters secured a first-of-its-kind win in an AI copyright lawsuit, in which a judge rejected Ross Intelligence's fair use defense because its use affected the market value of Thomson Reuters' copyrighted material. Like Meta, multiple AI companies are facing lawsuits for copyright violations. The New York Times has filed a lawsuit against OpenAI and Microsoft; News Corp. has sued Perplexity; and several large Canadian news organizations have sued OpenAI.
[3]
Meta may have illegally removed copyright info in AI corpus
Facebook giant allegedly didn't want neural networks to emit results that would give the game away

A judge has found Meta must answer a claim it allegedly removed so-called copyright management information from material used to train its AI models.

The Friday ruling by Judge Vince Chhabria concerned the case Kadrey et al vs Meta Platforms, filed in July 2023 in a San Francisco federal court as a proposed class action by authors Richard Kadrey, Sarah Silverman, and Christopher Golden, who reckon the Instagram titan's use of their work to train its neural networks was illegal.

Their case burbled along until January 2025, when the plaintiffs made the explosive allegation that Meta knew it used copyrighted material for training, and that its AI models would therefore produce results that included copyright management information (CMI) - the fancy term for things like the creator of a copyrighted work, its license and terms of use, its date of creation, and so on, that accompany copyrighted material. The miffed scribes alleged Meta therefore removed all of this copyright info from the works it used to train its models so users wouldn't be made aware the results they saw stemmed from copyrighted stuff.

Judge Chhabria last week allowed the plaintiffs' claim that Meta violated the US Digital Millennium Copyright Act (DMCA) by removing copyright notices from works used to train the Facebook giant's Llama family of models to continue. That decision makes it more likely the case will end in settlement or trial.

"[The plaintiffs'] allegations raise a 'reasonable, if not particularly strong, inference' that Meta removed CMI to try to prevent Llama from outputting CMI and thus revealing that it was trained on copyrighted material," Judge Chhabria wrote in his order [PDF]. "This use of copyrighted material is clearly an identifiable (alleged) infringement."

Meta has already admitted [PDF] it used a dataset named Books3 to train its Llama 1 large language model. The dataset has been found to include copyrighted works.

The news isn't all bad for Meta, because Judge Chhabria tossed one of the plaintiffs' claims - that Meta's use for Llama of unlicensed books obtained from peer-to-peer torrents violated California's Comprehensive Computer Data Access & Fraud Act (CDAFA).

Edward Lee, a professor of law at Santa Clara University, told The Register we should not infer anything about fair use based on the authors' DMCA 1202(b)(1) claim about the scrubbed CMI. "At the hearing, Judge Chhabria also expressed some skepticism the plaintiffs would prove the DMCA [claim] and said it could be revisited on summary judgment," Lee said. "What it does show is that the plaintiffs' attorneys were able to find a more particularized factual basis for their DMCA claim, which had been dismissed earlier in the case."

By allowing the CMI claim to advance, Chhabria has delivered a second ruling that suggests the indiscriminate ingestion of copyrighted material to train AI models may have financial consequences. The first came last month, when Thomson Reuters won a partial summary judgment against shuttered AI firm Ross Intelligence that prevents the defendant firm from avoiding liability by claiming fair use.

Legal scholars have argued that AI inference - apps that produce outputs based on AI models - is more likely to be deemed copyright infringement because it's obvious when a model spits out an author's work verbatim. Inputting copyrighted material into models for training has been viewed as more likely to qualify for fair use defenses.
However, the Thomson Reuters decision and the survival of the DMCA claim against Meta look likely to strengthen plaintiffs in other AI-related litigation. For example, the complaint in Tremblay et al vs OpenAI et al was amended [PDF] last week to revive a previously dismissed DMCA claim, based on new but redacted evidence supporting allegations of CMI removal. Citing revelations that followed from discovery, the revised complaint argues, "As amended, the DMCA claim sufficiently alleges that OpenAI actually removed CMI for training its large language models."
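For readers unfamiliar with the term, CMI in the DMCA sense (17 U.S.C. § 1202) is simply the identifying metadata that travels with a work: title, author, copyright notice, ISBN, license or terms of use, and the like. The sketch below shows, in purely hypothetical form, what such a record looks like as data and what "stripping" it from a book's text would mean; the field names, sample text, and strip_cmi() helper are invented for illustration and are not drawn from Meta's pipeline or from any court filing.

```python
# Purely illustrative sketch of copyright management information (CMI) as data
# and of what removing it from a text would mean. Everything here is invented
# for the example; it is not code from Meta's pipeline or any court exhibit.

import re

# The categories of metadata the DMCA treats as CMI, shown for one hypothetical book.
cmi_record = {
    "title": "Example Novel",
    "author": "A. Writer",
    "copyright_notice": "Copyright © 2019 A. Writer. All rights reserved.",
    "isbn": "ISBN 978-0-000-00000-0",
    "terms_of_use": "No part of this book may be reproduced without permission.",
}

sample_page = (
    "Example Novel by A. Writer\n"
    "Copyright © 2019 A. Writer. All rights reserved.\n"
    "ISBN 978-0-000-00000-0\n"
    "No part of this book may be reproduced without permission.\n"
    "\n"
    "Chapter One\n"
    "It was a dark and stormy night...\n"
)


def strip_cmi(text: str, record: dict) -> str:
    """Drop any line that repeats one of the record's CMI strings."""
    kept = [line for line in text.splitlines() if line not in record.values()]
    # Collapse the blank lines left behind by the removed notices.
    return re.sub(r"\n{2,}", "\n\n", "\n".join(kept)).strip() + "\n"


if __name__ == "__main__":
    # Only the body text survives; the notice, ISBN, and terms are gone.
    print(strip_cmi(sample_page, cmi_record))
```

The plaintiffs' 1202(b)(1) theory, as described above, is that removing exactly this kind of information before training made it less likely that Llama's outputs would reveal the copyrighted sources behind them.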
[4]
Piracy lawsuit against Meta could set precedent for torrenting copyrighted works in AI training
A hot potato: Meta is embroiled in a ground-breaking AI lawsuit that could change how courts view copyright law. The case seems open-and-shut from the plaintiffs' view. However, if a judge sees otherwise, it could set a monumental precedent allowing corporations to pirate copyrighted material to train AI systems.

In July 2023, a group of writers filed a lawsuit in California against Meta for using their works to train various versions of the Llama large language model. Meta openly admitted to using the Books3 dataset, a well-known 37GB compilation of 195,000 copyrighted books used by developers to train LLMs since 2020. The company defends its actions, citing the Fair Use doctrine. Earlier this year, the court unsealed documents showing that Meta had used torrenting to gather its AI training data.

On Monday, the authors filed for a partial summary judgment in a California U.S. District Court, arguing that Meta's alleged use of pirated data leaves no room for legal ambiguity. The plaintiffs claim Meta's use of torrenting to acquire copyrighted books for artificial intelligence training amounts to clear-cut copyright infringement. "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful," the authors stated in their filing.

According to the unsealed documents, Meta initially attempted to download pirated books individually, but this process was too slow and placed excessive strain on its networks. The company then allegedly turned to torrenting - an infamous file-sharing method long associated with copyright infringement - to acquire terabytes of copyrighted books in bulk, far beyond the scope of the Books3 dataset.

The authors claim that Meta was fully aware of the legal risks involved and took deliberate action to obscure its activities. The company allegedly ran the torrent client through Amazon Web Services rather than Meta's own infrastructure - a departure from standard practice for the social media giant.

The heavily redacted motion, obtained by Ars Technica, points out that torrent users typically download (leech) and upload (seed) chunks of a file to allow faster downloads. Leeching and seeding are widely considered illegal if the files contain copyrighted material. Furthermore, by seeding a torrent, Meta may have actively facilitated piracy by distributing copyrighted books.

The plaintiffs feel that a trial is no longer necessary and seek immediate judgment. The authors contend that the company's actions clearly violate copyright law, falling far outside Meta's fair-use defense. A decision in Meta's favor could set a dangerous precedent going far beyond books, allowing AI developers to infringe on copyrights without compensating the IP owners. "[The court] should nevertheless grant summary judgment under the four fair use factors regarding Meta's decision to make available to other P2P pirates millions of copyrighted books in exchange for faster download speed," the motion argues.

While it seems like a relatively open-and-shut case, presiding judge Vince Chhabria admitted that he was unfamiliar with torrenting and related terminology like seeding and leeching. For this reason, Judge Chhabria may deny the motion for summary judgment, choosing to hear experts testify and explain the case so that he can make a fair and honest ruling. The final decision in the lawsuit will be ground-breaking no matter which way it goes.
If Meta prevails, it opens the door for other AI developers to pirate books, images, or videos to train their models. If the authors win, it sets a precedent for similar cases, including those currently in the judicial system. It could also lead to further copyright reform akin to the Digital Millennium Copyright Act.
[5]
Did Meta use torrented books to train AI?
Meta is being accused of violating copyright laws through its admitted torrenting of a pirated books dataset utilized for training its AI models, according to a summary judgment filing in a US district court in California reported by Ars Technica.

In their legal filing, the authors argued that "whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful." They alleged that Meta decided to torrent terabytes of pirated book data, claiming that downloading pirated books individually "posed an immense strain on Meta's networks and proceeded very slowly."

The authors accused Meta of being aware of the legal implications, stating that the company took a risk by hoping to evade detection while needing faster access to large datasets in the competitive AI landscape. The filing indicated that Meta attempted to conceal its torrenting activities by using Amazon Web Services.

The authors contended that in most torrenting scenarios, users who download also upload the files to enable faster sharing. They asserted, "In this case too, users who download via torrent also upload the same file they are downloading." They further argued that even if Meta limited seeding after downloads, its leeching during the downloading process implies the sharing of pirated books. According to the authors, "There is no genuine dispute that Meta made widely available and even reuploaded to other online pirates at least some quantity of the pirated data as part of the peer-to-peer (P2P) sharing process."

They criticized Meta's defense, stating that the company believed it should not be held to the same standards as others engaged in illegal activities. The authors referred to Meta's defense as the "Bob Dylan defense," citing lyrics from Dylan's "Sweetheart Like You" to illustrate their point: "Steal a little and they throw you in jail / Steal a lot and they make you king."

Meta has contested the claims related to leeching, arguing that the authors should not be permitted further discovery or to introduce new expert testimony regarding the alleged leeching. The company has claimed that the authors' motion for summary judgment based on this theory is inappropriate since Meta has not had a chance to defend against it. Judge Vince Chhabria is set to evaluate these claims in a hearing scheduled for May 1. Chhabria has expressed his intention to consider whether ruling on the summary judgment at this stage might be unfair to Meta.

The authors insist that torrenting pirated works is sufficiently illegal to warrant a clear case of copyright infringement, stating, "Meta's reproduction of Plaintiffs' Copyrighted Books without permission, including through peer-to-peer file sharing, is not fair use." They referenced a major ruling against Napster to support their claim that Meta infringed each author's copyrights.

The authors also highlighted that the data Meta allegedly pirated included works by at least ten Supreme Court justices, aiming to emphasize the broader implications of Meta's actions. They have requested additional discovery to compel Meta to provide answers regarding its torrenting and leeching practices, including depositions of Meta employees such as Mark Zuckerberg. In their argument, they noted that the risk associated with taking copyrighted works from pirated databases was significant enough to escalate matters to Meta's executives, including Zuckerberg.
"Their gamble should not pay off," the authors stated. Meta responded by deeming the authors' new discovery requests as "unnecessary, unwarranted, and infeasible." The company has offered to allow the deposition of six employees but rejected requests for additional time with Zuckerberg, suggesting alternative senior staff who could speak on executive decision-making. The authors pointed out perceived gaps in the understanding of Meta's torrenting by the court, alleging that Meta's expert had failed to replicate the torrenting in her analysis and did not account for how data uploading occurs during the leeching phase. They consider their leeching theory critical to their case and anticipate that the court may evaluate fair-use factors, although they recognize these are typically decided by juries, not at the summary judgment stage. Despite these complexities, the authors assert that the use of piracy cannot qualify as fair use, stating, "The use of piracy to further piracy can never be 'fair use.'" They contend that Meta's actions, including making millions of copyrighted books available for faster download speeds, constitute clear copyright infringement.
[6]
Meta vs. Kadrey Sets Precedent for AI Copyright Battles
Judge Vince Chhabria determined that Kadrey et al's Digital Millennium Copyright Act (DMCA) claim could move ahead.

A California judge has partially granted Meta's motion to dismiss a lawsuit brought by a group of authors, including novelist Richard Kadrey and comedian Sarah Silverman. Judge Vince Chhabria's decision in Meta vs. Kadrey adds to a growing body of case law pertaining to copyright infringement claims against AI developers. And with each case, the nature of intellectual property rights with respect to AI training becomes clearer.

Kadrey et al. Copyright Lawsuit Moves Ahead

The lawsuit, filed in the Northern District of California, alleges that Meta infringed upon the copyrights of various authors by using their books to train its AI model, Llama. While Meta sought a full dismissal, the court ruled that some claims could proceed. The court dismissed the plaintiffs' claims under the California Comprehensive Computer Data Access and Fraud Act (CDAFA), ruling that the allegations were preempted by federal copyright law. However, the court allowed the claims related to the Digital Millennium Copyright Act (DMCA) to move forward, recognizing that Meta's alleged removal of copyright management information (CMI) constituted a potential violation.

Digital Millennium Claims Still Alive

A crucial part of the ruling addressed the plaintiffs' allegations that Meta intentionally removed CMI from copyrighted works used in training its AI models. The court found that such removal could be considered an effort to conceal copyright infringement, a key element under the DMCA.

This ruling aligns with recent lawsuits against OpenAI and other AI developers, where plaintiffs have cited the DMCA in their legal arguments. For example, in The Intercept Media, Inc. v. OpenAI, a court ruled that the removal of CMI could form the basis of a DMCA claim, underscoring the importance of digital attribution in AI training disputes.

While Meta argued that DMCA protections do not align with traditional copyright concerns, the court noted that the law's purpose is to safeguard digital property rights, making it relevant in the context of AI training. The decision suggests that AI companies cannot rely solely on broad copyright defenses to evade accountability for metadata manipulation.

AI Copyright Claims and "Concrete Harm"

Another critical issue in the lawsuit is whether the plaintiffs have suffered "concrete harm," a requirement for legal standing in copyright cases. Courts have scrutinized such claims in AI-related lawsuits, often ruling that plaintiffs must demonstrate tangible damages.

In his latest decision, Judge Chhabria referenced Raw Story Media, Inc. v. OpenAI, a 2024 case in which a court deemed that an AI model's reproduction of text doesn't constitute harm significant enough to support a copyright claim. However, Chhabria determined that the removal of CMI itself may represent a legally recognizable injury, as Meta likely removed it to obscure the copyright protection of the works in question.

This aspect of the ruling could shape future AI copyright litigation as courts increasingly reject the notion that AI training with copyrighted materials is in and of itself a violation of intellectual property rights.
Meta is embroiled in a lawsuit accusing the company of using torrented copyrighted books to train its AI models, potentially setting a precedent for how courts view copyright law in AI development.
Meta, the parent company of Facebook and Instagram, is facing a significant legal challenge over its artificial intelligence (AI) training practices. A group of authors, including Richard Kadrey, Sarah Silverman, and Christopher Golden, have filed a lawsuit accusing Meta of copyright infringement by using their copyrighted works to train its Llama AI model without permission [2].
The plaintiffs allege that Meta resorted to torrenting terabytes of pirated book data to train its AI models after attempts to download pirated books individually proved too slow and strained Meta's networks [1]. This decision, they argue, was made with full awareness of the legal risks involved, as torrenting has long been associated with copyright infringement [4].
The authors claim that Meta's actions constitute clear copyright infringement, stating, "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful" [1]. They further allege that Meta attempted to conceal its torrenting activities by using Amazon Web Services rather than its own infrastructure [4].
The plaintiffs mockingly refer to Meta's stance as the "Bob Dylan defense," citing lyrics from Dylan's "Sweetheart Like You": "Steal a little and they throw you in jail / Steal a lot and they make you king" [1][5]. They argue that Meta's use of peer-to-peer (P2P) file sharing to obtain copyrighted material cannot be considered fair use [1].
Meta has admitted to using the Books3 dataset, which contains 195,000 copyrighted books, to train its Llama 1 large language model [3]. However, the company maintains that its actions fall under fair use doctrine [4].
The outcome of this case could have far-reaching implications for AI development and copyright law. If the court rules in favor of Meta, it could set a precedent allowing AI developers to use copyrighted material for training without compensation to intellectual property owners [4]. Conversely, a ruling in favor of the authors could strengthen similar cases and potentially lead to copyright reform [4].
Judge Vince Chhabria, who is presiding over the case, has allowed it to proceed, stating that "Copyright infringement is obviously a concrete injury sufficient for standing" [2]. However, he has also expressed some unfamiliarity with torrenting terminology, which may influence how the case proceeds [4].
The authors have filed for a partial summary judgment, arguing that Meta's use of torrenting leaves no room for legal ambiguity [4]. Judge Chhabria is scheduled to evaluate these claims in a hearing on May 1, considering whether ruling on the summary judgment at this stage might be unfair to Meta [1].
This case is part of a larger trend of legal challenges facing AI companies over copyright issues. The New York Times has sued OpenAI and Microsoft, News Corp. has sued Perplexity, and several Canadian news organizations have sued OpenAI [2]. A recent ruling in favor of Thomson Reuters against Ross Intelligence has already suggested that indiscriminate ingestion of copyrighted material for AI training may have financial consequences [3].
As the legal battle unfolds, the tech industry and copyright holders alike are closely watching this case, which could significantly shape the future landscape of AI development and intellectual property rights in the digital age.