Curated by THEOUTPOST
On Tue, 11 Mar, 12:04 AM UTC
6 Sources
[1]
Meta mocked for raising "Bob Dylan defense" of torrenting in AI copyright fight
Authors think that Meta's admitted torrenting of a pirated books data set used to train its AI models is evidence enough to win their copyright fight -- which previously hinged on a court ruling that AI training on copyrighted works isn't fair use.

Moving for summary judgment on a direct copyright infringement claim on Monday in a US district court in California, the authors alleged that "whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful."

In their filing, the authors accused Meta of brazenly deciding to torrent terabytes of pirated book data after attempts to download pirated books one by one "posed an immense strain on Meta's networks and proceeded very slowly." Knowing that such activity has been deemed infringing for more than two decades, the authors alleged, Meta took a risk, seemingly hoping to evade detection while struggling to catch up in the AI race and needing speedier access to large chunks of data. To cover its tracks, the social media company allegedly deviated from usual practices and attempted to conceal the torrenting by using Amazon Web Services.

"In most cases, and in this case too, users who download via torrent also upload the same file they are downloading to reap the benefits of faster file sharing," the authors alleged. In February, authors argued that Meta's torrenting of the pirated books was infringing, even if Meta limited seeding when the downloads were completed, as the company claims it does. They explained that Meta's leeching during the download process (allowing other users to download partial files before the download was completed) is allegedly evidence enough that Meta shared pirated books with others.

"There is no genuine dispute that Meta made widely available and even reuploaded to other online pirates at least some quantity of the pirated data as part of the peer-to-peer (P2P) sharing process," the authors alleged. "Meta's response in this case seems to be that a powerful technology corporation should not be held to the same standard as everyone else for illegal conduct."

The authors mocked Meta for raising what they call "the Bob Dylan defense" of its torrenting, citing song lyrics from "Sweetheart Like You" that say, "Steal a little and they throw you in jail / Steal a lot and they make you king."

Meta opposes requests for leeching evidence

Meta does not want the court to weigh these leeching claims. Last week, Meta argued that authors should not be allowed to do more discovery on Meta's alleged leeching or introduce a new expert to potentially discuss why the leeching may have clinched the case for the authors.

While resisting the introduction of new evidence on leeching, Meta simultaneously argued that the authors' motion for summary judgment based on the leeching theory is inappropriate because Meta has not had a chance to defend against the claims. "They intend to move for summary judgment on torrenting issues, presumably in reliance on this new theory in a new expert report from a new expert, to which Meta has not had an opportunity to investigate or respond," Meta's letter said.

On May 1, Judge Vince Chhabria will weigh these arguments at a hearing where Meta will get a chance to respond to the leeching claims. Last week, Chhabria wrote in an order that consideration will be given to whether "it would be unfair to Meta" to rule on the summary judgment at this stage.
The authors, however, think that torrenting pirated works is so notoriously illegal that they now have an "open-and-shut case" of copyright infringement. "Meta's reproduction of Plaintiffs' Copyrighted Books without permission, including through peer-to-peer file sharing, is not fair use," the authors alleged, citing a major court ruling against Napster and insisting that "Meta infringed each of their copyrights, full stop."

Chhabria may be curious to learn more about leeching, though. Last month, he admitted at a hearing that the term was foreign to him, Meta's letter said in a footnote. "I don't remember hearing it before," Chhabria said.

The authors are hoping to make Meta pay after Meta allegedly spurned offers to license their data for a fee. "Meta plainly attributed significant value to the copyrighted works it took for free: a windfall to Meta, but not for authors, who were paid nothing," the authors alleged.

Further, "Whether another user actually downloaded the content that Meta made available" through torrenting "is irrelevant," the authors alleged. "Meta 'reproduced' the works as soon as it made them available to other peers."

Meta resists request to depose Zuckerberg

The authors want Chhabria to agree that Meta's alleged leeching is key to winning their case. Their filing even pointed out that Meta's pirating included copies of books written by at least 10 Supreme Court justices, seemingly hoping the judge will see that Meta's activity harms more than just authors.

To further their case, the authors had asked for additional discovery requiring Meta to provide written answers about its torrenting and leeching. They also sought to depose both Meta employees who previously testified, including Mark Zuckerberg, and employees whose roles in Meta's torrenting, they suggested, were only recently clarified in unsealed emails.

"That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval," the authors argued. "Their gamble should not pay off."

Meta said the authors' new discovery requests were "unnecessary, unwarranted, and infeasible." The company would only agree to allow six employees to be deposed ahead of the May hearing, including Nikolay Bashlykov, a software engineer who sent an internal message at Meta saying, "Torrenting from a corporate laptop doesn't feel right." However, the authors "have made no showing to justify additional deposition time with Meta's CEO Mr. Zuckerberg," Meta claimed, offering instead two senior-level employees "who can speak to executive decision-making."

Piracy can never be fair use, authors say

The authors claimed there are gaps in the court's understanding of Meta's torrenting, pointing out that Meta's expert failed to replicate the company's torrenting in her analysis, leaving it unclear "how much data Meta uploaded and/or seeded." Meta's expert also allegedly ignored that "BitTorrent's default configuration provides for continuous uploading during the 'leeching' phase -- simultaneous to downloading."

Although the authors expect their leeching theory may be a winning one, they noted that fair-use findings typically come from juries, not from judges at the summary judgment stage. They also acknowledged that the court may decide "that the fair use analysis applies to Meta's unmitigated piracy and use of torrenting."
But "it should nevertheless grant summary judgment under the four fair use factors regarding Meta's decision to make available to other P2P pirates millions of copyrighted books in exchange for faster download speed," they argued. Considering that Meta hasn't found a single case where a court determined downloading or uploading pirated works on P2P networks is fair use, the authors warned, "The use of piracy to further piracy can never be 'fair use.'"
[2]
Judge Allows Authors' AI Copyright Case Against Meta to Proceed
Meta's AI training practices are about to face legal scrutiny, as a judge has allowed a copyright infringement case against the company to proceed. The lawsuit, filed by authors Richard Kadrey and Christopher Golden and comedian Sarah Silverman in July 2023, accuses Meta of using material from their copyrighted books to train its Llama AI model. Other authors, including Ta-Nehisi Coates, joined the case a few months later.

The plaintiffs claim that some of Llama's responses were pulled directly from their work without consent, enriching Meta in the process. They additionally claim that Meta removed copyright management information (CMI), such as ISBNs, copyright symbols, and disclaimers, to hide the infringement.

As noted by TechCrunch, Meta has tried unsuccessfully to get the case dismissed. In his Friday ruling, Judge Vince Chhabria allowed the case to proceed, stating: "Copyright infringement is obviously a concrete injury sufficient for standing." He also said that there's a "reasonable, if not particularly strong, inference that Meta removed CMI to try to prevent Llama from outputting CMI and thus revealing that it was trained on copyrighted material." Judge Chhabria did dismiss one of the plaintiffs' claims, which cited the California Comprehensive Computer Data Access and Fraud Act (CDAFA), because the authors did not "allege that Meta accessed their computers or servers -- only their data."

The ruling comes a month after Thomson Reuters secured a first-of-its-kind win in an AI copyright lawsuit, in which a judge rejected Ross Intelligence's fair use defense because its use affected the market value of Thomson Reuters' copyrighted material. Like Meta, multiple AI companies are facing lawsuits for copyright violations. The New York Times has filed a lawsuit against OpenAI and Microsoft; News Corp. has sued Perplexity; and several large Canadian news organizations have sued OpenAI.
[3]
Meta may have illegally removed copyright info in AI corpus
Facebook giant allegedly didn't want neural networks to emit results that would give the game away

A judge has found Meta must answer a claim it allegedly removed so-called copyright management information from material used to train its AI models.

The Friday ruling by Judge Vince Chhabria concerned the case Kadrey et al vs Meta Platforms, filed in July 2023 in a San Francisco federal court as a proposed class action by authors Richard Kadrey, Sarah Silverman, and Christopher Golden, who reckon the Instagram titan's use of their work to train its neural networks was illegal.

Their case burbled along until January 2025, when the plaintiffs made the explosive allegation that Meta knew it used copyrighted material for training, and that its AI models would therefore produce results that included copyright management information (CMI) - the fancy term for things like the creator of a copyrighted work, its license and terms of use, its date of creation, and so on, that accompany copyrighted material. The miffed scribes alleged Meta therefore removed all of this copyright info from the works it used to train its models so users wouldn't be made aware the results they saw stemmed from copyrighted stuff.

Judge Chhabria last week allowed the plaintiffs' claim that Meta violated the US Digital Millennium Copyright Act (DMCA) by removing copyright notices from works used to train the Facebook giant's Llama family of models to continue. That decision makes it more likely the case will end in settlement or trial.

"[The plaintiffs'] allegations raise a 'reasonable, if not particularly strong, inference' that Meta removed CMI to try to prevent Llama from outputting CMI and thus revealing that it was trained on copyrighted material," Judge Chhabria wrote in his order [PDF]. "This use of copyrighted material is clearly an identifiable (alleged) infringement."

Meta has already admitted [PDF] it used a dataset named Books3 to train its Llama 1 large language model. The dataset has been found to include copyrighted works.

The news isn't all bad for Meta, because Judge Chhabria tossed one of the plaintiffs' claims - that Meta's use for Llama of unlicensed books obtained from peer-to-peer torrents violated California's Comprehensive Computer Data Access & Fraud Act (CDAFA).

Edward Lee, a professor of law at Santa Clara University, told The Register we should not infer anything about fair use based on the authors' DMCA 1202(b)(1) claim about the scrubbed CMI. "At the hearing, Judge Chhabria also expressed some skepticism the plaintiffs would prove the DMCA [claim] and said it could be revisited on summary judgment," Lee said. "What it does show is that the plaintiffs' attorneys were able to find a more particularized factual basis for their DMCA claim, which had been dismissed earlier in the case."

By allowing the CMI claim to advance, Chhabria has delivered a second ruling that suggests the indiscriminate ingestion of copyrighted material to train AI models may have financial consequences. The first came last month, when Thomson Reuters won a partial summary judgment against shuttered AI firm Ross Intelligence that prevents the defendant firm from avoiding liability by claiming fair use.

Legal scholars have argued that AI inference - apps that produce outputs based on AI models - is more likely to be deemed copyright infringement because it's obvious when a model spits out an author's work verbatim. Inputting copyrighted material into models for training has been viewed as more likely to qualify for fair use defenses.
However, the Thomson Reuters decision and the survival of the DMCA claim against Meta look likely to strengthen plaintiffs in other AI-related litigation. For example, the complaint in Tremblay et al vs OpenAI et al was amended [PDF] last week to revive a previously dismissed DMCA claim, based on new but redacted evidence supporting allegations of CMI removal. Citing revelations that followed from discovery, the revised complaint argues, "As amended, the DMCA claim sufficiently alleges that OpenAI actually removed CMI for training its large language models."
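For readers unfamiliar with the term, CMI in the DMCA sense (17 U.S.C. § 1202) is simply the identifying metadata that travels with a work: title, author, copyright notice, ISBN, license or terms of use, and the like. The sketch below shows, in purely hypothetical form, what such a record looks like as data and what "stripping" it from a book's text would mean; the field names, sample text, and strip_cmi() helper are invented for illustration and are not drawn from Meta's pipeline or from any court filing.

```python
# Purely illustrative sketch of copyright management information (CMI) as data
# and of what removing it from a text would mean. Everything here is invented
# for the example; it is not code from Meta's pipeline or any court exhibit.

import re

# The categories of metadata the DMCA treats as CMI, shown for one hypothetical book.
cmi_record = {
    "title": "Example Novel",
    "author": "A. Writer",
    "copyright_notice": "Copyright © 2019 A. Writer. All rights reserved.",
    "isbn": "ISBN 978-0-000-00000-0",
    "terms_of_use": "No part of this book may be reproduced without permission.",
}

sample_page = (
    "Example Novel by A. Writer\n"
    "Copyright © 2019 A. Writer. All rights reserved.\n"
    "ISBN 978-0-000-00000-0\n"
    "No part of this book may be reproduced without permission.\n"
    "\n"
    "Chapter One\n"
    "It was a dark and stormy night...\n"
)


def strip_cmi(text: str, record: dict) -> str:
    """Drop any line that repeats one of the record's CMI strings."""
    kept = [line for line in text.splitlines() if line not in record.values()]
    # Collapse the blank lines left behind by the removed notices.
    return re.sub(r"\n{2,}", "\n\n", "\n".join(kept)).strip() + "\n"


if __name__ == "__main__":
    # Only the body text survives; the notice, ISBN, and terms are gone.
    print(strip_cmi(sample_page, cmi_record))
```

The plaintiffs' 1202(b)(1) theory, as described above, is that removing exactly this kind of information before training made it less likely that Llama's outputs would reveal the copyrighted sources behind them.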
[4]
Piracy lawsuit against Meta could set precedent for torrenting copyrighted works in AI training
A hot potato: Meta is embroiled in a ground-breaking AI lawsuit that could change how courts view copyright law. The case seems open-and-shut from the plaintiffs' view. However, if a judge sees otherwise, it could set a monumental precedent allowing corporations to pirate copyrighted material to train AI systems.

In July 2023, a group of writers filed a lawsuit in California against Meta for using their works to train various versions of the Llama large language model. Meta openly admitted to using the Books3 dataset, a well-known 37GB compilation of 195,000 copyrighted books used by developers to train LLMs since 2020. The company defends its actions, citing the Fair Use doctrine. Earlier this year, the court unsealed documents showing that Meta had used torrenting to gather its AI training data.

On Monday, the authors filed for a partial summary judgment in a California U.S. District Court, arguing that Meta's alleged use of pirated data leaves no room for legal ambiguity. The plaintiffs claim Meta's use of torrenting to acquire copyrighted books for artificial intelligence training amounts to clear-cut copyright infringement. "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful," the authors stated in their filing.

According to the unsealed documents, Meta initially attempted to download pirated books individually, but this process was too slow and placed excessive strain on its networks. The company then allegedly turned to torrenting - an infamous file-sharing method long associated with copyright infringement - to acquire terabytes of copyrighted books in bulk, far beyond the scope of the Books3 dataset.

The authors claim that Meta was fully aware of the legal risks involved and took deliberate action to obscure its activities. The company allegedly ran the torrent client through Amazon Web Services rather than Meta's own infrastructure - a departure from standard practice for the social media giant.

The heavily redacted motion, obtained by Ars Technica, points out that torrent users typically download (leech) and upload (seed) chunks of a file to allow faster downloads. Leeching and seeding are widely considered illegal if the files contain copyrighted material. Furthermore, by seeding a torrent, Meta may have actively facilitated piracy by distributing copyrighted books.

The plaintiffs feel that a trial is no longer necessary and seek immediate judgment. The authors contend that the company's actions clearly violate copyright law, falling far outside Meta's fair-use defense. A decision in Meta's favor could set a dangerous precedent going far beyond books, allowing AI developers to infringe on copyrights without compensating the IP owners. "[The court] should nevertheless grant summary judgment under the four fair use factors regarding Meta's decision to make available to other P2P pirates millions of copyrighted books in exchange for faster download speed," the motion argues.

While it seems like a relatively open-and-shut case, presiding judge Vince Chhabria admitted that he was unfamiliar with torrenting and related terminology like seeding and leeching. For this reason, Judge Chhabria may deny the motion for summary judgment, choosing to hear experts testify and explain the case so that he can make a fair and honest ruling. The final decision in the lawsuit will be ground-breaking no matter which way it goes.
If Meta prevails, it opens the door for other AI developers to pirate books, images, or videos to train their models. If the authors win, it sets a precedent for similar cases, including those currently in the judicial system. It could also lead to further copyright reform akin to the Digital Millennium Copyright Act.
[5]
Did Meta use torrented books to train AI?
Meta is being accused of violating copyright laws through its admitted torrenting of a pirated books dataset utilized for training its AI models, according to a summary judgment filing in a US district court in California reported by Ars Technica.

In their legal filing, the authors argued that "whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful." They alleged that Meta decided to torrent terabytes of pirated book data, claiming that downloading pirated books individually "posed an immense strain on Meta's networks and proceeded very slowly."

The authors accused Meta of being aware of the legal implications, stating that the company took a risk by hoping to evade detection while needing faster access to large datasets in the competitive AI landscape. The filing indicated that Meta attempted to conceal its torrenting activities by using Amazon Web Services.

The authors contended that in most torrenting scenarios, users who download also upload the files to enable faster sharing. They asserted, "In this case too, users who download via torrent also upload the same file they are downloading." They further argued that even if Meta limited seeding after downloads, its leeching during the downloading process implies the sharing of pirated books. According to the authors, "There is no genuine dispute that Meta made widely available and even reuploaded to other online pirates at least some quantity of the pirated data as part of the peer-to-peer (P2P) sharing process."

They criticized Meta's defense, stating that the company believed it should not be held to the same standards as others engaged in illegal activities. The authors referred to Meta's defense as the "Bob Dylan defense," citing lyrics from Dylan's "Sweetheart Like You" to illustrate their point: "Steal a little and they throw you in jail / Steal a lot and they make you king."

Meta has contested the claims related to leeching, arguing that the authors should not be permitted further discovery or to introduce new expert testimony regarding the alleged leeching. The company has claimed that the authors' motion for summary judgment based on this theory is inappropriate since Meta has not had a chance to defend against it. Judge Vince Chhabria is set to evaluate these claims in a hearing scheduled for May 1. Chhabria has expressed his intention to consider whether ruling on the summary judgment at this stage might be unfair to Meta.

The authors insist that torrenting pirated works is sufficiently illegal to warrant a clear case of copyright infringement, stating, "Meta's reproduction of Plaintiffs' Copyrighted Books without permission, including through peer-to-peer file sharing, is not fair use." They referenced a major ruling against Napster to support their claim that Meta infringed each author's copyrights.

The authors also highlighted that the data Meta allegedly pirated included works by at least ten Supreme Court justices, aiming to emphasize the broader implications of Meta's actions. They have requested additional discovery to compel Meta to provide answers regarding its torrenting and leeching practices, including depositions of Meta employees such as Mark Zuckerberg. In their argument, they noted that the risk associated with taking copyrighted works from pirated databases was significant enough to escalate matters to Meta's executives, including Zuckerberg.
"Their gamble should not pay off," the authors stated. Meta responded by deeming the authors' new discovery requests as "unnecessary, unwarranted, and infeasible." The company has offered to allow the deposition of six employees but rejected requests for additional time with Zuckerberg, suggesting alternative senior staff who could speak on executive decision-making. The authors pointed out perceived gaps in the understanding of Meta's torrenting by the court, alleging that Meta's expert had failed to replicate the torrenting in her analysis and did not account for how data uploading occurs during the leeching phase. They consider their leeching theory critical to their case and anticipate that the court may evaluate fair-use factors, although they recognize these are typically decided by juries, not at the summary judgment stage. Despite these complexities, the authors assert that the use of piracy cannot qualify as fair use, stating, "The use of piracy to further piracy can never be 'fair use.'" They contend that Meta's actions, including making millions of copyrighted books available for faster download speeds, constitute clear copyright infringement.
[6]
Meta vs. Kadrey Sets Precedent for AI Copyright Battles
Judge Vince Chhabria determined that Kadrey et al's Digital Millennium Copyright Act (DMCA) claim could move ahead.

A California judge has partially granted Meta's motion to dismiss a lawsuit brought by a group of authors, including novelist Richard Kadrey and comedian Sarah Silverman. Judge Vince Chhabria's decision in Meta vs. Kadrey adds to a growing body of case law pertaining to copyright infringement claims against AI developers. And with each case, the nature of intellectual property rights with respect to AI training becomes clearer.

Kadrey et al. Copyright Lawsuit Moves Ahead

The lawsuit, filed in the Northern District of California, alleges that Meta infringed upon the copyrights of various authors by using their books to train its AI model, Llama. While Meta sought a full dismissal, the court ruled that some claims could proceed. The court dismissed the plaintiffs' claims under the California Comprehensive Computer Data Access and Fraud Act (CDAFA), ruling that the allegations were preempted by federal copyright law. However, the court allowed the claims related to the Digital Millennium Copyright Act (DMCA) to move forward, recognizing that Meta's alleged removal of copyright management information (CMI) constituted a potential violation.

Digital Millennium Claims Still Alive

A crucial part of the ruling addressed the plaintiffs' allegations that Meta intentionally removed CMI from copyrighted works used in training its AI models. The court found that such removal could be considered an effort to conceal copyright infringement, a key element under the DMCA.

This ruling aligns with recent lawsuits against OpenAI and other AI developers, where plaintiffs have cited the DMCA in their legal arguments. For example, in The Intercept Media, Inc. v. OpenAI, a court ruled that the removal of CMI could form the basis of a DMCA claim, underscoring the importance of digital attribution in AI training disputes.

While Meta argued that DMCA protections do not align with traditional copyright concerns, the court noted that the law's purpose is to safeguard digital property rights, making it relevant in the context of AI training. The decision suggests that AI companies cannot rely solely on broad copyright defenses to evade accountability for metadata manipulation.

AI Copyright Claims and "Concrete Harm"

Another critical issue in the lawsuit is whether the plaintiffs have suffered "concrete harm," a requirement for legal standing in copyright cases. Courts have scrutinized such claims in AI-related lawsuits, often ruling that plaintiffs must demonstrate tangible damages.

In his latest decision, Judge Chhabria referenced Raw Story Media, Inc. v. OpenAI, a 2024 case in which a court deemed that an AI model's reproduction of text doesn't constitute harm significant enough to support a copyright claim. However, Chhabria determined that the removal of CMI itself may represent a legally recognizable injury, as Meta likely removed it to obscure the copyright protection of the works in question.

This aspect of the ruling could shape future AI copyright litigation as courts increasingly reject the notion that AI training with copyrighted materials is in and of itself a violation of intellectual property rights.
Meta is embroiled in a lawsuit accusing the company of using torrented copyrighted books to train its AI models, potentially setting a precedent for how courts view copyright law in AI development.
Meta, the parent company of Facebook and Instagram, is facing a significant legal challenge over its artificial intelligence (AI) training practices. A group of authors, including Richard Kadrey, Sarah Silverman, and Christopher Golden, have filed a lawsuit accusing Meta of copyright infringement by using their copyrighted works to train its Llama AI model without permission [2].
The plaintiffs allege that Meta resorted to torrenting terabytes of pirated book data to train its AI models after attempts to download pirated books individually proved too slow and strained Meta's networks [1]. This decision, they argue, was made with full awareness of the legal risks involved, as torrenting has long been associated with copyright infringement [4].
The authors claim that Meta's actions constitute clear copyright infringement, stating, "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful" [1]. They further allege that Meta attempted to conceal its torrenting activities by using Amazon Web Services rather than its own infrastructure [4].
The plaintiffs mockingly refer to Meta's stance as the "Bob Dylan defense," citing lyrics from Dylan's "Sweetheart Like You": "Steal a little and they throw you in jail / Steal a lot and they make you king" [1][5]. They argue that Meta's use of peer-to-peer (P2P) file sharing to obtain copyrighted material cannot be considered fair use [1].
Meta has admitted to using the Books3 dataset, which contains 195,000 copyrighted books, to train its Llama 1 large language model [3]. However, the company maintains that its actions fall under fair use doctrine [4].
The outcome of this case could have far-reaching implications for AI development and copyright law. If the court rules in favor of Meta, it could set a precedent allowing AI developers to use copyrighted material for training without compensation to intellectual property owners [4]. Conversely, a ruling in favor of the authors could strengthen similar cases and potentially lead to copyright reform [4].
Judge Vince Chhabria, who is presiding over the case, has allowed it to proceed, stating that "Copyright infringement is obviously a concrete injury sufficient for standing" [2]. However, he has also expressed some unfamiliarity with torrenting terminology, which may influence how the case proceeds [4].
The authors have filed for a partial summary judgment, arguing that Meta's use of torrenting leaves no room for legal ambiguity [4]. Judge Chhabria is scheduled to evaluate these claims in a hearing on May 1, considering whether ruling on the summary judgment at this stage might be unfair to Meta [1].
This case is part of a larger trend of legal challenges facing AI companies over copyright issues. The New York Times has sued OpenAI and Microsoft, News Corp. has sued Perplexity, and several Canadian news organizations have sued OpenAI [2]. A recent ruling in favor of Thomson Reuters against Ross Intelligence has already suggested that indiscriminate ingestion of copyrighted material for AI training may have financial consequences [3].
As the legal battle unfolds, the tech industry and copyright holders alike are closely watching this case, which could significantly shape the future landscape of AI development and intellectual property rights in the digital age.