22 Sources
[1]
Elsevier vs. Meta: first science publisher sues over scraped research papers
A scientific publisher has joined the dozens of firms and individuals suing artificial intelligence companies over their alleged use of copyrighted works in training AI models. Elsevier - which publishes thousands of journals, including Cell and The Lancet - was part of a class-action lawsuit filed on 5 May against technology company Meta and its chief executive Mark Zuckerberg in the Southern District of New York. Also named as plaintiffs on the lawsuit are book-publishing giants Hachette and Macmillan, and the US fiction author and lawyer Scott Turow. The publishers allege that Meta obtained and reproduced copyrighted works in developing its large language model (LLM) Llama. "This case is the first AI action brought by major publishing houses, who have their own story to tell about Meta's flagrant violation of their rights," said the Association of American Publishers, in a statement. The case mirrors those of authors and media companies - including The New York Times - suing AI firms on similar grounds. Some cases have been settled but, overall, they have yet to establish a clear precedent on whether it is legal to use copyrighted works to train an LLM. A Meta spokesperson has said the company would "fight this lawsuit aggressively". Although AI firms are cagey about their training data, it is widely assumed that paywalled research papers, as well as open-access ones, formed part of the billions of web pages that models were trained on. To train Llama, the lawsuit alleges that Meta used the Common Crawl data set, a sample of billions of web pages made by trawling the Internet, which the plaintiffs say is likely to have included unauthorized copies of copyrighted works, such as scientific abstracts and paywalled papers. 
The publishers also allege that Meta downloaded and torrented (sourced using a file-sharing method) works from sites including LibGen, a database of books, research papers and textbooks; and Sci-Hub, a repository that gives free access to millions of research articles and books regardless of copyright. Both sites have been the subject of legal challenges. Much of the evidence relies on e-mails between Meta employees that were revealed during a separate case in which several book authors sued Meta last year (Kadrey v. Meta). Meta has suggested that it will argue that training on copyrighted documents constitutes 'fair use', a copyright exemption in US law. "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," its spokesperson said. So far, US courts have mainly backed claims from AI firms that the way LLMs use copyrighted material is 'transformative', which is one of the tests for what counts as fair use. But judges in two landmark rulings in 2025 said that the act of acquiring and storing pirated content can constitute an infringement, and that arguments against fair use could be made if copyright holders can show that the commercial markets for their products are substantially affected by models' outputs. Academic texts are valuable to AI firms as a training resource for LLMs because they are considered high quality, human-written and full of rich information, says Stefan Baack, an independent researcher based in Berlin and a proponent of open data. They can also boost an LLM's accuracy on scientific topics. Repositories of open-access papers and abstracts, such as PubMed, are often used to build bespoke training data sets to improve AI models' knowledge in specialist scientific domains. Some academics might be happy to have their papers used to train LLMs if it makes models more accurate. 
In Baack's experience, researchers tend to care more about how generative AI products are being used than how they were trained - for example, by imitating authors' writing or referencing their work without the right attribution, he says. But increasingly, publishers are agreeing deals with tech companies to sell or license their data. This could strengthen big AI firms but make it harder for researchers trying to build open-source models to access the content, he says. Ideally there should be ways to respect creators' wishes about how their work gets used by AI systems "without relying on exclusive deals between publishers and AI companies only", he says.
[2]
Even More Authors, Publishers Sue Meta Over Copyright in AI Training: What's Different Now
Academic and entertainment publishers say Meta "engaged in one of the most massive infringements of copyrighted materials in history" in a new lawsuit filed on Tuesday in a US District Court in New York. The claims are familiar: Publishers, including McGraw-Hill, Elsevier, Cengage, Hachette and Macmillan, allege that Meta illegally acquired, or pirated, copies of their copyright-protected materials -- scientific journal articles, textbooks and other books -- to train its Llama AI models. Author, lawyer and former Authors Guild President Scott Turow is also joining the publishers in the lawsuit. Meta CEO Mark Zuckerberg is specifically named as a defendant, with the complaint saying the CEO "personally authorized and actively encouraged" the alleged illegal behavior. As a result, Meta's AI "readily generates, at speed and scale, substitutes for [authors] works on which it was trained." "Meta chose to live by its motto of 'move fast, and break things,' and now must be held accountable for what it broke, including the copyright laws," the Association of American Publishers said in a statement. An attorney for the plaintiffs did not immediately respond to a request for comment. A Meta spokesperson told CNET: "Courts have rightly found that training AI on copyrighted material can qualify as fair use. We will fight this lawsuit aggressively." Copyright is one of the most contentious legal issues around AI. Tech companies like Meta need high-quality, human-created data to build and refine their AI models. Nearly all of this material is protected by copyright. That means tech companies have to enter into licensing agreements or defend their use of the content as fair use under a provision of copyright law. Meta and Anthropic have both won previous cases in lawsuits brought by authors, successfully defending their fair use. Anthropic agreed to settle some piracy claims with authors for $1.5 billion, or about $3,000 per pirated work. 
Both judges warned in their decisions that this won't be the result in every lawsuit. US District Court Judge Vince Chhabria wrote in his 2025 ruling for Meta, "The market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works." One of the biggest considerations in these cases is whether tech companies' use of copyrighted books will make it harder for human authors to sell their work or otherwise affect the marketplace. The plaintiffs argue Meta's AI models can pop out entirely AI-generated scientific articles and novels, pointing to a number of authors selling AI-written works on Amazon. This is especially concerning for authors who say people are using AI to create content in their specific style. "I find it distressing and infuriating that one of the top-10 richest corporations in the world knowingly used pirated copies of my books, and thousands of other authors, to train Llama, which can and has produced competing material, including works supposedly in my style," Turow told The New York Times. Precedent -- the history of prior court rulings -- always plays a role in how current lawsuits unfold. But it's too soon to tell whether this case will play out differently from previous cases in which judges sided with tech companies.
[3]
Major publishers sue Meta for copyright infringement over AI training
May 5 (Reuters) - Publishers Elsevier, Cengage (TLACQ.UL), Hachette (ALHG.PA), Macmillan and McGraw Hill (MH.N) sued Meta Platforms (META.O) in Manhattan federal court on Tuesday, alleging that the tech giant misused their books and journal articles to train its artificial intelligence model Llama. The publishers, as well as author Scott Turow, alleged in the proposed class action complaint that Meta pirated millions of their works and used them without permission to train its large language models to respond to human prompts. Spokespeople for Meta did not immediately respond to a Reuters request for comment on the complaint on Tuesday. "Meta's mass-scale infringement isn't public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination," Maria Pallante, president of the Association of American Publishers, said in a statement. The publishers allege that Meta pirated works ranging from textbooks to scientific articles to novels including "The Fifth Season" by N.K. Jemisin and "The Wild Robot" by Peter Brown for its AI training. They asked the court for permission to represent a larger class of copyright owners and an unspecified amount of monetary damages. The lawsuit opens a new front in the ongoing copyright battle between creators and tech companies over AI training, in which dozens of authors, news outlets, visual artists and other plaintiffs have sued companies including Meta, OpenAI and Anthropic for infringement. All of the pending cases will likely revolve around whether AI systems make fair use of copyrighted material by using it to create new, transformative content. The first two judges to consider the matter issued diverging rulings last year. 
Amazon- and Google-backed Anthropic was the first major AI company to settle one of the cases, agreeing last year to pay a group of authors $1.5 billion to resolve a class-action lawsuit that could have cost the company billions more in damages for alleged piracy. Reporting by Blake Brittain in Washington; Editing by Emelia Sithole-Matarise
[4]
Meta and Zuckerberg sued by publishers over 'massive' copyright infringement
Meta and its chief executive Mark Zuckerberg face a lawsuit from a coalition of major publishers, alleging the social media platform illegally used copyrighted works to train its Llama AI models. Five publishers -- Hachette, Macmillan, McGraw Hill, Elsevier and Cengage -- along with bestselling author Scott Turow, are suing the Big Tech company and its founder over "one of the most massive infringements of copyrighted materials in history". According to the filing to the Manhattan federal court on Tuesday, Meta accessed millions of copyrighted books and journal articles from websites hosting pirated material, and also downloaded unauthorised scrapes of "virtually the entire internet" in order to train its generative AI models. The $1.5tn company then reproduced and distributed the material without permission, the filing said. The plaintiffs also claim Zuckerberg "himself personally authorised and actively encouraged the infringement", and that the company deliberately stripped the works of attribution data in order to conceal its training sources. The case is the latest in a string of fierce copyright battles filed by artists, authors and newspapers alleging that AI groups such as Microsoft and OpenAI have used copyrighted content without compensation or permission to train their chatbots. Last year, AI start-up Anthropic agreed to pay $1.5bn to settle a copyright lawsuit over its use of pirated texts to train its models. However, in June, Meta won a similar copyright lawsuit brought by authors including Ta-Nehisi Coates and Richard Kadrey. Here, the judge ruled that the plaintiffs had not provided enough evidence that the company's AI would harm the market for human-created content by flooding it with AI-generated works, calling this a "potentially winning argument". Meta's usage of the copyrighted material was therefore found to be "fair use" for developing a transformative technology. 
In a statement on Tuesday, Meta said it would fight the latest lawsuit "aggressively", adding: "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use." According to the lawsuit, Meta initially sought to negotiate licensing deals with publishers but abandoned them on "Zuckerberg's personal instruction". The publishers argue that authors have been harmed because Llama has been used to produce imitation versions of their works, calling the technology "an infinite substitution machine". They added that AI-generated books were "already flooding the world's largest book marketplace, Amazon, in volumes that materially displace human-authored works". The plaintiffs are seeking unspecified damages and aim to represent a broader group of copyright owners.
[5]
Book publishers accuse Meta and Mark Zuckerberg of copyright infringement - Engadget
Meta's AI endeavors have drawn another legal challenge. The social media company and its CEO Mark Zuckerberg are facing a class action lawsuit from five book publishers and one author on claims that it illegally used copyrighted works to train its Llama generative AI platform. The plaintiffs in the case are Hachette, Macmillan, McGraw Hill, Elsevier and Cengage; they're joined by best-selling author Scott Turow. "Defendants reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law," the complaint reads. "Zuckerberg himself personally authorized and actively encouraged the infringement." Meta has been sued multiple times regarding the materials it used to train Llama. A different group of authors attempted a copyright infringement lawsuit in 2023, but were ultimately unsuccessful in the effort. Zuckerberg's involvement in reportedly encouraging use of copyrighted works was called out in a case brought by LibGen. And while it doesn't appear to have reached court yet, a group of authors in the UK also raised the alarm last year about Meta potentially violating copyright laws. In a similar lawsuit against Anthropic, a judge seemed unswayed by the copyright infringement argument, but did present piracy as an alternative way for authors to win damages from the AI company. Meta representative Dave Arnold echoed the lack of court support for copyright infringement in a statement to The New York Times about today's class action: "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use."
[6]
Mark Zuckerberg 'personally authorized' Meta's copyright infringement, publishers allege
NEW YORK (AP) -- Five publishing houses and author Scott Turow sued Meta and CEO Mark Zuckerberg on Tuesday, alleging the company illegally used millions of copyrighted works to train its AI language system Llama. The class action lawsuit, filed in federal court in Manhattan, accuses the tech giant of copyright infringement and opens up a new front in the ongoing battle between the book community and developers of AI. The plaintiffs allege that Zuckerberg and Meta "followed their well-known motto 'move fast and break things'" by illegally drawing upon a massive trove of books and journal articles for Llama. "Defendants reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law," the complaint reads in part. "Zuckerberg himself personally authorized and actively encouraged the infringement." Authors published by the five companies suing -- Elsevier, Cengage, Hachette Book Group, Macmillan and McGraw Hill -- include Turow, James Patterson, Donna Tartt, former President Joe Biden and at least two of the Pulitzer Prize winners announced Monday, Yiyun Li and Amanda Vaill. In a statement Monday, Meta vowed to "fight this lawsuit aggressively." "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," the statement reads in part. Over the past few years, numerous authors have pursued legal action involving AI. In 2025, Anthropic agreed to pay $1.5 billion to settle a class action suit initiated by thriller novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson. A final approval hearing is scheduled for next week.
[7]
Five Publishers and Scott Turow Sue Meta and Mark Zuckerberg
The class-action lawsuit accuses the tech giant and its founder and chief executive of infringing on authors' copyrights. Five major publishers -- Hachette, Macmillan, McGraw Hill, Elsevier and Cengage -- and the best-selling novelist Scott Turow have filed a class-action copyright infringement lawsuit against Meta and its founder and chief executive, Mark Zuckerberg. The complaint, which was filed on Tuesday morning in United States District Court for the Southern District of New York, accuses Meta and Zuckerberg of illegally using millions of copyrighted works to train their artificial intelligence program Llama, and of removing copyright notices and other copyright management information from those works. The lawsuit asserts that Meta's engineers relied on pirated books and journal articles to train the program by downloading unlicensed copies through websites like Anna's Archive, an open source search engine for piracy sites including LibGen and Sci-Hub. The suit also claims that "Zuckerberg himself personally authorized and actively encouraged the infringement." Representatives for Meta did not immediately respond to a request for comment. The plaintiffs argue that Meta's A.I. program poses a threat to the livelihoods of writers and publishers because the technology can be used to quickly produce A.I.-generated copycat books and to summarize the plot and themes of copyrighted books in such great detail that readers don't have to buy them. "These A.I.-generated books are already flooding the world's largest book marketplace, Amazon, in volumes that materially displace human-authored works," the complaint states. The filing cites several authors whose works the plaintiffs claim were used to train Llama, including V.E. Schwab, N.K. Jemisin, Lemony Snicket and Turow. Some of the evidence cited in the complaint purportedly comes directly from Llama. 
When asked to produce a travel guide in the style of the writer Becky Lomax, Llama rapidly produced "a convincing rendition of Lomax's local insider voice," the complaint says. Then, when asked how it was able to reproduce Lomax's style so accurately, Llama allegedly replied, "While I don't have personal interactions with Becky Lomax, I've been trained on a vast amount of text data, including her published works." Llama is also able to summarize books in detail. When asked to give a synopsis of Turow's "Presumed Innocent," Llama confirmed that it had "been trained on a digital version of the book, which allows me to access and analyze its content," according to the complaint. In an email to The Times, Turow said Meta's use of pirated works amounted to "shameless, damaging and unjust behavior." "I find it distressing and infuriating that one of the top-10 richest corporations in the world knowingly used pirated copies of my books, and thousands of other authors, to train Llama, which can and has produced competing material, including works supposedly in my style," Turow wrote. By producing "knockoffs and imitations" of authors' works, Meta's A.I. program could "dilute the overall market for literary works," the plaintiffs argue. "These outputs are similar enough to copyrighted works -- in subject matter, plot details, sequencing of events, character names and traits, or other creative choices -- that they replace the original work for many readers or consumers," the complaint says. The lawsuit is the latest effort by authors and publishers to rein in tech companies' use of copyrighted works to train their large language models. Writers have brought lawsuits against tech companies including OpenAI, Anthropic, Google and xAI for the companies' unauthorized use of their work. Last fall, Anthropic agreed to pay a $1.5 billion settlement to writers whose books had been used to train its A.I. program. 
(The New York Times has sued OpenAI and Microsoft, as well as Perplexity, accusing the companies of copyright infringement of news content related to A.I. systems. The companies have rebutted the claims.) Authors have challenged Meta in court before. In June 2025, a judge ruled in Meta's favor, finding that the plaintiffs had not presented enough evidence that Meta's A.I. product would create "market dilution" by producing a flood of A.I.-generated books. The lawsuit filed on Tuesday against Meta brought together trade publishers, academic publishers of scientific and medical journals and a best-selling author of legal thrillers. The plaintiffs are seeking an order requiring Meta to destroy all illegally acquired copies of works copyrighted by the plaintiffs that Meta used in A.I. training and to "cease all unlawful activities," as well as requesting any "further relief as the Court deems proper." "We're focused on a much more sustainable A.I. landscape -- something that's transparent and fair and participatory and has guardrails against harm for authors and publishers," said Maria A. Pallante, president and chief executive of the Association of American Publishers, a trade group that acts as a law and policy advocate for the book publishing industry. "The harm is already evident."
[8]
Five major publishers are suing Meta over Llama
Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill, joined by author Scott Turow, filed a proposed class action in Manhattan on Tuesday alleging Meta pirated millions of their works to train Llama. After Judge Chhabria's June 2025 ruling, plaintiffs with stronger market-harm evidence have been waiting their turn. On Tuesday morning, five of the world's largest publishers and one of America's best-known novelists walked into a Manhattan federal courthouse and filed a proposed class action complaint against Meta Platforms. Reuters reported the case as Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill, alongside the author Scott Turow, alleging that Meta pirated millions of their books and journal articles to train its Llama large-language models without permission, payment, or licence. The complaint asks the court to certify the case as a class action representing all similarly situated rights holders. It is, by date, the latest in a long line of AI-training copyright cases. By substance, it is meaningfully different from most of those that have come before. Anyone who has been following AI-copyright litigation will recognise the name in the precedent slot: Kadrey v. Meta. That earlier case, filed in 2023 in the Northern District of California by authors including Sarah Silverman, Richard Kadrey, Christopher Golden, Ta-Nehisi Coates, Junot Díaz, and Michael Chabon, made effectively the same allegations: that Meta downloaded copyrighted books from pirate libraries (LibGen, Z-Library, and Anna's Archive) and used them to train Llama. Court records cited by Tom's Hardware established that Meta employees torrented roughly 82 terabytes of pirated material in the process. Mark Zuckerberg personally signed off on the use of LibGen for Llama training, despite internal AI executives flagging it as a "data set we know to be pirated" that could "undermine [Meta's] negotiating position with regulators." And Meta won that case. 
In June 2025, Judge Vince Chhabria granted summary judgment for Meta on fair-use grounds, finding that the use of copyrighted books to train Llama was sufficiently transformative to clear the fair-use threshold. But Chhabria's ruling was unusually narrow, and unusually candid about its limits. He said publicly that Meta's win "may be in significant tension with reality" and that the ruling applied only to the specific authors who had brought the case. He noted explicitly that future plaintiffs could succeed if they presented stronger evidence of market harm, the prong of fair-use analysis on which the Kadrey plaintiffs had, in his view, fallen short. Tuesday's filing reads, on first inspection, as exactly the kind of case Chhabria invited. There are three structural differences between Kadrey and the new lawsuit, and all three favour the plaintiffs. The first is the catalogue. Where Kadrey involved roughly 666 specific books from a small group of individual authors, the new complaint covers the entire publishing operations of five companies that together account for a substantial share of the world's academic, educational, and trade publishing output. Per Reuters' description of the complaint, titles include not only literary works such as N.K. Jemisin's "The Fifth Season" and Peter Brown's "The Wild Robot" but also textbooks, scientific journal articles, and reference works. The market for those works, particularly the academic and educational categories, is structurally different from the trade-fiction market that dominated the Kadrey plaintiff set. The second is the market-harm evidence. Academic and educational publishers can document, in ways individual authors typically cannot, the specific revenue lines that AI-trained models substitute for. When Llama answers a student's biology question that would otherwise have required consulting a Cengage textbook, the substitution is direct and measurable. 
The plaintiffs will, on the standard pleadings strategy for a case of this kind, present that substitution as the kind of identifiable market harm Chhabria's June ruling specifically identified as missing from Kadrey. Reed Smith's analysis of the recent fair-use decisions noted that the market-harm prong, more than transformativeness, is now the operative legal battleground. The third is the licensing-market context. Since 2023, AI companies have signed an increasing number of licensing deals with publishers. Meta itself has signed deals with Reuters, CNN, Fox News, People Inc., and USA Today for content licensing. The existence of those licences is, in fair-use law, a significant fact: courts examining the market-harm prong now have evidence that a licensing market exists, that some publishers have priced and negotiated participation in it, and that Meta has chosen to participate in some markets while bypassing others. The new plaintiffs will argue that bypassing them while licensing others is itself evidence of bad faith. Tuesday's case lands against another piece of recent precedent. Anthropic, in a settlement the Authors Guild publicly described as significant, agreed earlier this year to pay authors as part of resolving the Bartz v. Anthropic class action over similar allegations. The settlement amount and terms set a marker for what AI-training copyright cases can produce when they reach a financial resolution rather than an early summary judgment. TNW has tracked Anthropic's broader commercial trajectory through the parallel $1.5bn enterprise services joint venture, the IPO preparations, and the model-deployment programmes; the Bartz settlement is, in financial terms, a manageable line item against that backdrop. For Meta, with its different fact pattern and its prior summary-judgment win, the calculus is different. Settlement is, however, only one possible outcome. 
The other is that Meta tries the case the way it tried Kadrey, betting that the fair-use defence will hold even against more substantive plaintiffs. The risk in that strategy is asymmetric. A second loss for rights holders would, in effect, settle the question for the entire market: Llama-style training on pirated corpora is fair use even when the plaintiffs are a major industry. A win for the publishers would cost the company more in damages and structural remedies than the first case avoided. The new case sits inside a broader legal landscape Meta has been navigating for some time. TNW reported last week on the Meta-New Mexico phase-two trial in Santa Fe, in which the state is seeking algorithm changes, age-verification mandates, and a $3.7bn teen mental-health fund tied to the company's youth-safety record. TNW's analysis earlier this year noted that Meta's mounting child-safety legal exposure could, eventually, cost more than its $145bn AI capex programme. Meta's Q1 2026 capex guidance is now between $125bn and $145bn for the year, an order of magnitude that makes any single litigation outcome look small in absolute terms but that also raises the question of how many simultaneous fronts the company can accept legal exposure on without commercial consequences. There is also the broader regulatory backdrop. TNW has covered Anthropic's Mythos and the Eurogroup's parallel concerns about AI capability and access; that is a different set of regulatory concerns from copyright, but it is part of the same wider story about how AI companies' commercial speed is colliding with multiple categories of slower-moving legal infrastructure. The publishers' Tuesday filing is the copyright instance of that collision. The narrow legal question is whether the use of pirated copyrighted material to train Llama constitutes fair use under US copyright law. 
The wider question, the one the publishing industry is actually trying to settle, is whether the existing fair-use doctrine, written before generative models existed, can be stretched to accommodate them or whether some new framework, statutory or judicial, has to be built. The Kadrey ruling stretched the doctrine. The new case will test how far the stretch will go. If the publishers win, even partially, the licensing market for AI training data becomes a structural fixture of the industry, with material commercial implications for every model company currently relying on broadly scraped corpora. If they lose, the practice of training on pirated material at scale becomes effectively legally durable in the United States, with the regulatory response shifting to legislatures rather than courts. The procedural calendar will move slowly. Class certification, motions to dismiss, summary-judgment briefing, and trial scheduling will, in the ordinary course, take 18 to 24 months. Investing.com flagged the broader market-screener context around the lawsuit's announcement, noting that several other AI-training copyright cases are now moving through US courts simultaneously, with some likely to reach the appellate level before this one is resolved. The Tuesday filing is, in that sense, a long bet rather than an immediate threat. It is, however, the most credible long bet the publishing industry has yet placed against an AI-training defendant. After Kadrey produced what the Authors Guild called a "technical win" for Meta but a substantive opening for future plaintiffs, the industry has been waiting for the right plaintiff slate to bring the next case. Tuesday's filing names that slate. The litigation that follows will, over the next two years, decide whether Llama's training corpus, and by extension that of every comparable model trained on similarly broad scraped data, was the original commercial sin of the AI cycle or its first widely accepted standard practice. 
There is no third outcome the courts can produce. Meta will argue, as it argued in Kadrey, that the use is transformative and that no measurable market harm has occurred. The publishers will argue, with documents, accounting, and licensing comparables, that the harm is precisely measurable and that Meta's selective licensing across the industry establishes a market against which its non-licensing of their works can be valued. Judge Chhabria's earlier ruling has, in effect, written the brief for both sides. Whoever sat reading his June opinion most carefully has, on the present evidence, sat down at the publishers' table on Tuesday.
[9]
Scott Turow's latest real-life legal thriller: Suing Meta for copyright infringement
Publishing houses Hachette, Macmillan, McGraw Hill, Elsevier and Cengage joined forces with bestselling author Scott Turow (and his own company S.C.R.I.B.E) to file a class-action lawsuit on Tuesday against Meta and its CEO, Mark Zuckerberg. The plaintiffs accuse the tech company of building generative AI models on the backs of millions of stolen copyrighted books and journal articles. In their complaint filed in the United States District Court for the Southern District of New York, the plaintiffs argue Meta knowingly copied copyrighted materials from notorious pirate websites such as LibGen and Anna's Archive to train various iterations of its Llama language model -- with Zuckerberg's personal authorization to do so. The complaint alleges that Meta willfully bypassed legal licensing markets to gain an advantage in the "AI arms race." "All Americans should understand that the bold future promised by A.I., has been, to paraphrase the investigative writer Alex Reisner, created with stolen words," said Turow in a statement to NPR. "It is all the more shameful that these violations of the law were undertaken by one of the richest corporations in the world." According to the complaint, Meta "briefly considered licensing deals with major publishers" but changed its strategy in April 2023. The question of whether to license or pirate moving forward was "escalated" to Zuckerberg, after which, the complaint alleges, Meta's business development team received verbal instructions to stop licensing efforts. "If we license once [sic] single book, we won't be able to lean into the fair use strategy," a Meta employee is quoted as saying in the complaint. "It's the most flagrant copyright breach in history," said Authors Guild CEO Mary Rasenberger in a statement to NPR. "And these voracious tech companies need to be held accountable." The lawsuit cites numerous specific works allegedly stolen by Meta to feed Llama. 
Turow alleges Meta infringed several of his well-known books, including the 1987 legal thriller Presumed Innocent. Other cited works include Douglas Preston's Impact, Peter Brown's The Wild Robot, The Fifth Season by N.K. Jemisin, and Lemony Snicket's Who Could That Be at This Hour? The list also includes research and academic titles. The class represented by Turow could potentially include many authors, according to the complaint -- "all legal or beneficial owners of registered copyrights, in whole or in part, for any book possessing an International Standard Book Number (ISBN) or journal article possessing a Digital Object Identifier (DOI) or International Standard Serial Number (ISSN)." Some books do not have ISBN numbers, but most do. The plaintiffs are seeking statutory damages, a permanent injunction against Meta to stop further use of their works, and an order requiring the tech giant to destroy all infringing copies of copyrighted materials. Meta is hitting back against the literary world's allegations. "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," said Nkechi Nneji, a public affairs director for Meta, in a statement to NPR. "We will fight this lawsuit aggressively." Authors and publishers have brought dozens of lawsuits against AI companies in recent years, many of which are still pending. Anthropic ended up paying a $1.5 billion settlement to authors in September 2025 to resolve a lawsuit brought by a group of literary plaintiffs. This came after U.S. District Court judge William Alsup had supported Anthropic's argument that the company's use of copyrighted books to train their AI model was acceptable. "The use of the books at issue to train Claude and its precursors was exceedingly transformative," Alsup said last June. 
(The settlement occurred as a result of the judge later ruling that Anthropic's use of millions of pirated books to build its models, copied without obtaining the authors' consent or giving them compensation, was not OK.) However, other cases could be used in support of tech companies' "fair use" defense. For example, last June, a federal judge dismissed a copyright infringement lawsuit from a different group of authors who accused Meta of stealing their works to train its models. "The Court has no choice but to grant summary judgment to Meta on the plaintiffs' claim that the company violated copyright law by training its models with their books," said U.S. District Court judge Vince Chhabria, finding that the plaintiffs did not present enough evidence to make the case that Meta's use of their copyrighted works was harmful.
[10]
Major Lawsuit Claims Mark Zuckerberg 'Personally Authorized' Use of Copyrighted Works for AI
Five publishing houses and a best-selling novelist have filed a lawsuit alleging that Meta illegally used millions of copyrighted works to train its AI language system Llama, and Mark Zuckerberg "personally authorized" the company's copyright infringement. Five major publishers -- Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage -- and author Scott Turow are suing Meta and its CEO over "one of the most massive infringements of copyrighted materials in history." The class-action copyright infringement lawsuit, filed Tuesday morning in the United States District Court for the Southern District of New York, alleges that Meta and Zuckerberg used millions of copyrighted books and journal articles without permission to train the company's AI program, Llama. According to a report by The Financial Times, the lawsuit also claims that copyright notices and other copyright management information were removed from those works. The plaintiffs allege that Zuckerberg and Meta "followed their well-known motto 'move fast and break things'" by drawing on a large body of written material without authorization. "Defendants reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law," the complaint reads in part. "Zuckerberg himself personally authorized and actively encouraged the infringement." According to the filing, Meta accessed millions of copyrighted books and journal articles from websites hosting pirated material. It also alleges the company downloaded unauthorized scrapes of "virtually the entire internet" to train its generative AI models. The suit further claims that "Zuckerberg himself personally authorized and actively encouraged the infringement." Meta vowed to "fight this lawsuit aggressively" in a statement on Monday. 
"AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," the statement reads. The case follows a recent class-action lawsuit involving AI company Anthropic and a group of U.S. authors who alleged their work was used without permission to train AI systems. In September, a judge approved a $1.5 billion settlement in that case, in what was described as the largest publicly reported copyright recovery in history. When news of the settlement became public, it was seen by some as a potential turning point that could lead other AI companies to compensate rights holders for the use of their material. While settlements do not set legal precedent, they can influence how similar cases develop in the evolving area of AI and copyright law. Legal experts suggest the settlement may set the stage for future payments, whether through court rulings, negotiated settlements, or licensing agreements. There are several ongoing cases against AI companies. For example, a group of artists filed a class-action lawsuit against AI image generators Stable Diffusion and Midjourney, among others.
[11]
Major publishers sue Meta for copyright infringement over AI training
Hachette, Macmillan and others allege that Meta pirated millions of works from textbooks to novels for Llama model
Five major publishers sued Meta Platforms in Manhattan federal court on Tuesday, alleging that the tech giant misused their books and journal articles to train its artificial intelligence models. Elsevier, Cengage, Hachette, Macmillan and McGraw Hill, as well as author Scott Turow, alleged in the proposed class-action complaint that Meta pirated millions of their works and used them without permission to train its Llama large language models to respond to human prompts. "Meta's mass-scale infringement isn't public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination," Maria Pallante, the president of the Association of American Publishers, said in a statement. Meta has denied any wrongdoing. "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," a Meta spokesperson responded in a statement on Tuesday. "We will fight this lawsuit aggressively." The publishers allege that Meta pirated works ranging from textbooks to scientific articles to novels including The Fifth Season by NK Jemisin and The Wild Robot by Peter Brown for its AI training. They asked the court for permission to represent a larger class of copyright owners and an unspecified amount of monetary damages. The lawsuit opens a new front in the ongoing copyright battle between creators and tech companies over AI training, in which dozens of authors, news outlets, visual artists and other plaintiffs have sued companies including Meta, OpenAI and Anthropic for infringement. All of the pending cases will likely revolve around whether AI systems make fair use of copyrighted material by using it to create new, transformative content. 
The first two judges to consider the matter issued diverging rulings last year. Amazon- and Google-backed Anthropic was the first major AI company to settle one of the cases, agreeing last year to pay a group of authors $1.5bn to resolve a class-action lawsuit that could have cost the company billions more in damages for alleged piracy. The New York Times has sued OpenAI and Microsoft for copyright infringement as well.
[12]
James Patterson, Biden publishers say Mark Zuckerberg 'personally authorized' copyright infringement in new lawsuit against Meta | Fortune
Five publishing houses and author Scott Turow sued Meta and CEO Mark Zuckerberg on Tuesday, alleging the company illegally used millions of copyrighted works to train its AI language system Llama. The class action lawsuit, filed in federal court in Manhattan, accuses the tech giant of copyright infringement and opens up a new front in the ongoing battle between the book community and developers of AI. The plaintiffs allege that Zuckerberg and Meta "followed their well-known motto 'move fast and break things'" by illegally drawing upon a massive trove of books and journal articles for Llama. "Defendants reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law," the complaint reads in part. "Zuckerberg himself personally authorized and actively encouraged the infringement." Authors published by the five companies suing -- Elsevier, Cengage, Hachette Book Group, Macmillan and McGraw Hill -- include Turow, James Patterson, Donna Tartt, former President Joe Biden and at least two of the Pulitzer Prize winners announced Monday, Yiyun Li and Amanda Vaill. In a statement Monday, Meta vowed to "fight this lawsuit aggressively." "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," the statement reads in part. Over the past few years, numerous authors have pursued legal action involving AI. In 2025, Anthropic agreed to pay $1.5 billion to settle a class action suit initiated by thriller novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson. A final approval hearing is scheduled for next week.
[13]
Publishers, author Scott Turow accuse Meta and Mark Zuckerberg of training AI on copyrighted works
Mary Cunningham is a reporter for CBS MoneyWatch. She previously worked at "60 Minutes," CBSNews.com and CBS News 24/7 as part of the CBS News Associate Program. A group of publishers and bestselling novelist Scott Turow are suing Meta and its founder, Mark Zuckerberg, alleging that the tech giant used copyrighted material to train Meta's artificial intelligence technology. The class-action lawsuit, filed in a federal court in New York, was brought by Turow and publishers Cengage, Elsevier, Hachette, Macmillan and McGraw-Hill. The plaintiffs allege that Meta scraped millions of copyrighted works from across the internet -- including from "notorious pirate sites" -- and used the content to train Llama, Meta's suite of AI models, without permission. Meta also removed copyright management information from the works to hide the fact that it was training its AI on stolen materials, the lawsuit alleges. Like other chatbots, Llama generates text outputs in response to user prompts. The complaint claims that the AI tool is reproducing versions of original works from novels, journal articles and textbooks, and in some cases recreating verbatim copies. Llama also mirrors certain authors' personal style in its responses, according to the lawsuit. The plaintiffs say Meta's actions are robbing authors and publishers of revenue they would otherwise receive. The suit assigns blame to Zuckerberg, claiming he "personally authorized and actively encouraged the infringement" by sidestepping normal licensing procedures. "As a result of Zuckerberg's day-to-day involvement in Meta's AI development, including his authorization for Meta AI to torrent pirate collections to train Llama, Zuckerberg's net worth recently climbed to over $200 billion," the lawsuit says. A Meta spokesperson told CBS News in an email that the company plans to "fight this lawsuit aggressively." 
"AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," the spokesperson said in an email. The literary world has previously clashed with AI companies over copyright issues. In a case last year, Anthropic, maker of the AI chatbot Claude, agreed to settle with hundreds of thousands of authors for $1.5 billion, the largest payout for copyright infringement in history, according to The New York Times. The plaintiffs in Tuesday's suit said they are seeking damages.
[14]
Publishers accuse Meta of massive copyright infringement
Meta and CEO Mark Zuckerberg are facing a class action lawsuit filed by five book publishers and best-selling author Scott Turow, alleging copyright infringement in the training of its Llama generative AI platform. The plaintiffs include Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage. The lawsuit claims that the defendants reproduced and distributed millions of copyrighted works without permission and without compensating the authors or publishers, thereby violating copyright law. The complaint states, "Zuckerberg himself personally authorized and actively encouraged the infringement." Meta has faced multiple lawsuits related to its use of copyrighted materials for Llama. A previous copyright infringement lawsuit by a different group of authors in 2023 was unsuccessful. The allegations of Zuckerberg encouraging the use of copyrighted works were highlighted in a case brought by LibGen. Additionally, a group of authors in the UK raised concerns last year about potential copyright violations by Meta, although that case has not yet been litigated. In a separate lawsuit against Anthropic, a judge dismissed the copyright infringement claim, but mentioned piracy as a potential means for authors to seek damages. Meta representative Dave Arnold stated, "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," emphasizing the current lack of strong court support for copyright infringement claims.
[15]
Mark Zuckerberg 'personally authorized' Meta's copyright infringement, publishers allege
NEW YORK (AP) -- Five publishing houses and author Scott Turow sued Meta and CEO Mark Zuckerberg on Tuesday, alleging the company illegally used millions of copyrighted works to train its AI language system Llama. The class action lawsuit, filed in federal court in Manhattan, accuses the tech giant of copyright infringement and opens up a new front in the ongoing battle between the book community and developers of AI. The plaintiffs allege that Zuckerberg and Meta "followed their well-known motto 'move fast and break things'" by illegally drawing upon a massive trove of books and journal articles for Llama. "Defendants reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law," the complaint reads in part. "Zuckerberg himself personally authorized and actively encouraged the infringement." Authors published by the five companies suing -- Elsevier, Cengage, Hachette Book Group, Macmillan and McGraw Hill -- include Turow, James Patterson, Donna Tartt, former President Joe Biden and at least two of the Pulitzer Prize winners announced Monday, Yiyun Li and Amanda Vaill. In a statement Monday, Meta vowed to "fight this lawsuit aggressively." "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," the statement reads in part. Over the past few years, numerous authors have pursued legal action involving AI. In 2025, Anthropic agreed to pay $1.5 billion to settle a class action suit initiated by thriller novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson. A final approval hearing is scheduled for next week.
[16]
Meta sued for allegedly using copyrighted work to train AI
Yesterday, a major legal battle put AI right in the center of a debate that goes far beyond technology -- it really goes to the heart of creativity, ownership, and what it means to be an author in 2026. Five major publishers, along with bestselling novelist Scott Turow, have filed a class-action lawsuit against Meta and its founder, Mark Zuckerberg. The accusation is serious: that Meta used millions of copyrighted books and journal articles to train its AI model, Llama, without permission, and in some cases allegedly pulled from pirated sites like LibGen and Sci-Hub. The complaint even claims Zuckerberg himself "personally authorized and actively encouraged the infringement." Meta has not yet publicly responded to these allegations, but the implications here are already reverberating across the publishing world. At the center of this case is a question I keep coming back to: what happens when the work of writers, the people who spend years building stories, ideas and entire worlds, is used to teach machines how to recreate them in seconds? Turow didn't hold back in his response, calling it "shameless, damaging and unjust behavior," saying he finds it "distressing and infuriating" that one of the richest corporations in the world would allegedly use pirated versions of his work to build a system that can then produce "competing material, including works supposedly in my style." That's a genuine fear for so many people -- not just copying, but replacement. The lawsuit argues that AI-generated books are already flooding marketplaces like Amazon, potentially pushing out human authors. And even more unsettling, these systems can summarize entire novels so well that, in theory, readers might not need to buy the original at all. One example in the filing described how Llama was prompted to mimic a travel writer's voice and produced what the complaint called a "convincing rendition" of that style. 
When asked how it did it, the system essentially admitted it had been trained on vast amounts of text, including that author's published work. Listen, AI is not going away. It can be powerful, efficient, even transformative. It has been for my business. It can open doors for creativity, for access, for new kinds of storytelling we haven't even imagined yet. But the argument from authors and publishers is that none of that should come at the expense of consent. And that's really the line being drawn right now: innovation versus ownership. Should companies be able to move fast and build powerful systems using whatever data they can access? Or does that speed come with a responsibility -- the responsibility to compensate, to protect, and to regulate the people whose work built the foundation in the first place? This isn't just about Meta. It's part of a growing wave of lawsuits against AI companies like OpenAI, Anthropic, Google and others. In fact, Anthropic recently agreed to a $1.5 billion settlement with writers over similar claims. And Congress is now being pulled into the conversation, with growing pressure to define what fair use looks like in the age of artificial intelligence, and whether human creativity is something that can be trained on without limits. My thought is simple. AI can absolutely be a great thing. It can create jobs, expand access and reshape industries in ways we're just beginning to understand. But it also has the power to dismantle livelihoods if it moves without guardrails. And I think the excuse that we need to "move fast to compete" cannot come at the expense of the people who have spent their lives creating the very content these systems are built on. We need regulation and oversight from Congress so that stealing intellectual property, and the erosion of creative work, doesn't become the norm in the name of innovation. 
At its core, this is about respect for creative work and deciding, collectively, where we draw the line between inspiration and appropriation in the age of AI. Lindsey Granger is a NewsNation contributor and co-host of The Hill's commentary show "Rising." This column is an edited transcription of her on-air commentary.
[17]
Mark Zuckerberg 'Personally Authorized' Meta's Copyright Infringement, Publishers Allege
NEW YORK (AP) -- Five publishing houses and author Scott Turow sued Meta and CEO Mark Zuckerberg on Tuesday, alleging the company illegally used millions of copyrighted works to train its AI language system Llama. The class action lawsuit, filed in federal court in Manhattan, accuses the tech giant of copyright infringement and opens up a new front in the ongoing battle between the book community and developers of AI. The plaintiffs allege that Zuckerberg and Meta "followed their well-known motto 'move fast and break things'" by illegally drawing upon a massive trove of books and journal articles for Llama. "Defendants reproduced and distributed millions of copyrighted works without permission, without providing any compensation to authors or publishers, and with full knowledge that their conduct violated copyright law," the complaint reads in part. "Zuckerberg himself personally authorized and actively encouraged the infringement." Authors published by the five companies suing -- Elsevier, Cengage, Hachette Book Group, Macmillan and McGraw Hill -- include Turow, James Patterson, Donna Tartt, former President Joe Biden and at least two of the Pulitzer Prize winners announced Monday, Yiyun Li and Amanda Vaill. In a statement Monday, Meta vowed to "fight this lawsuit aggressively." "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," the statement reads in part. Over the past few years, numerous authors have pursued legal action involving AI. In 2025, Anthropic agreed to pay $1.5 billion to settle a class action suit initiated by thriller novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson. A final approval hearing is scheduled for next week.
[18]
Major publishers sue Meta for copyright infringement over AI training
Major publishers have filed a lawsuit against Meta Platforms in Manhattan federal court. They accuse the tech giant of using millions of their books and journal articles without permission to train its artificial intelligence model, Llama. This legal action marks a new development in copyright disputes over AI training data. The publishers seek monetary damages and class action representation. Publishers Elsevier, Cengage, Hachette, Macmillan and McGraw Hill sued Meta Platforms in Manhattan federal court on Tuesday, alleging that the tech giant misused their books and journal articles to train its artificial intelligence model Llama. The publishers, as well as author Scott Turow, alleged in the proposed class action complaint that Meta pirated millions of their works and used them without permission to train its large language models to respond to human prompts. Spokespeople for Meta did not immediately respond to a Reuters request for comment on the complaint on Tuesday. "Meta's mass-scale infringement isn't public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination," Maria Pallante, president of the Association of American Publishers, said in a statement. The publishers allege that Meta pirated works ranging from textbooks to scientific articles to novels including "The Fifth Season" by N.K. Jemisin and "The Wild Robot" by Peter Brown for its AI training. They asked the court for permission to represent a larger class of copyright owners and an unspecified amount of monetary damages. 
The lawsuit opens a new front in the ongoing copyright battle between creators and tech companies over AI training, in which dozens of authors, news outlets, visual artists and other plaintiffs have sued companies including Meta, OpenAI and Anthropic for infringement. All of the pending cases will likely revolve around whether AI systems make fair use of copyrighted material by using it to create new, transformative content. The first two judges to consider the matter issued diverging rulings last year. Amazon- and Google-backed Anthropic was the first major AI company to settle one of the cases, agreeing last year to pay a group of authors $1.5 billion to resolve a class-action lawsuit that could have cost the company billions more in damages for alleged piracy.
[19]
Publishers Accuse Meta of Misusing Their Works in AI Training | PYMNTS.com
In a federal class action lawsuit filed Tuesday (May 5), publishing houses Elsevier, Cengage Learning, Hachette, Macmillan, and McGraw Hill accused Meta and founder and CEO Mark Zuckerberg of violating copyright laws through "willful infringement" of millions of books and journal articles. "Meta chose to live by its motto of 'move fast, and break things,' and now must be held accountable for what it broke, including the copyright laws," the publishers said in a Tuesday news release. The plaintiffs, who are joined by bestselling legal thriller author Scott Turow, said in the release that Meta deliberately used books and journal articles because they possess characteristics uniquely suited to training large language models. The lawsuit argued that the risk of Llama competing with works by human authors is not just theoretical. One user described creating a "100-chapter fictional book" from "a single prompt using Llama 3.1 70B!" According to a Tuesday (May 5) Reuters report, a Meta spokesperson said the company will "aggressively" fight the publishers' claims. "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use," the spokesperson said. The lawsuit is the latest in a string of legal actions filed by authors, media outlets and publishers against AI companies. In March, Encyclopedia Britannica and its subsidiary Merriam-Webster sued OpenAI, alleging the startup committed copyright infringement by scraping their articles to train ChatGPT. Meanwhile, some tech giants, including Amazon and Microsoft, are working on deals with media companies in which those publishers would be compensated when their materials are used for AI training. "Publishers increasingly favor usage-based compensation models that scale with how often AI systems rely on their content, rather than flat licensing fees," PYMNTS reported in February. 
"Industry executives say such models could offer a more sustainable revenue stream as artificial intelligence usage grows, but many also worry that AI companies may not participate in sufficient numbers to make marketplaces economically meaningful."
[20]
Macmillan & others allege Meta used pirated books to train Llama
A coalition of major publishers -- Elsevier, Cengage, Hachette, Macmillan, and McGraw-Hill -- along with bestselling author Scott Turow and his company S.C.R.I.B.E., Inc., filed a class action complaint against Meta Platforms, Inc. and its CEO, Mark Zuckerberg, in the U.S. District Court for the Southern District of New York on May 5, 2026. The plaintiffs allege that the defendants committed extensive copyright infringement by downloading pirated books and journal articles, scraping unauthorised web content and repeatedly using these materials to train successive versions of the Llama AI model, all without paying licensing fees. According to the complaint, the allegedly infringed material spans textbooks, scientific journals, and popular fiction, including titles such as The Fifth Season by N. K. Jemisin and The Wild Robot by Peter Brown. Meta Defends AI Training as Fair Use: A Meta spokesperson dismissed the suit, arguing that courts have already found AI training on copyrighted material can qualify as fair use, and vowed to fight the lawsuit aggressively. The company's defence is part of a broader legal strategy that has recently succeeded in U.S. courts. In June 2025, a U.S. federal court ruled in Meta's favour in a separate lawsuit brought by authors who claimed the company unlawfully used copyrighted books to train its Llama AI models. The court found that the plaintiffs did not sufficiently prove market harm from Meta's AI systems, a key factor in determining fair use under U.S. copyright law. The judge ruled that Meta's use of the books was "highly transformative," and noted that Llama did not reproduce substantial portions of the original works, even with adversarial prompting. The court also stated that although Meta's AI business is commercial, commercial intent alone does not preclude a fair use defense. The court also addressed Meta's alleged use of pirated books separately from the fair use analysis. 
The judge stated that downloading from shadow libraries does not automatically invalidate a fair use defense, but noted that using pirated repositories may indicate bad faith in certain situations. Allegations Around Meta's Use of Pirated Books: In November 2025, Entrepreneur Media sued Meta, alleging the company used pirated articles to train its Llama AI model. The complaint, filed in the U.S. District Court for the Northern District of California, claimed that several Entrepreneur titles and registered magazine issues, including Start Your Own Import/Export Business, appeared in the LibGen corpus. Meta allegedly downloaded this corpus with Mark Zuckerberg's authorization to train Llama. Entrepreneur argued that the harm was direct and commercial, as users could prompt Llama to generate similar guidance for free, reducing demand for its business guides and periodicals. In June 2025, a U.S. federal judge granted Meta summary judgement in the Kadrey v. Meta case, dismissing copyright infringement claims related to AI training due to insufficient evidence of market harm. However, the case continued on separate claims regarding the alleged distribution of books obtained through torrent downloads. This ruling, which publishers are watching closely, follows Anthropic's $1.5 billion settlement with authors in September 2025, highlighting the significant financial risks for large language model developers. The varied outcomes in these cases make the new publishers' class action a critical test of whether courts will hold AI companies liable for the entire process, from acquiring pirated content to producing outputs that may copy protected text.
Six Legal Counts, a Class Action, and Demands for Injunctions: The complaint brings six counts against Meta and Zuckerberg, covering copyright infringement through torrenting, web scraping, and AI training; distribution of infringing works; contributory infringement by Zuckerberg personally; and violations of the Digital Millennium Copyright Act for stripping copyright management information. The proposed class covers all legal or beneficial owners of registered copyrights in books with an ISBN or journal articles with a DOI or ISSN that Meta reproduced without authorisation by torrenting or web scraping, distributed by torrenting, or reproduced in connection with training a Llama model. Plaintiffs seek permanent injunctions to prevent ongoing infringement, statutory damages under the Copyright Act, attorneys' fees, and a jury trial.
[21]
AI Copyright Dispute Puts Meta & Zuckerberg in Legal Spotlight Again
Meta and its CEO Mark Zuckerberg are facing a fresh lawsuit alleging the use of pirated books to train AI models. The latest legal battle sparks renewed debate over copyright laws, data ethics, and the future of AI regulation. The lawsuit opens a new front in the ongoing copyright battle between creators and tech companies over AI training, in which dozens of authors, news outlets, visual artists and other plaintiffs have sued companies including Meta, OpenAI and Anthropic for infringement. Meta and its CEO Mark Zuckerberg are now under fire again after publishers filed a fresh legal challenge in the US. According to reports, major publishers and an author have filed a class-action lawsuit alleging copyright violations tied to the company's AI efforts. The lawsuit claims that Meta used copyrighted books and academic content without permission to train its Llama large language models. The plaintiffs include prominent publishing houses such as Hachette, Macmillan, McGraw-Hill, Elsevier, and Cengage, as well as author Scott Turow. The complaint stated that Meta copied and distributed millions of copyrighted works without authorization or compensation. It also claimed that Zuckerberg was directly involved in approving such practices. The lawsuit is reportedly filed in Manhattan federal court. The plaintiffs are seeking damages and also pushing to expand the case to represent a broader group of copyright holders. "Meta's mass-scale infringement isn't public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination," Maria Pallante, the president of the Association of American Publishers, said in a statement.
[22]
Meta, Mark Zuckerberg sued for allegedly using pirated books to train AI
The case adds to ongoing legal battles involving AI firms like Anthropic and OpenAI over fair use and copyright. Meta and its CEO Mark Zuckerberg are now under fire again after publishers filed a fresh legal challenge in the US. According to reports, major publishers and an author have filed a class-action lawsuit claiming copyright violations tied to the company's AI efforts. The lawsuit claims that Meta used copyrighted books and academic content without permission to train its Llama large language models. The plaintiffs include prominent publishing houses such as Hachette, Macmillan, McGraw Hill, Elsevier and Cengage, along with author Scott Turow. The complaint stated that Meta copied and distributed millions of copyrighted works without authorisation or compensation, and claimed that Zuckerberg was directly involved in approving such practices. The lawsuit, reportedly filed in Manhattan federal court, alleges that Meta relied on pirated materials ranging from textbooks and research papers to novels to train its AI systems. The plaintiffs are seeking damages and are also pushing for the case to be expanded to represent a broader group of copyright holders. This is not the first time Meta's AI training methods have come under scrutiny. The company has faced many lawsuits with similar allegations, although earlier attempts by authors to prove copyright infringement have not always succeeded. In parallel cases involving other AI firms like Anthropic, courts have shown mixed responses, with some suggesting that while copyright claims may be complex, piracy-related arguments could hold more weight. Meta, for its part, has maintained that its AI practices fall within legal boundaries and has argued that training models on copyrighted material can qualify as fair use. The company has also indicated it will contest the claims.
In the meantime, OpenAI and Elon Musk are fighting one of the biggest lawsuits. Musk sued OpenAI and Sam Altman, claiming that the company deliberately shifted to a for-profit structure to make money, abandoning its goal of helping humanity. Musk demanded damages and the removal of Altman and Greg Brockman.
Five major publishers including Elsevier, Hachette, and Macmillan have filed a class-action lawsuit against Meta and CEO Mark Zuckerberg, alleging the company illegally used millions of copyrighted books and research papers to train its Llama AI models. The case marks the first AI copyright lawsuit brought by major scientific publishers and could reshape the legal landscape of AI development.
Meta and CEO Mark Zuckerberg face a class-action lawsuit filed on May 5 in the Southern District of New York by five major publishers and bestselling author Scott Turow [1]. The plaintiffs (Elsevier, Hachette, Macmillan, McGraw Hill, and Cengage) allege that Meta engaged in "one of the most massive infringements of copyrighted materials in history" by using their works without permission to train Llama AI models [2]. This marks the first time major scientific publishers have taken legal action against an AI company over copyright infringement, according to the Association of American Publishers [1].
The lawsuit alleges Meta pirated millions of copyrighted works ranging from scientific journal articles published in Cell and The Lancet to textbooks and novels including "The Fifth Season" by N.K. Jemisin [3]. According to the complaint, Meta accessed scraped research papers and other copyrighted material for AI training through multiple sources, including the Common Crawl data set and file-sharing sites like LibGen and Sci-Hub [1]. Evidence presented includes internal emails between Meta employees revealed during a previous case, Kadrey v. Meta [1].
The lawsuit specifically names Mark Zuckerberg as a defendant, claiming he "personally authorized and actively encouraged" the alleged copyright infringement [2]. The filing alleges that Zuckerberg instructed the company to abandon licensing negotiations with publishers and deliberately stripped works of attribution data to conceal training sources [4]. The publishers argue that Meta's generative AI platform functions as "an infinite substitution machine," producing imitation versions of original works that displace human-authored content in the marketplace [4].
Scott Turow, former Authors Guild President and plaintiff in the case, expressed his frustration: "I find it distressing and infuriating that one of the top-10 richest corporations in the world knowingly used pirated sources of my books, and thousands of other authors, to train Llama" [2]. The plaintiffs seek unspecified monetary damages and aim to represent a broader class of copyright owners [3].
Meta has vowed to "fight this lawsuit aggressively," arguing that training AI on copyrighted material qualifies as fair use under US copyright law [1]. The company maintains that "AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use" [5]. This defense mirrors the 2025 ruling in Meta's favor, in which a judge found insufficient evidence that Llama AI models would harm the market for human-created content [4].
However, the legal landscape of AI remains unsettled. While US courts have generally backed claims that large language models' use of copyrighted material is "transformative," two landmark 2025 rulings warned that acquiring and storing pirated content can constitute infringement, and that fair use can be challenged if copyright holders show that models' outputs substantially affect the commercial markets for their works [1]. US District Court Judge Vince Chhabria noted in a previous Meta ruling that "the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works" [2].
Academic texts are valuable training resources for large language models because they contain high-quality, human-written information that can boost accuracy on scientific topics [1]. Repositories like PubMed are commonly used to build specialized training data sets for scientific domains [1]. Yet the proliferation of licensing deals between publishers and tech companies raises concerns about access for researchers building open-source models, potentially strengthening big AI firms while limiting broader innovation [1].
The case joins dozens of similar lawsuits against AI companies, with Anthropic recently settling author claims for $1.5 billion, or approximately $3,000 per pirated work [2]. The plaintiffs point to AI-generated books already flooding Amazon's marketplace as evidence of market displacement [4]. As this case unfolds, it will test whether courts continue to favor tech companies' fair use arguments or shift toward protecting creators' economic interests in the rapidly evolving AI ecosystem.
Summarized by Navi