Sources
[1]
Book authors made the wrong arguments in Meta AI training case, judge says
Soon after a landmark ruling deemed Anthropic's copying of books to train artificial intelligence models a "transformative" fair use, another judge has arrived at the same conclusion in a case pitting book authors against Meta. But that doesn't necessarily mean the judges are completely in agreement, and that could soon become a problem not just for Meta but for other big AI companies celebrating the pair of wins this week. On Wednesday, Judge Vince Chhabria explained that he sided with Meta, despite his better judgment, mainly because the authors made all the wrong arguments in their case against Meta. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Rather than argue that Meta's Llama AI models risked rapidly flooding their markets with competing AI-generated books that could indirectly harm sales, the authors fatally argued only "that users of Llama can reproduce text from their books, and that Meta's copying harmed the market for licensing copyrighted materials to companies for AI training." Because Chhabria found both of these theories "flawed" -- the former because Llama cannot produce long excerpts of works, even with adversarial prompting, and the latter because authors are not entitled to monopolize the market for licensing books for AI training -- he said he had no choice but to grant Meta's request for summary judgment. Ultimately, because the authors introduced no evidence that Meta's AI threatened to dilute their markets, Chhabria ruled that Meta did enough to overcome the authors' other arguments about alleged harms simply by providing "its own expert testimony explaining that Llama 3's release did not have any discernible effect on the plaintiffs' sales." Chhabria seemed to criticize the authors for mounting a "half-hearted" defense of their works, noting that his opinion "may be in significant tension with reality," where it seems "possible, even likely, that Llama will harm the book sale market."
[2]
Key fair use ruling clarifies when books can be used for AI training
Artificial intelligence companies don't need permission from authors to train their large language models (LLMs) on legally acquired books, US District Judge William Alsup ruled Monday. The first-of-its-kind ruling that condones AI training as fair use will likely be viewed as a big win for AI companies, but it also put on notice all the AI companies expecting the same reasoning to apply to training on pirated copies of books -- a question that remains unsettled. In the specific case Alsup is weighing -- which pits book authors against Anthropic -- he found that "the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative" and "necessary" to build world-class AI models. Importantly, this case differs from other lawsuits where authors allege that AI models risk copying and distributing their work. Because the authors suing Anthropic did not allege that any of Anthropic's outputs reproduced their works or expressive style, Alsup found there was no threat that Anthropic's text generator, Claude, might replace authors in their markets. The absence of that argument tipped the fair use analysis in Anthropic's favor. "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. Alsup's ruling surely disappointed authors, who instead argued that Claude's reliance on their texts could generate competing summaries or alternative versions of their stories. The judge likened these complaints to arguing "that training schoolchildren to write well would result in an explosion of competing works." "This is not the kind of competitive or creative displacement that concerns the Copyright Act," Alsup wrote. "The Act seeks to advance original works of authorship, not to protect authors against competition." Alsup noted that authors would be able to raise new claims if they found evidence of infringing Claude outputs. That could change the fair use calculus, as it might in a case where a judge recently suggested that Meta's AI products might be "obliterating" authors' markets for works. "Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public," Alsup wrote. "If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop."

Anthropic must face trial over book piracy

Anthropic is "pleased" with the ruling, issuing a statement applauding the court for recognizing that "using works to train LLMs was transformative -- spectacularly so." But Anthropic is not off the hook: it won summary judgment on AI training as fair use but still faces a trial over piracy, which Alsup ruled did not favor a fair use finding. In the Anthropic case, the AI company is accused of downloading 7 million pirated books to build a research library where copies would be kept "forever" regardless of whether they were ever used for AI training. Seemingly realizing that piracy might trigger legal challenges, Anthropic later tried to replace pirated books with legally purchased copies. But the company also argued that even the initial copying of these pirated books was an "intermediary" step necessary to advance the transformative use of training AI.
And perhaps at its least persuasive, Anthropic also argued that because it could have borrowed the books it stole, the theft alone shouldn't "short-circuit" the fair use analysis. But Alsup was not swayed by those arguments, noting that copying books from a pirate site is copyright infringement, "full stop." He rejected "Anthropic's assumption that the use of the copies for a central library can be excused as fair use merely because some will eventually be used to train LLMs," and he cast doubt on whether any of the other AI lawsuits debating piracy could ever escape without paying damages. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup wrote. "Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded." But Alsup said that the Anthropic case may not even need to decide that question, since Anthropic's retention of pirated books for its research library alone was not transformative. Alsup wrote that Anthropic's argument that it could hold onto pirated material in case it ever decided to use it for AI training was an attempt to "fast glide over thin ice." Additionally, Alsup pointed out that Anthropic's early attempts to get permission to train on authors' works withered, as internal messages revealed the company had concluded that stealing books was the more cost-effective path to innovation "to avoid 'legal/practice/business slog,' as cofounder and chief executive officer Dario Amodei put it." "Anthropic is wrong to suppose that so long as you create an exciting end product, every 'back-end step, invisible to the public,' is excused," Alsup wrote. "Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it." To avoid maximum damages in the event of a loss, Anthropic will likely continue arguing that replacing pirated books with purchased books should water down authors' fight, Alsup's order suggested. "That Anthropic later bought a copy of a book it earlier stole off the Internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages," Alsup noted.
[3]
A federal judge sides with Anthropic in lawsuit over training AI on books without authors' permission
Federal judge William Alsup ruled that it was legal for Anthropic to train its AI models on published books without the authors' permission. This marks the first time that the courts have given credence to AI companies' claim that fair use doctrine can absolve them of fault when they use copyrighted materials to train LLMs. This decision comes as a blow to authors, artists, and publishers who have brought dozens of lawsuits against companies like OpenAI, Meta, Midjourney, Google, and more. While the ruling is not a guarantee that other judges will follow Judge Alsup's lead, it lays the foundations for a precedent that would side with tech companies over creatives. These lawsuits often depend on how a judge interprets fair use doctrine, a notoriously finicky carve-out of copyright law that hasn't been updated since 1976 -- a time before the internet, let alone the concept of generative AI training sets. Fair use rulings take into account what the work is being used for (parody and education can be viable), whether or not it's being reproduced for commercial gain (you can write Star Wars fan fiction, but you can't sell it), and how transformative a derivative work is from the original. Companies like Meta have made similar fair use arguments in defense of training on copyrighted works, though before this week's decision, it was less clear how the courts would rule. In this particular case of Bartz v. Anthropic, the group of plaintiff authors also brought into question the manner in which Anthropic obtained and stored their works. According to the lawsuit, Anthropic sought to create a "central library" of "all the books in the world" to keep "forever." But millions of these copyrighted books were downloaded for free from pirate sites, which is unambiguously illegal. While the judge granted that Anthropic's training on these materials was a fair use, the court will hold a trial about the nature of the "central library." "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages," Judge Alsup wrote in the decision. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages."
[4]
Federal judge sides with Meta in lawsuit over training AI models on copyrighted books
A federal judge sided with Meta on Wednesday in a lawsuit brought against the company by 13 book authors, including Sarah Silverman, that alleged the company had illegally trained its AI models on their copyrighted works. Federal Judge Vince Chhabria issued a summary judgment -- meaning the judge was able to decide on the case without sending it to a jury -- in favor of Meta, finding that the company's training of AI models on copyrighted books in this case fell under the "fair use" doctrine of copyright law and thus was legal. The decision comes just a few days after a federal judge sided with Anthropic in a similar lawsuit. Together, these cases are shaping up to be a win for the tech industry, which has spent years in legal battles with media companies arguing that training AI models on copyrighted works is fair use. However, these decisions aren't the sweeping wins some companies hoped for -- both judges noted that their cases were limited in scope. Judge Chhabria made clear that this decision does not mean that all AI model training on copyrighted works is legal, but rather that the plaintiffs in this case "made the wrong arguments" and failed to develop sufficient evidence in support of the right ones. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Judge Chhabria said in his decision. Later, he said, "In cases involving uses like Meta's, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant's use." Judge Chhabria ruled that Meta's use of copyrighted works in this case was transformative -- meaning the company's AI models did not merely reproduce the authors' books. Furthermore, the plaintiffs failed to convince the judge that Meta's copying of the books harmed the market for those authors, which is a key factor in determining whether copyright law has been violated. "The plaintiffs presented no meaningful evidence on market dilution at all," said Judge Chhabria. Both Anthropic and Meta's wins involve training AI models on books, but there are several other active lawsuits against technology companies for training AI models on other copyrighted works. For instance, The New York Times is suing OpenAI and Microsoft for training AI models on news articles, while Disney and Universal are suing Midjourney for training AI models on films. Judge Chhabria noted in his decision that fair use defenses depend heavily on the details of a case, and some industries may have stronger fair use arguments than others. "It seems that markets for certain types of works (like news articles) might be even more vulnerable to indirect competition from AI outputs," said Chhabria.
[5]
What comes next for AI copyright lawsuits?
The use of copyrighted works to train models is at the heart of a bitter battle between tech companies and content creators. That battle is playing out in technical arguments about what does and doesn't count as fair use of a copyrighted work. But it is ultimately about carving out a space in which human and machine creativity can continue to coexist. There are dozens of similar copyright lawsuits working through the courts right now, with cases filed against all the top players -- not only Anthropic and Meta but Google, OpenAI, Microsoft, and more. On the other side, plaintiffs range from individual artists and authors to large companies like Getty and the New York Times. The outcomes of these cases are set to have an enormous impact on the future of AI. In effect, they will decide whether or not model makers can continue ordering up a free lunch. If not, they will need to start paying for such training data via new kinds of licensing deals -- or even find new ways to train their models. Those prospects could upend the industry. And that's why last week's wins for the technology companies matter. So: Cases closed? Not quite. If you drill into the details, the rulings are less cut-and-dried than they seem at first. Let's take a closer look. In both cases, a group of authors (the Anthropic suit was a class action; 13 plaintiffs sued Meta, including high-profile names such as Sarah Silverman and Ta-Nehisi Coates) set out to prove that a technology company had violated their copyright by using their books to train large language models. And in both cases, the companies argued that this training process counted as fair use, a legal provision that permits the use of copyrighted works for certain purposes. There the similarities end. Ruling in Anthropic's favor, senior district judge William Alsup argued on June 23 that the firm's use of the books was legal because what it did with them was transformative, meaning that it did not replace the original works but made something new from them. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup wrote in his judgment. In Meta's case, district judge Vince Chhabria made a different argument. He also sided with the technology company, but he focused his ruling instead on the issue of whether or not Meta had harmed the market for the authors' work. Chhabria said that he thought Alsup had brushed aside the importance of market harm. "The key question in virtually any case where a defendant has copied someone's original work without permission is whether allowing people to engage in that sort of conduct would substantially diminish the market for the original," he wrote on June 25. Same outcome; two very different rulings. And it's not clear exactly what that means for the other cases. On the one hand, it bolsters at least two versions of the fair-use argument. On the other, there's some disagreement over how fair use should be decided. But there are even bigger things to note. Chhabria was very clear in his judgment that Meta won not because it was in the right, but because the plaintiffs failed to make a strong enough argument. "In the grand scheme of things, the consequences of this ruling are limited," he wrote. "This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models. 
And, as should now be clear, this ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful." That reads a lot like an invitation for anyone else out there with a grievance to come and have another go.
[6]
Anthropic Scores a Landmark AI Copyright Win -- but Will Face Trial Over Piracy Claims
While the startup has won its "fair use" argument, it potentially faces billions of dollars in damages for allegedly pirating over 7 million books to build a digital library. Anthropic has scored a major victory in an ongoing legal battle over artificial intelligence models and copyright, one that may reverberate across the dozens of other AI copyright lawsuits winding through the legal system in the United States. A court has determined that it was legal for Anthropic to train its AI tools on copyrighted works, arguing that the behavior is shielded by the "fair use" doctrine, which allows for unauthorized use of copyrighted materials under certain conditions. "The training use was a fair use," senior district judge William Alsup wrote in a summary judgment order released late Monday evening. In copyright law, one of the main ways courts determine whether using copyrighted works without permission is fair use is to examine whether the use was "transformative," which means that it is not a substitute for the original work but rather something new. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup wrote. "This is the first major ruling in a generative AI copyright case to address fair use in detail," says Chris Mammen, a managing partner at Womble Bond Dickinson who focuses on intellectual property law. "Judge Alsup found that training an LLM is transformative use -- even when there is significant memorization. He specifically rejected the argument that what humans do when reading and memorizing is different in kind from what computers do when training an LLM." The case, a class action lawsuit brought by book authors who alleged that Anthropic had violated their copyright by using their works without permission, was first filed in August 2024 in the US District Court for the Northern District of California. Anthropic is the first artificial intelligence company to win this kind of battle, but the victory comes with a large asterisk attached. While Alsup found that Anthropic's training was fair use, he ruled that the authors could take Anthropic to trial over pirating their works. While Anthropic eventually shifted to training on purchased copies of the books, it had nevertheless first collected and maintained an enormous library of pirated materials. "Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic should have paid for these pirated library copies. This order agrees," Alsup writes. "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages," the order concludes. Anthropic did not immediately respond to requests for comment. Lawyers for the plaintiffs declined to comment. The lawsuit, Bartz v. Anthropic, was first filed less than a year ago; Anthropic asked for summary judgment on the fair use issue in February. It's notable that Alsup has far more experience with fair use questions than the average federal judge, as he presided over the initial trial in Google v. Oracle, a landmark case about tech and copyright that eventually went before the Supreme Court.
[7]
Meta Wins Blockbuster AI Copyright Case -- but There's a Catch
A federal judge ruled that Meta did not violate the law when it trained its AI models on 13 authors' books. Meta scored a major victory in a copyright lawsuit on Wednesday when a federal judge ruled that the company did not violate the law when it trained its AI tools on 13 authors' books without permission. "The Court has no choice but to grant summary judgment to Meta on the plaintiffs' claim that the company violated copyright law by training its models with their books," wrote US District Court Judge Vince Chhabria in a summary judgment. He concluded that the plaintiffs did not present sufficient evidence that Meta's use of their books was harmful. In 2023, a high-profile group of authors, including the comedian Sarah Silverman, sued Meta, alleging that the tech behemoth had infringed on their copyright by training its large language models on their work. Kadrey v. Meta was one of the first cases of its kind; now there are dozens of similar AI copyright lawsuits winding through US courts. Chhabria had previously stressed that he planned to look carefully at whether the plaintiffs had enough evidence to show that Meta's use of their work would hurt them financially. "The key question in virtually any case where a defendant has copied someone's original work without permission is whether allowing people to engage in that sort of conduct would substantially diminish the market for the original," he wrote in the judgment on Wednesday. This is the second major ruling in the AI copyright world this week; on Monday, Judge William Alsup ruled that Anthropic's use of copyrighted materials to train its own AI tools was legal. Chhabria referenced Alsup's summary judgment in his decision. Chhabria took pains to stress that his ruling was based on the specific set of facts in this case -- leaving the door open for other authors to sue Meta for copyright infringement in the future. "In the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models," he wrote. "And, as should now be clear, this ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful."
[8]
Judge OKs Anthropic's Use of Copyrighted Books in AI Training. That's Bad News for Creators
Anthropic's use of copyright-protected books in its AI training process was "exceedingly transformative" and fair use, US senior district judge William Alsup ruled on Monday. It's the first time a judge has decided in favor of an AI company on the issue of fair use, in a significant win for generative AI companies and a blow for creators. Fair use is a doctrine that's part of US copyright law. It's a four-part test that, when the criteria are met, lets people and companies use protected content without the rights holder's permission for specific purposes, like when writing a term paper. Tech companies say that fair use exceptions are essential in order for them to access the massive quantities of human-generated content they need to develop the most advanced AI systems. Writers, actors and many other kinds of creators have been equally clear in arguing that the use of their content to propel AI is not fair use. Publishers, artists and content catalog owners have filed lawsuits alleging that AI companies like OpenAI, Meta and Midjourney are infringing on their protected intellectual property in an attempt to circumvent costly, but standard, licensing procedures. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) The authors suing Anthropic for copyright infringement say their books were also obtained illegally -- that is, the books were pirated. That leads to the second part of Alsup's ruling, based on his concerns about Anthropic's methods of obtaining the books. In the ruling, he writes that Anthropic co-founder Ben Mann knowingly downloaded unauthorized copies of 5 million books from LibGen and an additional 2 million from Pirate Library Mirror (PirLiMi). The ruling also outlines how Anthropic deliberately obtained print copies of the books it previously pirated in order to create "its own catalog of bibliographic metadata." Anthropic vice president Tom Turvey, the ruling says, was "tasked with obtaining 'all the books in the world' while still avoiding as much 'legal/practice/business slog.'" That meant buying physical books from publishers to create a digital database. To prep the books for machine-readable scanning, the Anthropic team stripped millions of used copies from their bindings, cut the pages down to fit, and then destroyed and discarded them. Anthropic's acquisition and digitization of the print books was fair use, the ruling says. But it adds: "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy." Alsup ordered a trial regarding the pirated library. Anthropic is one of many AI companies facing copyright claims in court, so this week's ruling is likely to have massive ripple effects across the industry. We'll have to see how the piracy claims resolve before we know how much money Anthropic may be ordered to pay in damages. But if the scales tip to grant multiple AI companies fair use exceptions, the creative industry and the people who work in it will certainly suffer damages, too.
[9]
Meta Scores AI Fair Use Court Victory, but Judge Warns Such Wins Won't Always Be the Case
AI companies scored another victory in court this week. Meta on Wednesday won a motion for partial summary judgment in its favor in Kadrey v. Meta, a case brought by 13 authors alleging the company infringed on their copyright protections by illegally using their books to train its Llama AI models. The ruling comes two days after a similar victory for Claude maker Anthropic. But Judge Vince Chhabria stressed in his order that this ruling should be limited and doesn't absolve Meta of future claims from other authors. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," he wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." The issue at the heart of the cases is whether the AI companies' use of protected content for AI training qualifies as fair use. The fair use doctrine is a fundamental part of US copyright law that allows people to use copyrighted work without the rights holders' explicit permission, like in education and journalism. There are four key considerations when evaluating whether something is fair use. The Anthropic ruling focused on transformativeness, while the Meta ruling focused on the effect the use of AI has on the existing publishing market. These rulings are big wins for AI companies. OpenAI, Google and others have been fighting for fair use so they don't have to enter costly and lengthy licensing agreements with content creators, much to the chagrin of those creators. The authors bringing these cases may see some victories in subsequent piracy trials (for Anthropic) or in new lawsuits. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) In his analysis, Chhabria focused on the effect AI-generated books have on the existing publishing market, which he saw as the most important factor of the four needed to prove fair use. He wrote extensively about the risk that generative AI and large language models could potentially violate copyright law, and that fair use needs to be evaluated on a case-by-case basis. Some works, like autobiographies and classic literature such as The Catcher in the Rye, likely couldn't be created with AI, he wrote. However, he noted that "the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works." In other words, AI slop could make human-written books seem less valuable and undercut authors' willingness and ability to create. Still, Chhabria said that the plaintiffs did not show sufficient evidence to prove harm from how "Meta's models would dilute the market for their own works." The plaintiffs focused their arguments on how Meta's AI models can reproduce exact snippets from their works and how the company's Llama models hurt their ability to license their books to AI companies. These arguments weren't as compelling in Chhabria's eyes -- he called them "clear losers" -- so he sided with Meta.
That's different from the Anthropic ruling, where Judge William Alsup focused on the "exceedingly transformative" nature of the use of the plaintiffs' books in the results AI chatbots spit out. Chhabria wrote that while "there is no disputing" the use of copyrighted material was transformative, the more urgent question was the effect AI systems had on the ecosystem as a whole. Alsup also outlined concerns about Anthropic's methods of obtaining the books, first through illegal online libraries and then by deliberately purchasing print copies to digitize for a "research library." Two court rulings do not make every AI company's use of content legal under fair use. What makes these cases notable is that they are the first to issue substantive legal analyses on the issue; AI companies and publishers have been duking it out in court for years now. But just as Chhabria referenced and responded to the Anthropic ruling, all judges use past cases with similar situations as reference points. They don't have to come to the same conclusion, but the role of precedent is important. It's likely that we'll see these two rulings referenced in other AI and copyright/piracy cases. But we'll have to wait and see how big an effect these rulings will have in future cases -- and whether it's the warnings or the greenlights that hold the most weight in future decisions.
[10]
Meta Won Its AI Fair Use Lawsuit, but Judge Says Authors Are Likely 'to Often Win' Going Forward
AI companies scored another victory in court this week. Meta on Wednesday won a motion for partial summary judgment in its favor in Kadrey v. Meta, a case brought by 13 authors alleging the company infringed on their copyright protections by illegally using their books to train its Llama AI models. The ruling comes two days after a similar victory for Claude maker Anthropic. But Judge Vince Chhabria stressed in his order that this ruling should be limited and doesn't absolve Meta of future claims from other authors. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," he wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." The issue at the heart of the cases is whether the AI companies' use of protected content for AI training qualifies as fair use. The fair use doctrine is a fundamental part of US copyright law that allows people to use copyrighted work without the rights holders' explicit permission, like in education and journalism. There are four key considerations when evaluating whether something is fair use. The Anthropic ruling focused on transformativeness, while the Meta ruling focused on the effect the use of AI has on the existing publishing market. These rulings are big wins for AI companies. OpenAI, Google and others have been fighting for fair use so they don't have to enter costly and lengthy licensing agreements with content creators, much to the chagrin of those creators. A group of famous authors signed an open letter on Friday, urging publishers to take a stronger stance against AI and avoid using it. "The purveyors of AI have stolen our work from us and from our publishers, too," the letter reads. The authors call out how AI is trained on their work, without permission or compensation, and yet the programs will never be able to connect with humans the way real humans can. The authors bringing these lawsuits may see some victories in subsequent piracy trials (for Anthropic) or in new lawsuits. But concerns abound about the overall effect AI will have on writers now and in the future, which is something Chhabria also recognized in his order. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) In his analysis, Chhabria focused on the effect AI-generated books have on the existing publishing market, which he saw as the most important factor of the four needed to prove fair use. He wrote extensively about the risk that generative AI and large language models could potentially violate copyright law, and that fair use needs to be evaluated on a case-by-case basis. Some works, like autobiographies and classic literature such as The Catcher in the Rye, likely couldn't be created with AI, he wrote. However, he noted that "the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works." In other words, AI slop could make human-written books seem less valuable and undercut authors' willingness and ability to create. Still, Chhabria said that the plaintiffs did not show sufficient evidence to prove harm from how "Meta's models would dilute the market for their own works."
The plaintiffs focused their arguments on how Meta's AI models can reproduce exact snippets from their works and how the company's Llama models hurt their ability to license their books to AI companies. These arguments weren't as compelling in Chhabria's eyes -- he called them "clear losers" -- so he sided with Meta. That's different from the Anthropic ruling, where Judge William Alsup focused on the "exceedingly transformative" nature of the use of the plaintiffs' books in the results AI chatbots spit out. Chhabria wrote that while "there is no disputing" that the use of copyrighted material was transformative, the more urgent question was the effect AI systems had on the ecosystem as a whole. Alsup also outlined concerns about Anthropic's methods of obtaining the books, first through illegal online libraries and then by deliberately purchasing print copies to digitize for a "research library." Two court rulings do not make every AI company's use of content legal under fair use. What makes these cases notable is that they are the first to issue substantive legal analyses on the issue; AI companies and publishers have been duking it out in court for years now. But just as Chhabria referenced and responded to the Anthropic ruling, all judges use past cases with similar situations as reference points. They don't have to come to the same conclusion, but the role of precedent is important. It's likely that we'll see these two rulings referenced in other AI and copyright/piracy cases. But we'll have to wait and see how big an effect these rulings will have in future cases -- and whether it's the warnings or the greenlights that hold the most weight in future decisions.
[11]
Anthropic's AI Training on Books Is Fair Use, Judge Rules. Authors Are More Worried Than Ever
Claude maker Anthropic's use of copyright-protected books in its AI training process was "exceedingly transformative" and fair use, US senior district judge William Alsup ruled on Monday. It's the first time a judge has decided in favor of an AI company on the issue of fair use, in a significant win for generative AI companies and a blow for creators. Two days later, Meta won part of its fair use case. Fair use is a doctrine that's part of US copyright law. It's a four-part test that, when the criteria are met, lets people and companies use protected content without the rights holder's permission for specific purposes, like when writing a term paper. Tech companies say that fair use exceptions are essential in order for them to access the massive quantities of human-generated content they need to develop the most advanced AI systems. Writers, actors and many other kinds of creators have been equally clear in arguing that the use of their work to propel AI is not fair use. On Friday, a group of famous authors signed an open letter to publishers urging the companies to pledge never to replace human writers, editors and audiobook narrators with AI and to avoid using AI throughout the publishing process. The signees include Victoria Aveyard, Emily Henry, R.F. Kuang, Ali Hazelwood, Jasmine Guillory, Colleen Hoover and others. "[Our] stories were stolen from us and used to train machines that, if short-sighted capitalistic greed wins, could soon be generating the books that fill our bookstores," the letter reads. "Rather than paying writers a small percentage of the money our work makes for them, someone else will be paid for a technology built on our unpaid labor." The letter is just the latest in a series of battles between authors and AI companies. Publishers, artists and content catalog owners have filed lawsuits alleging that AI companies like OpenAI, Meta and Midjourney are infringing on their protected intellectual property in an attempt to circumvent costly, but standard, licensing procedures. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) The authors suing Anthropic for copyright infringement say their books were also obtained illegally -- that is, the books were pirated. That leads to the second part of Alsup's ruling, based on his concerns about Anthropic's methods of obtaining the books. In the ruling, he writes that Anthropic co-founder Ben Mann knowingly downloaded unauthorized copies of 5 million books from LibGen and an additional 2 million from Pirate Library Mirror (PirLiMi). The ruling also outlines how Anthropic deliberately obtained print copies of the books it previously pirated in order to create "its own catalog of bibliographic metadata." Anthropic vice president Tom Turvey, the ruling says, was "tasked with obtaining 'all the books in the world' while still avoiding as much 'legal/practice/business slog.'" That meant buying physical books from publishers to create a digital database. To prep the books for machine-readable scanning, the Anthropic team stripped millions of used copies from their bindings, cut the pages down to fit, and then destroyed and discarded them. Anthropic's acquisition and digitization of the print books was fair use, the ruling says. But it adds: "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy." Alsup ordered a trial regarding the pirated library.
Anthropic is one of many AI companies facing copyright claims in court, so this week's ruling is likely to have massive ripple effects across the industry. We'll have to see how the piracy claims resolve before we know how much money Anthropic may be ordered to pay in damages. But if the scales tip to grant multiple AI companies fair use exceptions, the creative industry and the people who work in it will certainly suffer damages, too.
[12]
Meta's AI copyright win comes with a warning about fair use
Meta won a major legal ruling in an AI copyright lawsuit brought by 13 authors alleging that the company illegally trained its AI systems on their work without permission. On Wednesday, Judge Vince Chhabria ruled in Meta's favor, saying it is "entitled to summary judgment on its fair use defense to the claim that copying these plaintiffs' books for use as LLM training data was infringement." However, the judge also pointed out some weak points in the ecosystem of Big Tech's AI efforts and Meta's arguments defending its actions as fair use. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Judge Chhabria said. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." The ruling follows Anthropic's major fair use victory two days earlier, when a separate federal judge ruled that training its models on legally purchased copies of books is fair use. Judge Chhabria says that two of the authors' arguments about fair use were "clear losers": that Meta's Llama AI can reproduce snippets of text from their books, and that Meta's unpermitted use of their works to train its AI models diluted their ability to license their works for training. "Llama is not capable of generating enough text from the plaintiffs' books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data," the judge wrote. The plaintiffs didn't do enough for a "potentially winning argument" that Meta's copying would create "a product that will likely flood the market with similar works, causing market dilution," according to Judge Chhabria. He also discussed the Anthropic ruling, saying that Judge William Alsup brushed aside concerns about the harm generative AI could "inflict on the market for the works it gets trained on."
[13]
Did AI companies win a fight with authors? Technically
In the past week, big AI companies have -- in theory -- chalked up two big legal wins. But things are not quite as straightforward as they may seem, and copyright law hasn't been this exciting since last month's showdown at the Library of Congress. First, Judge William Alsup ruled it was fair use for Anthropic to train on a series of authors' books. Then, Judge Vince Chhabria dismissed another group of authors' complaint against Meta for training on their books. Yet far from settling the legal conundrums around modern AI, these rulings might have just made things even more complicated. Both cases are indeed qualified victories for Meta and Anthropic. And at least one judge -- Alsup -- seems sympathetic to some of the AI industry's core arguments about copyright. But that same ruling railed against the startup's use of pirated media, leaving it potentially on the hook for massive financial damages. (Anthropic even admitted it did not initially purchase a copy of every book it used.) Meanwhile, the Meta ruling asserted that because a flood of AI content could crowd out human artists, the entire field of AI system training might be fundamentally at odds with fair use. And neither case addressed one of the biggest questions about generative AI: when does its output infringe copyright, and who's on the hook if it does? Alsup and Chhabria (incidentally both in the Northern District of California) were ruling on relatively similar sets of facts. Meta and Anthropic both pirated huge collections of copyright-protected books to build a training dataset for their large language models, Llama and Claude. Anthropic later did an about-face and started legally purchasing books, tearing the covers off to "destroy" the original copy, and scanning the text. The authors argued that, in addition to the initial piracy, the training process constituted an unlawful and unauthorized use of their work. Meta and Anthropic countered that this database-building and LLM-training constituted fair use. Both judges basically agreed that LLMs meet one central requirement for fair use: they transform the source material into something new. Alsup called using books to train Claude "exceedingly transformative," and Chhabria concluded "there's no disputing" the transformative value of Llama. Another big consideration for fair use is the new work's impact on a market for the old one. Both judges also agreed that based on the arguments made by the authors, the impact wasn't serious enough to tip the scale. Add those things together, and the conclusions were obvious... but only in the context of these cases, and in Meta's case, because the authors pushed a legal strategy that their judge found totally inept. Put it this way: when a judge says his ruling "does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful" and "stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one" -- as Chhabria did -- AI companies' prospects in future lawsuits with him don't look great. Both rulings dealt specifically with training -- or media getting fed into the models -- and didn't reach the question of LLM output, or the stuff models produce in response to user prompts. But output is, in fact, extremely pertinent.
A huge legal fight between The New York Times and OpenAI began partly with a claim that ChatGPT could verbatim regurgitate large sections of Times stories. Disney recently sued Midjourney on the premise that it "will generate, publicly display, and distribute videos featuring Disney's and Universal's copyrighted characters" with a newly launched video tool. Even in pending cases that weren't output-focused, plaintiffs can adapt their strategies if they now think it's a better bet. The authors in the Anthropic case didn't allege Claude was producing directly infringing output. The authors in the Meta case argued Llama was, but they failed to convince the judge -- who found it wouldn't spit out more than around 50 words of any given work. As Alsup noted, dealing purely with inputs changed the calculations dramatically. "If the outputs seen by users had been infringing, Authors would have a different case," wrote Alsup. "And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case." In their current form, major generative AI products are basically useless without output. And we don't have a good picture of the law around it, especially because fair use is an idiosyncratic, case-by-case defense that can apply differently to mediums like music, visual art, and text. Anthropic being able to scan authors' books tells us very little about whether Midjourney can legally help people produce Minions memes. Minions and New York Times articles are both examples of direct copying in output. But Chhabria's ruling is particularly interesting because it makes the output question much, much broader. Though he may have ruled in favor of Meta, Chhabria's entire opening argues that AI systems are so damaging to artists and writers that their harm outweighs any possible transformative value -- basically, because they're spam machines. It's worth reading: Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way. ... As the Supreme Court has emphasized, the fair use inquiry is highly fact dependent, and there are few bright-line rules. There is certainly no rule that when your use of a protected work is "transformative," this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. ... The upshot is that in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission. Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials. And boy, it sure would be interesting if somebody would sue and make that case. 
After saying that "in the grand scheme of things, the consequences of this ruling are limited," Chhabria helpfully noted this ruling affects only 13 authors, not the "countless others" whose work Meta used. A written court opinion is unfortunately incapable of physically conveying a wink and a nod. Those lawsuits might be far in the future. And Alsup, though he wasn't faced with the kind of argument Chhabria suggested, seemed potentially unsympathetic to it. "Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works," he wrote of the authors who sued Anthropic. "This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition." He was similarly dismissive of the claim that authors were being deprived of licensing fees for training: "such a market," he wrote, "is not one the Copyright Act entitles Authors to exploit." But even Alsup's seemingly positive ruling has a poison pill for AI companies. Training on legally acquired material, he ruled, is classic protected fair use. Training on pirated material is a different story, and Alsup absolutely excoriates any attempt to say it's not. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," he wrote. There were plenty of ways to scan or copy legally acquired books (including Anthropic's own scanning system), but "Anthropic did not do those things -- instead it stole the works for its central library by downloading them from pirated libraries." Eventually switching to book scanning doesn't erase the original sin, and in some ways it actually compounds it, because it demonstrates Anthropic could have done things legally from the start. If new AI companies adopt this perspective, they'll have to build in extra but not necessarily ruinous startup costs. There's the up-front price of buying what Anthropic at one point described as "all the books in the world," plus any media needed for things like images or video. And in Anthropic's case these were physical works, because hard copies of media dodge the kinds of DRM and licensing agreements publishers can put on digital ones -- so add some extra cost for the labor of scanning them in. But just about any big AI player currently operating is either known or suspected to have trained on illegally downloaded books and other media. Anthropic and the authors will be going to trial to hash out the direct piracy accusations, and depending on what happens, a lot of companies could be hypothetically at risk of almost inestimable financial damages -- not just from authors, but from anyone that demonstrates their work was illegally acquired. As legal expert Blake Reid vividly puts it, "if there's evidence that an engineer was torrenting a bunch of stuff with C-suite blessing it turns the company into a money piñata." And on top of all that, the many unsettled details can make it easy to miss the bigger mystery: how this legal wrangling will affect both the AI industry and the arts. Echoing a common argument among AI proponents, former Meta executive Nick Clegg said recently that getting artists' permission for training data would "basically kill the AI industry." 
That's an extreme claim, and given all the licensing deals companies are already striking (including with Vox Media, the parent company of The Verge), it's looking increasingly dubious. Even if they're faced with piracy penalties thanks to Alsup's ruling, the biggest AI companies have billions of dollars in investment -- they can weather a lot. But smaller, particularly open source players might be much more vulnerable, and many of them are also almost certainly trained on pirated works. Meanwhile, if Chhabria's theory is right, artists could reap a reward for providing training data to AI giants. But it's highly unlikely the fees would shut these services down. That would still leave us in a spam-filled landscape with no room for future artists. Can money in the pockets of this generation's artists compensate for the blighting of the next? Is copyright law the right tool to protect the future? And what role should the courts be playing in all this? These two rulings handed partial wins to the AI industry, but they leave many more, much bigger questions unanswered.
[14]
How AI companies are secretly collecting training data from the web (and why it matters)
Like most people, my wife types a search into Google many times each day. We work from home, so our family room doubles as a conference room. Whenever we're in a meeting, and a question about anything comes up, she Googles it. This is the same as it's been for years. But what happens next has changed. Instead of clicking on one of the search result links, she more often than not reads the AI summary. These days, she rarely clicks on any of the sites that provide the original information that Google's AI summarizes. When I spoke to her about this, Denise acknowledged that she actually visits sites less frequently. But she also pointed out that, for topics where she's well-versed, she has noticed the AI is sometimes wrong. She said she takes the AI results with a grain of salt, but they often provide basic enough information that she needs to look no further. If in doubt, she does dig deeper. So that's where we are today. More and more users are like my wife, getting data from the AI and never visiting websites (and therefore never giving content creators a chance to be compensated for their work). Worse, more and more people are trusting AI, so not only are they making it harder for content creators to make a living, but they are often getting hallucinatory or incorrect information. Since they never visit the original sources of information, they have little impetus to cross-check or verify what they read. Cloudflare CEO Matthew Prince offered some devastating statistics. He used the ratio of the number of pages crawled compared to the number of pages fed to readers as a metric. As a baseline, he said that 10 years ago, for every two pages Google crawled, it sent one visitor to a content creator's site. Six months ago, that ratio was six pages crawled to one visitor sent to a content site. Now, just six months later, it's 18 pages crawled to one visitor sent to a content site. The numbers, according to Prince, are far worse for AI sites. AI sites derive substantial value from information they've scraped from all the rest of us. Six months ago, the ratio of pages scraped to visitors redirected via OpenAI was 250 to 1. Now, as people have become more willing to trust AI results (or too lazy to care about inaccuracies), the ratio is 1,500 to 1. In many ways, AI is becoming an existential threat to content creators. By vacuuming up content produced by hard-working teams all across the world, and then feeding that content back as summaries to readers, the publishers and writers are losing revenue and influence. Many creators are also losing motivation, because if they can't make a living doing it, or at least create a following, why bother? Some publishers, like Ziff Davis (ZDNET's parent company) and the New York Times, are suing OpenAI for copyright infringement. You've probably seen the disclaimer on ZDNET that says, "Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems." Other publishers, including the Wall Street Journal, the Financial Times, the Atlantic, and the Washington Post, have licensed their content to OpenAI and some other AI large language models. The damage to society as a whole that AI intermediation can cause is profound and worth an article all on its own.
But this article is more practical. Here, we acknowledge the threat AI presents to publishing and focus on technical ways to fight back. In other words, if the AIs can't scrape, they can't give away published, copyrighted content without publishers' permission.

The simplest, most direct, and possibly least effective defense is the robots.txt file. This is a file you put at the root of your website's directory. It tells spiders, crawlers, and bots whether they have permission to access your site; because its rules are keyed to each bot's User-Agent string, the approach is also called User-Agent filtering. (A sample robots.txt appears after the list of techniques below.)

This file has a number of interesting implications. First, only well-behaved crawlers will pay attention to its directives. It provides no actual security against access, so compliance is completely voluntary on the part of the bots. Second, you need to be careful which bots you send away. For example, if you use robots.txt to deny access to Googlebot, your site won't get indexed for searching on Google. Say goodbye to all Google referrals. On the other hand, if you use robots.txt to deny access to Google-Extended, you'll block Google from using your site's content for Gemini training without affecting how Google Search indexes it. There are sites that maintain indexes of the bots you might want to deny, and OpenAI publishes its own guide to preventing its bots from crawling your site.

But what about web scrapers that ignore robots.txt? How do you prevent them from scraping your site? It's here that site operators need a belt-and-suspenders strategy. You're in an arms race: you're trying to find ways to defend against scraping, while the scrapers are trying to find ways to suck down all your site's data. Here are a few techniques. This is far from a complete list, and the techniques change constantly, on the part of both the defenders and the scrapers.

Rate limit requests: Modify your server to limit how many pages a given IP address can request in a period of time. Humans aren't likely to request hundreds of pages per minute. (A minimal code sketch follows this list.) This, like most of the techniques itemized here, differs from server to server, so you'll have to look up how to configure the capability for yours. It may also annoy your site's visitors so much that they stop visiting. So, there's that.

Use CAPTCHAs: Keep in mind that CAPTCHAs tend to inconvenience users, but they can reduce some types of crawler access to your site. Of course, the irony is that if you're trying to block AI crawlers, it's the AIs that are most likely to be able to defeat the CAPTCHAs. So there's that.

Selective IP bans: If you find there are IP ranges that overwhelm your site with access requests, you can ban them at the firewall level. FireHOL (an open source firewall toolset) maintains a blacklist of IP addresses. Most of its entries are cybersecurity-related, but they can get you started on a block list. Be careful, though: don't use blanket IP bans, or legitimate visitors will be blocked from your site. So, there's that, too.
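To make the robots.txt discussion concrete, here is a minimal example that turns away several well-known AI crawlers while leaving ordinary search indexing alone. The user-agent tokens shown (GPTBot, Google-Extended, CCBot) are ones these operators have published, but bot names change, so treat this as a sketch and check each operator's current documentation before relying on it.

```
# robots.txt -- deny AI-training crawlers, allow everything else.
# Compliance is voluntary: only well-behaved bots honor these rules.

User-agent: GPTBot            # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended   # Google's AI-training token; Search indexing is unaffected
Disallow: /

User-agent: CCBot             # Common Crawl, whose corpus feeds many AI training sets
Disallow: /

User-agent: *                 # Everyone else, including Googlebot, may crawl
Allow: /
```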
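And here is what per-IP rate limiting looks like in code. This is a minimal sketch in Python, written as generic WSGI middleware with illustrative thresholds; in practice you would usually configure this at the web server, reverse proxy, or CDN layer rather than in application code.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """WSGI middleware that refuses clients exceeding a per-IP request budget."""

    def __init__(self, app, max_requests=120, window_seconds=60):
        self.app = app
        self.max_requests = max_requests   # requests allowed per window (illustrative)
        self.window = window_seconds
        self.hits = defaultdict(deque)     # ip -> timestamps of recent requests

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "unknown")
        now = time.monotonic()
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        q.append(now)
        if len(q) > self.max_requests:
            # Humans rarely request hundreds of pages a minute; refuse with 429.
            start_response("429 Too Many Requests",
                           [("Content-Type", "text/plain"),
                            ("Retry-After", str(self.window))])
            return [b"Too many requests\n"]
        return self.app(environ, start_response)

# Usage: wrap any WSGI application, e.g. app = RateLimiter(my_wsgi_app)
```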
There is also a growing number of anti-scraping services that will attempt to defend your site for a fee. Here's a quick overview of some of the techniques these services use.

Behavior matching: This technique analyzes more than headers; it analyzes request behavior. It's essentially a combination of header analysis and bot-by-bot request limiting.

JavaScript challenges: Beyond JavaScript-based CAPTCHAs, these often run in the background of a web page. They require scripts to execute, or they measure the pacing of interaction on the page, before allowing further access.

Honeypot traps: These are elements buried in a web page, like invisible fields or links, that are designed to catch bots. If a bot grabs everything on a site (which a human user is unlikely to do), the honeypot trap recognizes it and initiates a server block. (A minimal sketch follows this list.)

Overall behavioral analysis: This is where AIs are fighting AIs. AIs running on behalf of your website monitor access behavior and use machine learning to identify access patterns that are not human. Those malicious accesses can then be blocked.

Browser fingerprinting: Browsers provide a wide range of data about themselves to the sites they access. Bots generally attempt to spoof the fingerprints of legitimate users, but they often inadvertently expose fingerprints of their own, which blocking services can aggregate and then use to block the bots.

Decoy traps: These are mazes of decoy pages filled with autogenerated, useless content, linked together in a pattern that causes bots to waste their time or get stuck following links. Most of these decoy pages are tagged with "nofollow" links, so search engines don't index them and your SEO rank isn't hurt. Malicious bots are learning how to identify and counter these traps, but they do offer limited protection.
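To illustrate the honeypot idea, here is a minimal sketch using Python and Flask (my choice for brevity; the same pattern works in any web framework). A link that is invisible to human visitors is planted in the page; any client that fetches it is assumed to be a bot blindly following every URL in the markup, and its IP is blocked from then on. The route name and in-memory block list are illustrative only.

```python
from flask import Flask, abort, request

app = Flask(__name__)
BLOCKED_IPS = set()  # illustrative; production would use a shared store or the firewall

# Invisible to humans (display:none) and marked nofollow for search engines,
# but present in the raw HTML that a scraper slurps down.
HIDDEN_LINK = '<a href="/trap/do-not-follow" rel="nofollow" style="display:none">.</a>'

@app.before_request
def refuse_blocked_clients():
    # Turn away any client that previously tripped the trap.
    if request.remote_addr in BLOCKED_IPS:
        abort(403)

@app.route("/")
def index():
    # Real visitors never see (or click) the hidden link in the rendered page.
    return ("<html><body><h1>Welcome</h1><p>Normal content here.</p>"
            + HIDDEN_LINK + "</body></html>")

@app.route("/trap/do-not-follow")
def trap():
    # Only a bot that blindly follows every URL in the markup lands here.
    BLOCKED_IPS.add(request.remote_addr)
    abort(403)
```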
As an author who makes my living directly from my creative output, I find the prospect of AIs using my work as training data offensive. How dare a company like OpenAI make billions off the backs of all of us creatives, then turn around and provide a product that could put many of us out of work?

And yet, I have to acknowledge that AI has saved me time in many different ways. I use a text editor or a word processor every day. But back when I started my career, the publications I wrote for had typesetting operators who converted my written words into publishable content. Now, blogging tools and content management systems do that work. An entire profession vanished in the space of a few years. Such is the price of new technology. I've been involved with AI innovation for decades, and after writing about generative AI since it boomed in early 2023, I'm convinced it's here to stay.

AI chatbots like Google Gemini and ChatGPT are making token efforts to be good citizens. They scrape all our content and make billions off of it, but they're willing to provide links back to our work for the very few who bother to check sources. Some of the big AI companies contend that they provide value back to publishers. An OpenAI spokesperson told Columbia Journalism Review, "We support publishers and creators by helping 400M weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution." Quoted in Digiday, David Carr, senior insights manager at data analytics company Similarweb, said, "ChatGPT sent 243.8 million visits to 250 news and media websites in April 2025, up 98% from 123.2 million visits this January."

Those numbers are big, but only without context. Google gets billions of visits a day, and before AI, nearly all those visits resulted in referrals out to other sites. With Google's referral percentages dropping precipitously, and OpenAI's referrals amounting to a very small percentage of the traffic once sent to content producers, the problem is very real. Yes, those links are mere table scraps, but do we block them anyway? If you enable web scraping blocks on your website, will it do anything other than "cut off your nose to spite your face," as my mother used to say?

Unless every site blocks AI scrapers, effectively locking AI data sets to 2025 and earlier, blocking your own site from the AIs will accomplish little more than preventing you from getting what little traffic the AI services do send. So should you?

In the long term, this practice of AI scraping is unsustainable. If AIs prevent creatives from deriving value from their hard work, the creatives won't have an incentive to keep creating. At that point, the quality of AI-generated content will begin to decline. It becomes a vicious circle, with fewer creatives able to monetize their skills and the AIs serving ever-worsening content.

So, what do we do about it? If we are to survive into the future, our entire industry needs to ask, and attempt to answer, that question. If not, welcome to Idiocracy.

What about you? Have you taken any steps to block AI bots from scraping your site? Are you concerned about how your content might be used to train generative models? Do you think the trade-off between visibility and protection is worth it? What kinds of tools or services, if any, are you using to monitor or limit scraping? Let us know in the comments below.
[15]
Judge: It's Fair Use to Train AI on Books You Bought, But Not Ones You Pirated
A large language model is as free to read as you and me, a federal judge held Tuesday -- unless that LLM's creators didn't pay for the books used to train that AI system. Judge William Alsup's Tuesday order turns aside part of a class-action lawsuit filed by book authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against the AI firm Anthropic but agrees with one of their key claims. That means Alsup's 32-page opinion could still prove expensive for the company behind the Claude series of AI models. The most important part of Alsup's ruling is that Anthropic has a fair-use defense for digitizing copies of the authors' books that it purchased to train the San Francisco firm's AI models. Calling that an "exceedingly transformative" use, Alsup found that the authors had no more right to demand payment for it than to charge a human reader for learning from their writing. "Everyone reads texts, too, then writes new texts," he wrote. "But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable." In a later paragraph, Alsup compared the plaintiffs' argument to a complaint that "training schoolchildren to write well would result in an explosion of competing works." He concluded: "This is not the kind of competitive or creative displacement that concerns the Copyright Act." This case, unlike many other recent lawsuits brought against the operators of AI platforms, did not involve any claims that Claude had recreated or recited any copyrighted works: "Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service." Alsup also found that Anthropic did nothing wrong in its original act of book digitization. The company purchased paperback copies of books, scanned and digitized their contents as if they were CDs being ripped to copy to an iPod, and then destroyed the printed originals. "One replaced the other," Alsup writes. "And, there is no evidence that the new, digital copy was shown, shared, or sold outside the company." (Contrast that with the ruling by a panel of judges on a different federal circuit court last September that the Internet Archive had no right to turn digital copies of books it had legally obtained and scanned into e-book loans.) But Anthropic didn't just buy books by the truckload; it also downloaded millions of unauthorized copies of books from online troves of pirated works to speed up training Claude, then kept those copies around just in case. "Every factor points against fair use," Alsup wrote. He found that the company offered no justification "except for Anthropic's pocketbook and convenience." Anthropic's comment to The Verge stuck to the positive parts of Alsup's statement: "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so.'" In October, News Corp. sued Perplexity, alleging that its answers represented a "substitute product" for that conglomerate's own work. In February, Thomson Reuters won a suit against a now-defunct startup called Ross Intelligence that had trained its AI service on the news agency's Westlaw reference to offer a competing service. Earlier in June, Disney and Universal sued the generative-AI image-generation platform Midjourney for offering near-lookalike depictions of those studios' copyrighted characters. 
PCMag's parent company Ziff Davis is also among the publishers pursuing litigation against AI platforms, having filed a lawsuit against OpenAI in April 2025 alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
[16]
Judge rules mostly for Anthropic in AI book training case
Anthropic scores a qualified victory in its fair use case, but gets slapped for using over 7 million pirated copies
One of the most tech-savvy judges in the US has ruled that Anthropic is within its rights to scan purchased books to train its Claude AI model, but that pirating content is legally out of bounds. In training its model, Anthropic bought millions of books, many second-hand, then cut them up and digitized the content. It also downloaded over 7 million pirated books from the Books3 dataset, Library Genesis (Libgen), and the Pirate Library Mirror (PiLiMi), and that was the sticking point for Judge William Alsup of California's Northern District Court. On Monday, he ruled that simply digitizing a print copy counted as fair use under current US law, as there was no duplication of the copyrighted work since the printed pages were destroyed after they were scanned. However, Anthropic may have to face trial over the use of pirated material. "As Anthropic trained successive LLMs, it became convinced that using books was the most cost-effective means to achieve a world-class LLM," Alsup wrote [PDF] in Monday's ruling. "During this time, however, Anthropic became 'not so gung ho about' training on pirated books 'for legal reasons.' It kept them anyway." The case was filed by three authors - Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson - who claimed that Anthropic illegally used their fiction and non-fiction works to train Claude. At least two of each author's books were included in the pirated material used by Anthropic. Alsup noted that Anthropic hired the former head of partnerships at Google's book-scanning project, Tom Turvey, who began conversations with publishers about licensing content, as other AI developers have done. But these talks were abandoned in favor of simply buying millions of books, taking the pages out, and scanning them, which the judge ruled was fair use. "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so,'" an Anthropic spokesperson told The Register. "Consistent with copyright's purpose in enabling creativity and fostering scientific progress, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different." On the matter of piracy, however, Alsup noted that in January or February 2021, Anthropic cofounder Ben Mann "downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books -- that is, pirated." In June, he downloaded "at least five million copies of books" from Libgen, and in July 2022, another two million copies were downloaded from PiLiMi, both of which Alsup classified as "pirate libraries." Alsup found that the pirated works weren't necessarily used to train Claude, but that the company had retained them. That could prove legally problematic for the startup, Alsup ruled, since they were kept for "Anthropic's pocketbook and convenience." "This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason," he wrote. "But it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies.
We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness). That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages." Alsup's ruling is mixed news for Anthropic, but he does know his onions. For the last quarter of a century, Alsup has presided over some of the biggest tech trials in history, and his rulings have been backed up by the Supreme Court in some cases. Alsup, a coder for over two decades (primarily in BASIC), presided over the Oracle-Google trial over fair use of Java code in Android, which led him to dabble in that language. More recently, he sentenced former Google self-driving car engineer Anthony Levandowski to 18 months in prison for stealing proprietary info from his work at Google and bringing it to a new startup, Otto, which he later sold to Uber. President Trump later pardoned Levandowski in January 2021. Bartz and Johnson had no comment at the time of going to press. Graeber declined to discuss the verdict. ®
[17]
The Anthropic Copyright Ruling Exposes Blind Spots on AI
In what is shaping up to be a long, hard fight over the use of creative works, round one has gone to the AI makers. In the first such US decision of its kind, District Judge William Alsup said Anthropic's use of millions of books to train its artificial-intelligence model, without payment to the sources, was legal under copyright law because it was "transformative -- spectacularly so." The closely watched ruling is a warning of what lies ahead under existing copyright laws. Designed to protect creative freedom, the "fair use" doctrine that Anthropic used to successfully defend its actions is now the most potent tool for undermining the creative industry's ability to support itself in the coming age of AI.
[18]
More trouble for authors as Meta wins Llama scraping case
Authors are having a hard time protecting their works from the maws of the LLM makers
Californian courts have not been kind to authors this week, with a second ruling going against an unlucky 13 who sought redress for use of their content in training AI models. On Monday, Anthropic won most of its case against three authors over its use of their works to train its AI. Judge William Alsup ruled Anthropic was able to use the authors' books if it bought them, but not if it pirated their material. Meta received a similar verdict two days later in a decision [PDF] issued by Judge Vince Chhabria of California's Northern District Court. Citing Judge Alsup's earlier ruling, Chhabria said Meta's copying of the authors' works was technically fair use, since the AI wouldn't reproduce large parts of their text, and that the authors should have tried a different legal argument. The authors alleged that Meta fed 666 copies of books to which they hold copyright into its Llama models but did so without attempting to license the works. The writers argued that Meta's AI could reproduce parts of their works and this would cause them financial harm. The Judge could find no evidence of that harm. "They contend that Llama is capable of reproducing small snippets of text from their books. And they contend that Meta, by using their works for training without permission, has diminished the authors' ability to license their works for the purpose of training large language models. As explained below, both of these arguments are clear losers," he wrote. "The Court has no choice but to grant summary judgment to Meta on the plaintiffs' claim that the company violated copyright law by training its models with their books. But in the grand scheme of things, the consequences of this ruling are limited," he wrote. "This is not a class action, so the ruling only affects the rights of these thirteen authors - not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." However, the Judge also dismissed Meta's argument that it would be against the public interest to require the social media giant to stop slurping copyrighted material to train LLMs, describing that argument as "nonsense." Courts are considering several similar cases. Microsoft is currently embroiled in a lawsuit brought by authors over the claimed pirating of their work to train an AI engine, while Disney and Universal are suing AI outfit Midjourney for alleged copyright infringement over image generation.®
[19]
Anthropic wins key ruling on AI in authors' copyright lawsuit
June 24 (Reuters) - A federal judge in San Francisco ruled late Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's storage of the authors' books in a "central library" violated their copyrights and was not fair use. Spokespeople for Anthropic and attorneys for the authors did not immediately respond to requests for comment on the ruling on Tuesday. The writers sued Anthropic last year, arguing that the company, which is backed by Amazon (AMZN.O) and Alphabet (GOOGL.O), used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The class action lawsuit is one of several brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft (MSFT.O) and Meta Platforms (META.O) over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that U.S. copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said. Reporting by Blake Brittain in Washington; Editing by Chizu Nomiyama and Louise Heavens
[20]
Meta fends off authors' US copyright lawsuit over AI
June 25 (Reuters) - A federal judge in San Francisco ruled on Wednesday for Meta Platforms (META.O) against a group of authors who had argued that its use of their books without permission to train its artificial intelligence system infringed their copyrights. U.S. District Judge Vince Chhabria said in his decision the authors had not presented enough evidence that Meta's AI would dilute the market for their work to show that the company's conduct was illegal under U.S. copyright law. Chhabria also said, however, that using copyrighted work without permission to train AI would be unlawful in "many circumstances," splitting with another San Francisco judge who found on Monday in a separate lawsuit that Anthropic's AI training made "fair use" of copyrighted materials. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria said. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Spokespeople for Meta and attorneys for the authors did not immediately respond to requests for comment. The authors sued Meta in 2023, arguing the company misused pirated versions of their books to train its AI system Llama without permission or compensation. The lawsuit is one of several copyright cases brought by writers, news outlets and other copyright owners against companies including OpenAI, Microsoft (MSFT.O) and Anthropic over their AI training. The legal doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. It is a key defense for the tech companies. Chhabria's decision is the second in the U.S. to address fair use in the context of generative AI, following U.S. District Judge William Alsup's ruling on the same issue in the Anthropic case. AI companies argue their systems make fair use of copyrighted material by studying it to learn to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Copyright owners say AI companies unlawfully copy their work to generate competing content that threatens their livelihoods. Chhabria expressed sympathy for that argument during a hearing in May, which he reiterated on Wednesday. The judge said generative AI had the potential to flood the market with endless images, songs, articles and books using a tiny fraction of the time and creativity that would otherwise be required to create them. "So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way," Chhabria said. Reporting by Blake Brittain in Washington; Editing by David Gregorio, Alexia Garamfalvi and Nia Williams
[21]
Meta wins artificial intelligence copyright case in blow to authors
Meta's use of millions of books to train its artificial intelligence models was judged "fair" by a federal court on Wednesday, in a win for tech companies that use copyrighted materials to develop AI. The case, brought by about a dozen authors, including Ta-Nehisi Coates and Richard Kadrey, challenged how the $1.4tn social media giant used a library of millions of online books, academic articles and comics to train its Llama AI models. Meta's use of these titles is protected under copyright law's fair use provision, San Francisco district judge Vince Chhabria ruled. The Big Tech firm had argued that the works had been used to develop a transformative technology, which was fair "irrespective" of how it acquired the works. This case is among dozens of legal battles working their way through the courts, as creators seek greater financial rights when their works are used to train AI models that may disrupt their livelihoods -- while companies profit from the technology. However, Chhabria warned that his decision reflected the authors' failure to properly make their case. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," he said. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." It is the second victory in a week for tech groups that develop AI, after a federal judge on Monday ruled in favour of San Francisco start-up Anthropic in a similar case. Anthropic had trained its Claude models on legally purchased physical books that were cut up and manually scanned, which the ruling said constituted "fair use". However, the judge added that there would need to be a separate trial for claims that it pirated millions of books digitally for training. The Meta case dealt with LibGen, a so-called online shadow library that hosts much of its content without permission from the rights holders. Chhabria suggested a "potentially winning argument" in the Meta case would be market dilution, referring to the damage caused to copyright holders by AI products that could "flood the market with endless amounts of images, songs, articles, books, and more". "People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required," Chhabria added. He warned AI could "dramatically undermine the incentive for human beings to create things the old-fashioned way". Meta and legal representatives for the authors did not immediately reply to requests for comment.
[22]
Judge rules Anthropic did not violate authors' copyrights with AI book training
Anthropic's use of books to train its artificial intelligence model Claude was "fair use" and "transformative," a federal judge ruled late on Monday. Amazon-backed Anthropic's AI training did not violate the authors' copyrights since the large language models "have not reproduced to the public a given work's creative elements, nor even one author's identifiable expressive style," wrote U.S. District Judge William Alsup. "The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative," Alsup wrote. "Like any reader aspiring to be a writer." The decision is a significant win for AI companies as legal battles play out over the use and application of copyrighted works in developing and training LLMs. Alsup's ruling begins to establish the legal limits and opportunities for the industry going forward.
[23]
Judge rules Anthropic's AI training on copyrighted materials is fair use
Anthropic has received a mixed result in a class action lawsuit brought by a group of authors who claimed the company used their copyrighted creations without permission. On the positive side for the artificial intelligence company, senior district judge William Alsup of the US District Court for the Northern District of California determined that Anthropic's training of its AI tools on copyrighted works was protected as fair use. Developing large language models for artificial intelligence has created a copyright law quagmire, as creators attempt to protect their works while tech companies race to gather more training materials. Alsup's ruling is one of the first that will likely set the foundation for legal precedents around what AI tools can and cannot do. Using copyrighted materials can be deemed fair use if the output is determined to be "transformative," or not a substitute for the original work. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup wrote. Despite the fair use designation, the ruling does still provide some recourse for the writers; they can choose to take Anthropic to court for piracy. "Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again)," Alsup wrote. "Authors argue Anthropic should have paid for these pirated library copies. This order agrees."
[24]
Judge rules AI company Anthropic didn't break copyright law but must face trial over pirated books
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and could now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing the key copyright infringement claim made by the group of authors who sued the company last year, Alsup also said Anthropic must still go to trial over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic committed "large-scale theft" by allegedly training its popular chatbot Claude on pirated copies of copyrighted books, and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of their use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product.
[25]
Judge dismisses authors' copyright lawsuit against Meta over AI training
A federal judge on Wednesday sided with Facebook parent Meta Platforms in dismissing a copyright infringement lawsuit from a group of authors who accused the company of stealing their works to train its artificial intelligence technology. The ruling from U.S. District Judge Vince Chhabria was the second in a week from San Francisco's federal court to dismiss major copyright claims from book authors against the rapidly developing AI industry. Chhabria found that 13 authors who sued Meta "made the wrong arguments" and tossed the case. But the judge also said that the ruling is limited to the authors in the case and does not mean that Meta's use of copyrighted materials is lawful. Lawyers for the plaintiffs -- a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates -- didn't immediately respond to a request for comment Wednesday. Meta also didn't immediately respond to a request for comment. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." On Monday, from the same courthouse, U.S. District Judge William Alsup ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books, but the company must still go to trial for illicitly acquiring those books from pirate websites instead of buying them. But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative," Alsup wrote. Chhabria, in his Meta ruling, criticized Alsup's reasoning on the Anthropic case, arguing that "Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on." Chhabria suggested that a case for such harm can be made. In the Meta case, the authors had argued in court filings that Meta is "liable for massive copyright infringement" by taking their books from online repositories of pirated works and feeding them into Meta's flagship generative AI system Llama. Lengthy and distinctively written passages of text -- such as those found in books -- are highly useful for teaching generative AI chatbots the patterns of human language. "Meta could and should have paid" to buy and license those literary works, the authors' attorneys argued. Meta countered in court filings that U.S. copyright law "allows the unauthorized copying of a work to transform it into something new" and that the new, AI-generated expression that comes out of its chatbots is fundamentally different from the books it was trained on. "After nearly two years of litigation, there still is no evidence that anyone has ever used Llama as a substitute for reading Plaintiffs' books, or that they even could," Meta's attorneys argued. Meta says Llama won't output the actual works it has copied, even when asked to do so. "No one can use Llama to read Sarah Silverman's description of her childhood, or Junot Diaz's story of a Dominican boy growing up in New Jersey," its attorneys wrote.
Accused of pulling those books from online "shadow libraries," Meta has also argued that the methods it used have "no bearing on the nature and purpose of its use" and it would have been the same result if the company instead struck a deal with real libraries. Such deals are how Google built its online Google Books repository of more than 20 million books, though it also fought a decade of legal challenges before the U.S. Supreme Court in 2016 let stand lower court rulings that rejected copyright infringement claims. The authors' case against Meta forced CEO Mark Zuckerberg to be deposed, and has disclosed internal conversations at the company over the ethics of tapping into pirated databases that have long attracted scrutiny. "Authorities regularly shut down their domains and even prosecute the perpetrators," the authors' attorneys argued in a court filing. "That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval. Their gamble should not pay off." "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful," they argued. The named plaintiffs are Jacqueline Woodson, Richard Kadrey, Andrew Sean Greer, Rachel Louise Snyder, David Henry Hwang, Ta-Nehisi Coates, Laura Lippman, Matthew Klam, Junot Diaz, Sarah Silverman, Lysa TerKeurst, Christopher Golden and Christopher Farnsworth. Most of the plaintiffs had asked Chhabria to rule now, rather than wait for a jury trial, on the basic claim of whether Meta infringed on their copyrights. Two of the plaintiffs, Ta-Nehisi Coates and Christopher Golden, did not seek such summary judgment. Chhabria said in the ruling that while he had "no choice" but to grant Meta's summary judgment tossing the case, "in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models."
[26]
Meta wins AI copyright case, but judge says others could bring lawsuits
Meta on Wednesday prevailed against a group of 13 authors in a major copyright case involving the company's Llama artificial intelligence model, but the judge made clear his ruling was limited to this case. U.S. District Judge Vince Chhabria sided with Meta's argument that the company's use of books to train its large language models, or LLMs, is protected under the fair use doctrine of U.S. copyright law. Lawyers representing the plaintiffs, including Sarah Silverman and Ta-Nehisi Coates, alleged that Meta violated the nation's copyright law because the company did not seek permission from the authors to use their books for the company's AI model, among other claims. Notably, Chhabria said that it "is generally illegal to copy protected works without permission," but in this case, the plaintiffs failed to present a compelling argument that Meta's use of books to train Llama caused "market harm." Chhabria wrote that the plaintiffs had put forward two flawed arguments for their case. "On this record Meta has defeated the plaintiffs' half-hearted argument that its copying causes or threatens significant market harm," Chhabria said. "That conclusion may be in significant tension with reality." Meta's practice of "copying the work for a transformative purpose" is protected by the fair use doctrine, the judge wrote. "We appreciate today's decision from the Court," a Meta spokesperson said in a statement. "Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." Though there could be valid arguments that Meta's data training practice negatively impacts the book market, the plaintiffs did not adequately make their case, the judge wrote. Attorneys representing the plaintiffs did not respond to a request for comment. Still, Chhabria noted several flaws in Meta's defense, including the notion that the "public interest" would be "badly disserved" if the company and other businesses were prohibited "from using copyrighted text as training data without paying to do so." "Meta seems to imply that such a ruling would stop the development of LLMs and other generative AI technologies in its tracks," Chhabria wrote. "This is nonsense." The judge left the door open for other authors to bring similar AI-related copyright lawsuits against Meta, saying that "in the grand scheme of things, the consequences of this ruling are limited." "This is not a class action, so the ruling only affects the rights of these thirteen authors -- not the countless others whose works Meta used to train its models," he wrote. "And, as should now be clear, this ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful." Additionally, Chhabria noted that there is still a pending, separate claim made by the plaintiffs alleging that Meta "may have illegally distributed their works (via torrenting)." Earlier this week, a federal judge ruled that Anthropic's use of books to train its AI model Claude was also "transformative," thus satisfying the fair use doctrine. Still, that judge said that Anthropic must face a trial over allegations that it downloaded millions of pirated books to train its AI systems. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages," the judge wrote.
[27]
Meta wins AI copyright case filed by Sarah Silverman and other authors
The judge said the plaintiffs failed to show how Meta's actions could financially harm them. Federal Judge Vince Chhabria has ruled in favor of Meta over the 13 book authors, including Sarah Silverman, who sued the company for training its large language model on their published work without obtaining consent. His court has granted summary judgment to Meta, which means the case didn't reach a full trial. Chhabria said that Meta didn't violate copyright law after the plaintiffs had failed to show sufficient evidence that the company's use of the authors' work would hurt them financially. In his ruling (PDF), Chhabria admitted that in most cases, it is illegal to feed copyright-protected materials into large language models without getting permission or paying the copyright owners for the right to use their creations. "...by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way," he wrote. However, the court "must decide cases based on the evidence presented by the parties," he said. For this particular case, the plaintiffs argued that Meta's actions cannot be considered "fair use." They said that their creations are affected by Meta's use because the company's LLM, Llama, is capable of reproducing small snippets of text from their books. They also said that by using their books for training without consent, Meta had diminished their ability to license their work for LLM training. The judge called both arguments "clear losers." Llama isn't capable of generating enough text straight from the books to matter, he said, and the authors aren't entitled to the "market for licensing their works as AI training data." Chhabria wrote that the argument that Meta copied their books to create a product that has the capability to flood the market with similar works, thereby causing market dilution, could have given the plaintiffs the win. But the plaintiffs barely touched the argument and presented no evidence to show how output from Meta's LLM could dilute the market. Despite his ruling, Chhabria clarified that his decision is limited: It only affects the 13 authors in the lawsuit and "does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful." Another judge, William Alsup, also recently sided with Anthropic in a class action lawsuit also brought by a group of authors who accused the company of using their copyrighted work without permission. Alsup provided the writers recourse, though, and allowed them to take Anthropic to court for piracy.
[28]
US Judge sides with AI firm Anthropic over copyright issue
A US judge has ruled that using books to train artificial intelligence (AI) software is not a violation of US copyright law. The decision came out of a lawsuit brought last year against AI firm Anthropic by three writers - a novelist and two non-fiction authors - who accused the firm of stealing their work to train its Claude AI model and build a multi-billion dollar business. In his ruling, Judge William Alsup wrote that Anthropic's use of the authors' books was "exceedingly transformative" and therefore allowed under US law. But he rejected Anthropic's request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.
[29]
Court says AI training on books is fair use but Anthropic must face trial over pirated copies
What just happened? A federal court has delivered a split decision in a high-stakes copyright case that could reshape the future of artificial intelligence development. US District Judge William Alsup ruled that Anthropic's use of copyrighted books to train its Claude AI system qualifies as lawful "fair use" under copyright law, marking a significant victory for the AI industry. However, the judge simultaneously ordered the company to face trial this December for allegedly building a "central library" containing over 7 million pirated books, a decision that maintains crucial safeguards for content creators. This nuanced ruling establishes that while AI companies may learn from copyrighted human knowledge, they cannot build their foundations on materials that have been stolen. Judge Alsup determined that training AI systems on copyrighted materials transforms the original works into something fundamentally new, comparing the process to human learning. "Like any reader aspiring to be a writer, Anthropic's AI models trained upon works not to replicate them but to create something different," Alsup wrote in his decision. This transformative quality placed the training firmly within legal "fair use" boundaries. Anthropic's defense centered on the allowance for transformative uses under copyright law, which advances creativity and scientific progress. The company argued that its AI training involved extracting uncopyrightable patterns and information from texts, not reproducing the works themselves. Technical documents revealed Anthropic purchased millions of physical books, removed bindings, and scanned pages to create training datasets - a process the judge deemed "particularly reasonable" since the original copies were destroyed after digitization. However, the judge drew a sharp distinction between lawful training methods and the company's parallel practice of downloading pirated books from shadow libraries, such as Library Genesis. Alsup emphatically rejected Anthropic's claim that the source material was irrelevant to fair use analysis. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites was reasonably necessary," the ruling stated, setting a critical precedent about the importance of acquisition methods. The decision provides immediate relief to AI developers facing similar copyright lawsuits, including cases against OpenAI, Meta, and Microsoft. By validating the fair use argument for AI training, the ruling potentially avoids industry-wide requirements to license all training materials - a prospect that could have dramatically increased development costs. Anthropic welcomed the fair use determination, stating it aligns with "copyright's purpose in enabling creativity and fostering scientific progress." Yet the company faces substantial financial exposure in the December trial, where statutory damages could reach $150,000 per infringed work. The authors' legal team declined to comment, while court documents show Anthropic internally questioned the legality of using pirate sites before shifting to purchasing books.
[30]
Judge Rules AI Companies Can Use Some Copyrighted Works to Fuel Their Sludge
The legal decision sets a precedent for the pilfering of creative works for AI fuel.
This week, a federal judge handed AI companies a major win, potentially setting a legal precedent for the industry to plunder copyrighted materials to train their large language models. Anthropic, the large AI company backed by Amazon, has been in a pitched legal battle with a group of writers and journalists who sued the company last summer and accused it of illegally using their works to train the company's flagship chatbot, Claude. The legality of the AI industry's entire business model has long depended on the question of whether it is kosher to hoover up large amounts of copyrighted data from all over the web and then feed it into an algorithm to produce "original" text. Anthropic has maintained that its use of the writers' work falls under fair use and is therefore legal. This week, the federal judge presiding over the case, William Alsup, partially agreed. In his ruling, Alsup claimed that, by training its LLM without the authors' permission, Anthropic did not infringe on copyrighted materials because the work it produced was, in his eyes, original. He claimed that the company's algorithms have "not reproduced to the public a given work's creative elements, nor even one author's identifiable expressive style. ... Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not." Alsup's ruling departs quite a bit from the writers' litigation, which accused Anthropic of "strip-mining" human expression and ingenuity for the sake of corporate profits. This ruling is just one judge's opinion, but critics fear it could easily set a precedent for other legal decisions across the country. AI companies have been sued dozens of times by creatives on similar grounds. While Alsup's decision may signal broader victories for the AI industry, it isn't exactly what you would call a win for Anthropic. That's because Alsup also ruled that the specific way in which Anthropic nabbed some of the copyrighted materials for its LLM -- by downloading over 7 million pirated books -- could be illegal, and would require a separate trial. "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages," Alsup wrote. "That Anthropic later bought a copy of a book [that] it earlier stole off the internet will not absolve it of liability for theft, but it may affect the extent of statutory damages." When reached for comment by Gizmodo, Anthropic provided the following statement: "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so.' Consistent with copyright's purpose in enabling creativity and fostering scientific progress, 'Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different.'" Alsup has presided over several prominent cases involving large tech companies, including Uber, DoorDash, and Waymo. More recently, Alsup ordered the Trump administration to reinstate thousands of fired probationary workers who were pushed out by Elon Musk's DOGE initiative.
[31]
Federal court says copyrighted books are fair use for AI training
Anthropic didn't break the law when it trained its chatbot with copyrighted books, a judge said, but it must go to trial for allegedly using pirated books.
A federal judge this week ruled that artificial intelligence company Anthropic did not break the law when it used copyrighted books to train its chatbot, Claude, without the consent of the texts' authors or publishers -- but he ordered the company to go to trial for allegedly using pirated versions of the books. The decision, made Monday by Judge William Alsup of the U.S. District Court for the Northern District of California, represents a win for AI companies, which have battled copyright lawsuits from writers and news organizations for using their work to train AI systems. Alsup said Anthropic's use of the books to train its large language models was like an aspiring writer who reads copyrighted texts "not to race ahead and replicate or supplant" those works, "but to turn a hard corner and create something different." His ruling was on a lawsuit filed against Anthropic last year by three authors -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- who alleged that the company used their work without their consent to train AI systems in what amounted to "large-scale theft." But Alsup ordered Anthropic to face trial for the accusation that it knowingly obtained copies of more than 7 million books from piracy websites, although the company later paid to purchase copies of some books. Alsup said he doubted that "any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use." "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," he added. In a statement, Anthropic said it was pleased that the court recognized that using published works to train LLMs was consistent with copyright laws "in enabling creativity and fostering scientific progress." But the company said it disagrees with the decision to hold a trial for its "acquisition of a subset of books and how they were used," in apparent reference to the piracy allegations. "We remain confident in our overall case, and are evaluating all options," it said. In their lawsuit, the authors said the actions of Anthropic have made "a mockery of its lofty goals." The company was founded in 2021 by a group that included OpenAI's former vice president of research Dario Amodei with goals that included "research into increasing the safety of AI systems." Bartz and Johnson did not reply to requests for comment. Graeber declined to comment. After concerns arose within the company about using pirated books, Anthropic hired former Google Books executive Tom Turvey to obtain "all the books in the world" while also avoiding as many legal issues as possible, according to court documents. Turvey and his team could have sought to reach commercial agreements with publishers to license the books to train its AI systems, Alsup noted, but they instead purchased millions of print books from retailers, many of them in used condition, then scanned them into digital form. The company could have also hired staff writers and engineers to create good original writing to train AI models. But that would have "required spending more," Alsup noted.
[32]
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
A federal judge in California ruled Monday that Anthropic likely violated copyright law when it pirated authors' books to create a giant dataset and "forever" library but that training its AI on those books without authors' permission constitutes transformative fair use under copyright law. The complex decision is one of the first of its kind in a series of high-profile copyright lawsuits brought by authors and artists against AI companies, and it's largely a very bad decision for authors, artists, writers, and web developers. This case, in which authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic, maker of the Claude family of large language models, is one of dozens of high-profile lawsuits brought against AI giants. The authors sued Anthropic because the company scraped full copies of their books for the purposes of training its AI models from a now-notorious dataset called Books3, as well as from the piracy websites LibGen and Pirate Library Mirror (PiLiMi). The suit also claims that Anthropic bought used physical copies of books and scanned them for the purposes of training AI. "From the start, Anthropic 'had many places from which' it could have purchased books, but it preferred to steal them to avoid 'legal/practice/business slog,' as cofounder and chief executive officer Dario Amodei put it. So, in January or February 2021, another Anthropic cofounder, Ben Mann, downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books -- that is, pirated," William Alsup, a federal judge for the Northern District of California, wrote in his decision Monday. "Anthropic's next pirated acquisitions involved downloading distributed, reshared copies of other pirate libraries. In June 2021, Mann downloaded in this way at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated. And, in July 2022, Anthropic likewise downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated." Notably, Anthropic also created an internal, "general-purpose library" made up partially of pirated copyrighted works for "various uses for which the company might have of them," in addition to scraping the books for the purposes of training AI. Alsup wrote that the creation of this "pirated library ... points against fair use" and must be considered at trial. At a hearing in May, Alsup signaled that he was leaning toward making this type of decision: "I'm inclined to say they did violate the Copyright Act but the subsequent uses were fair use," Alsup said. "The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use," Alsup wrote. "Anthropic employees said copies of works (pirated ones, too) would be retained 'forever' for 'general purpose' even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic's pocketbook and convenience. We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness).
That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages." Still, this is not a good decision for authors, because the judge ruled that actually training AI on the works was not illegal, though it is too early to say exactly what this means in a larger context. At the moment, it suggests that training an AI on legally purchased works is sufficiently transformative, but that pirating those works in the first place is not. This case did not consider what it means for AI training on free-to-access content from the open web, on social media, from libraries, etc. It's largely a win for AI companies, who, when faced with these sorts of lawsuits, have almost universally said that their data scraping and training is legal as a transformative fair use under copyright law, arguing they do not need to ask for permission or provide compensation when they scrape the internet to build AI tools. This lawsuit does not allege that Anthropic or Claude directly reproduced parts of the authors' books for its users: "When each LLM was put into a public-facing version of Claude, it was complemented by other software that filtered user inputs to the LLM and filtered outputs from the LLM back to the user," Alsup wrote in his order (a rough sketch of how such an output filter might work appears after this piece). "As a result, Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service. Yes, Claude could help less capable writers create works as well-written as Authors' and competing in the same categories. But Claude created no exact copy, nor any substantial knock-off. Nothing traceable to Authors' works. Such allegations are simply not part of plaintiffs' amended complaint, nor in our record." Many other copyright lawsuits against AI companies argue that not only are AI companies training on pirated copyrighted data, but that the AI tools they create then regurgitate large passages of those copyrighted works either verbatim or in a substantially similar style. Researchers found, for example, that Meta's AI has "memorized" huge portions of books and will regurgitate them. This case largely considered whether the actual training itself is a violation of copyright law. "The use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act," Alsup wrote in his order. "And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library -- without adding new copies, creating new works, or redistributing existing copies." In February, Thomson Reuters won a case against a competitor in which it claimed its competitor illegally scraped its works to train AI. There are currently dozens of similar lawsuits winding their way through the legal system right now, so it's likely to take a few more decisions before we get a full picture of what courts think about the legality of mass, unauthorized AI data training.
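Neither the order nor Anthropic describes how that filtering software works internally. As a purely illustrative sketch of the general idea -- the function name, the 50-word window, and whitespace tokenization below are assumptions for the example, not details from the court record -- an output filter might flag long verbatim overlaps like this, in Python:

    # Illustrative sketch only: flag any n-word run in a model's output that
    # also appears verbatim in a reference text. A production system would
    # work on token IDs against an indexed corpus of many books, not raw strings.
    def has_verbatim_overlap(output: str, reference: str, n: int = 50) -> bool:
        ref_words = reference.split()
        if len(ref_words) < n:
            return False
        # Index every n-word window of the reference for O(1) lookups.
        ref_ngrams = {tuple(ref_words[i:i + n]) for i in range(len(ref_words) - n + 1)}
        out_words = output.split()
        return any(tuple(out_words[i:i + n]) in ref_ngrams
                   for i in range(len(out_words) - n + 1))

Even this toy version makes the legal distinction visible: the copying check sits in the service layer wrapped around the model, which is part of why, per Alsup, no infringing copy "was or would ever be provided to users by the Claude service."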
[33]
Judge rules Anthropic can legally train AI on copyrighted material
However, the same judgment says that Anthropic can still be sued for piracy. One of the big gray areas in the burgeoning generative AI space is whether the training of AI models on copyrighted material without the permission of copyright holders violates copyright. This has led a group of authors to sue Anthropic, the company behind the AI chatbot Claude. Now, a US federal judge has ruled that AI training is covered by so-called "fair use" laws and is therefore legal, Engadget reports. Under US law, fair use means that copyrighted material is allowed to be used if the result is considered "transformative." That is, the resulting work must be something new rather than being entirely derivative or a substitute for the original work. This is one of the first judicial rulings of its kind, and the judgment may serve as precedent for future cases. However, the judgment also notes that the plaintiff authors still have the option to sue Anthropic for piracy. The judgment states that the company illegally downloaded (pirated) over 7 million books without paying, and also kept them in its internal library even after deciding they wouldn't be used to train or re-train the AI model going forward. The judge wrote: "Authors argue Anthropic should have paid for these pirated library copies. This order agrees."
[34]
In a first-of-its-kind decision, an AI company wins a copyright infringement lawsuit brought by authors
AI companies could have the legal right to train their large language models on copyrighted works -- as long as they obtain copies of those works legally. That's the upshot of a first-of-its-kind ruling by a federal judge in San Francisco on Monday in an ongoing copyright infringement case that pits a group of authors against a major AI company. The ruling is significant because it represents the first substantive decision on how fair use applies to generative AI systems. Fair use doctrine enables copyrighted works to be used by third parties without the copyright holder's consent in some circumstances such as illustrating a point in a news article. Claims of fair use are commonly invoked by AI companies trying to make the case for the use of copyrighted works to train their generative AI models. But authors and other creative industry plaintiffs have been pushing back with a slew of lawsuits. Authors take on Anthropic In their 2024 class action lawsuit, authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson alleged Anthropic used the contents of millions of digitized copyrighted books to train the large language models behind its chatbot, Claude, including at least two works by each plaintiff. The company also bought some hard-copy books and scanned them before ingesting them into its model. "Rather than obtaining permission and paying a fair price for the creations it exploits, Anthropic pirated them," the authors' complaint states. In Monday's order, Senior U.S. District Judge William Alsup supported Anthropic's argument, stating the company's use of books by the plaintiffs to train its AI model was acceptable. "The training use was a fair use," he wrote. "The use of the books at issue to train Claude and its precursors was exceedingly transformative." The judge said the digitization of the books purchased in print form by Anthropic could also be considered fair use, "because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library -- without adding new copies, creating new works, or redistributing existing copies." However, Alsup also acknowledged that not all books were paid for. He wrote Anthropic "downloaded for free millions of copyrighted books in digital form from pirate sites on the internet" as part of its effort "to amass a central library of 'all the books in the world' to retain 'forever.'" Alsup did not approve of Anthropic's view "that the pirated library copies must be treated as training copies," and is allowing the authors' piracy complaint to proceed to trial. "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness)," Alsup stated. Bifurcated responses Alsup's bifurcated decision led to similarly divided responses from those involved in the case and industry stakeholders. In a statement to NPR, Anthropic praised the judge's recognition that using works to train large language models was "transformative -- spectacularly so." The company added: "Consistent with copyright's purpose in enabling creativity and fostering scientific progress, Anthropic's large language models are trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different." However, Anthropic also said it disagrees with the court's decision to proceed with a trial.
"We believe it's clear that we acquired books for one purpose only -- building large language models -- and the court clearly held that use was fair," the company stated. A member of the plaintiffs' legal team declined to speak publicly about the decision. The Authors' Guild, a major professional writers' advocacy group, did share a statement: "We disagree with the decision that using pirated or scanned books for training large language models is fair use," the statement said. In an interview with NPR, the guild's CEO, Mary Rasenberger, added that authors need not be too concerned with the ruling. "The impact of this decision for book authors is actually quite good," Rasenberger said. "The judge understood the outrageous piracy. And that comes with statutory damages for intentional copyright infringement, which are quite high per book." According to the Copyright Alliance, U.S. copyright law states willful copyright infringement can lead to statutory damages of up to $150,000 per infringed work. The ruling states Anthropic pirated more than 7 million copies of books. So the damages resulting from the upcoming trial could be huge. The part of the case focused on Anthropic's liability for using pirated works is scheduled to go to trial in December. Other cases and a new ruling Similar lawsuits have been brought by other prominent authors. Ta-Nehisi Coates, Michael Chabon, Junot Díaz and the comedian Sarah Silverman are involved in ongoing cases against AI players. On Wednesday, U.S. District Judge Vince Chhabria ruled in favor of Meta in one of those cases. A copyright infringement lawsuit was brought by 13 authors including Richard Kadrey and Silverman. They sued Meta for allegedly using pirated copies of their novels to train LLaMA. Meta claimed fair use and won because the authors failed to present evidence that Meta's use of their books impacted the market for their original work. However, the judge said the ruling applies only to the specific works included in the lawsuit and that in future cases, authors making similar claims could win if they make a stronger case. " These rulings are going to help tech companies and copyright holders to see where judges and courts are likely to go in the future," said Ray Seilie, a lawyer based in Los Angeles with the firm Kinsella Holley Iser Kump Steinsapir, who focuses on AI and creativity. He is not involved with this particular case. " I think they can be seen as a victory for the AI community writ large because they create a precedent suggesting that AI companies can use legally-obtained material to train their models," Seilie said. But he said this doesn't mean AI companies can immediately go out and scan whatever books they buy with impunity, since the rulings are likely to be appealed and the cases could potentially wind up before the Supreme Court.
[35]
Federal Judge Gives AI Companies a Landmark 'Fair Use' Victory
American artificial intelligence (AI) company Anthropic, which develops large language models competing with platforms like OpenAI's ChatGPT and Google's Gemini, has won a key ruling in a United States federal court. A federal judge ruled this week that AI developers can train AI models on copyrighted content without obtaining permission from the content creators. As The Verge reports, U.S. Federal Judge William Alsup of the Northern District of California ruled that Anthropic has the legal right to train AI models using copyrighted work. Judge Alsup says that this use falls under fair use. In the lawsuit, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson v. Anthropic PBC, the three plaintiffs, all authors, allege that Anthropic had no right to use their protected works to train its family of Claude AI models. Judge Alsup disagrees, ruling that Anthropic's use of the plaintiffs' works, which included buying physical books, stripping them bare, and scanning the text into its training workflow, falls under fair use. Fair use has long been a crucial defense for AI companies, which require huge libraries of human-created work to sufficiently train their various AI models. Understandably, artists have resisted, and copyright infringement lawsuits have popped up left and right. Alsup's ruling is multi-faceted, however. While the federal judge has sided with Anthropic on the matter of using legally acquired, copyrighted materials to train AI models, the judge takes significant issue with some of Anthropic's other behavior, including storing more than seven million pirated books in a central library. This is not protected under the fair use doctrine, and the judge has set a second trial later this year to determine the damages Anthropic may owe for this infringement. As Reuters reports, "U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work." However, potentially even more influential than Judge Alsup's ruling that training AI on copyrighted material can be protected under the doctrine of fair use is his additional decision that building AI models using copyrighted work can be considered sufficiently transformative to avoid violating copyright. "To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act," Judge Alsup writes. "Anthropic's LLMs have not reproduced to the public a given work's creative elements, nor even one author's identifiable expressive style (assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works," Alsup continues elsewhere in his ruling. "But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not." Alsup calls using legally acquired copyrighted works to train LLMs "quintessentially transformative," claiming that Anthropic is using existing works "not to race ahead and replicate or supplant" the creators, but to "turn a hard corner and create something different." In their lawsuit, the plaintiffs alleged that, in general, training LLMs would "result in an explosion of works competing with their works," as Alsup characterizes it. The judge strongly disagrees with this complaint.
"But Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works," Alsup writes. "This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition." As a result of Alsup's ruling, AI companies now have a proven avenue through which they can defend their training work on the grounds of fair use. The ruling also asserts that some training applications are sufficiently transformative to be legally protected. There is little doubt that this new ruling could prove to be a landmark case that influences how other judges handle copyright claims levied against AI companies. That said, Anthropic will still need to answer for its alleged piracy.
[36]
Anthropic did not breach copyright when training AI on books without permission, court rules
Judge says firm made 'fair use' of literature but that storage of pirated books in central library constituted infringement. A US judge has ruled that a tech company's use of books to train its artificial intelligence system - without permission of the authors - did not breach copyright law. A federal judge in San Francisco said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Judge William Alsup compared the Anthropic model's use of books to a "reader aspiring to be a writer" who uses works "not to race ahead and replicate or supplant them" but to "turn a hard corner and create something different". Alsup added, however, that Anthropic's copying and storage of more than 7m pirated books in a central library infringed the authors' copyrights and was not fair use - although the company later bought "millions" of print books as well. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. US copyright law says that wilful copyright infringement can result in damages of up to $150,000 (£110,000) per work. The copyright issue has pitted AI firms against publishers and the creative industries because generative AI models - the term for technology that underpins powerful tools such as the ChatGPT chatbot - have to be trained on a vast amount of publicly available data in order to generate their responses. Much of that data has included copyright-protected works. An Anthropic spokesperson said the company was pleased that the court recognised its AI training was transformative and "consistent with copyright's purpose in enabling creativity and fostering scientific progress". Keith Kupferschmid, chief executive of the US nonprofit Copyright Alliance, described the decision as a "mixed bag." "In some instances AI companies should be happy with the decision and in other instances copyright owners should be happy," he said. The writers filed the proposed class action against Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The proposed class action is one of several lawsuits brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defence for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the nascent industry. Anthropic told the court that it made fair use of the books and that US copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology".
Giles Parsons, a partner at UK law firm Browne Jacobson, said the ruling would have no impact in the UK, where the fair use argument holds less sway. Under current UK copyright law, which the government is seeking to change, copyright-protected work can be used without permission for scientific or academic research. He said: "The UK has a much narrower fair use defence which is very unlikely to apply in these circumstances." Copyright owners in the US and UK say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. A UK government proposal to change copyright law by allowing use of copyright-protected work without permission - unless the work's owner signals they want to opt out of the process - has met with vociferous opposition from the creative industries. Alsup said Anthropic violated the authors' rights by saving pirated copies of their books as part of a "central library of all the books in the world" that would not necessarily be used for AI training. Anthropic and other prominent AI companies including OpenAI and Facebook owner Meta have been accused of downloading pirated digital copies of millions of books to train their systems.
[37]
Meta wins AI copyright lawsuit as US judge rules against authors
Writers accused Facebook owner of breach over using books without permission to train its AI system Mark Zuckerberg's Meta has won the backing of a judge in a copyright lawsuit brought by a group of authors, in the second legal victory for the US artificial intelligence industry this week. The writers, who included Sarah Silverman and Ta-Nehisi Coates, had argued that the Facebook owner had breached copyright law by using their books without permission to train its AI system. The ruling follows a decision on Monday that Anthropic, another major player in the AI field, had not infringed authors' copyright. The US district judge Vince Chhabria, in San Francisco, said in his decision on the Meta case that the authors had not presented enough evidence that the technology company's AI would dilute the market for their work to show that its conduct was illegal under US copyright law. However, the ruling offered some hope for American creative professionals who argue that training AI models on their work without permission is illegal. Chhabria also said that using copyrighted work without permission to train AI would be unlawful in "many circumstances", splitting with another federal judge in San Francisco who found on Monday in a separate lawsuit that Anthropic's AI training made "fair use" of copyrighted materials. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances and is a key defence for the tech companies. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria said. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Anthropic also faces a further trial this year after the judge in its case ruled that its copying and storage of more than 7m pirated books in a central library infringed the authors' copyrights and was not fair use. A spokesperson for the Meta case authors' law firm, Boies Schiller Flexner, said that it disagreed with the judge's decision to rule for Meta despite the "undisputed record" of the company's "historically unprecedented pirating of copyrighted works". A Meta spokesperson said the company appreciated the decision and called fair use a "vital legal framework" for building "transformative" AI technology. The authors sued Meta in 2023, arguing the company misused pirated versions of their books to train its AI system Llama without permission or compensation. The copyright issue has pitted AI companies against publishers and the creative industries on both sides of the Atlantic because generative AI models - the term for technology that underpins powerful tools such as the ChatGPT chatbot - have to be trained on a vast amount of publicly available data in order to generate their responses. Much of that data has included copyright-protected works. The lawsuit is one of several copyright cases brought by writers, news outlets and other copyright owners against companies including OpenAI, Microsoft and Anthropic over their AI training. AI companies argue their systems make fair use of copyrighted material by studying it to learn to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the growing AI industry. Copyright owners say AI companies unlawfully copy their work to generate competing content that threatens their livelihoods. 
Chhabria expressed sympathy for that argument during a hearing in May, which he reiterated on Wednesday. The judge said generative AI had the potential to flood the market with endless images, songs, articles and books using a tiny fraction of the time and creativity that would otherwise be required to create them. "So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way," Chhabria said.
[38]
Judge in 'Kadrey v. Meta' AI copyright case rules for Meta, against authors
Meta's victory is a setback for creatives, but the fight is far from over. Meta just won a major ruling in a landmark case about how copyright law and fair use apply to AI model training, the second such loss for authors this week. Just days ago, Anthropic won a fair use case as well. Late Wednesday afternoon, U.S. District Judge for the Northern District of California Vince Chhabria denied the plaintiffs' motion for partial summary judgment. At issue in the case: whether Meta's use of pirated books to train its Llama AI models violated copyright law. In the case, Richard Kadrey, et al. v. Meta Platforms Inc., authors including Richard Kadrey, Sarah Silverman, Ta-Nehisi Coates, and Junot Diaz accused Meta of copyright infringement. In the discovery phase of the case, internal Meta messages revealed that the company used pirated datasets with copies of 7.5 million pirated books and 81 million research papers, according to The Atlantic's LibGen investigation. What may seem like blatant theft for profit in the eyes of the authors is actually a much more complex deliberation in copyright law. It's undisputed that Meta torrented terabytes of pirated books, but its lawyers successfully defended this act under the fair use doctrine, which allows the use of copyrighted works in certain contexts. Kadrey v. Meta is one of dozens of copyright lawsuits against AI companies making their way through the U.S. court system. At the heart of these fights is a battle of values: the rights and livelihoods of artists versus technological innovation at all costs. Of the four fair use factors, the case mostly hinged on factor one, whether the use is transformative, and factor four, whether the use harms the existing or future market for the copyrighted work. Meta clinched factor one. "There is no serious question that Meta's use of the plaintiffs' books had a 'further purpose' and 'different character' than the books -- that it was highly transformative," said Chhabria in his ruling. Relatedly, Anthropic won a fair use case on Tuesday, with U.S. District Judge William Alsup deeming its Claude models transformative. So the bulk of the deliberation came down to the fourth factor, or market harms. Chhabria said the plaintiffs failed to successfully argue that Meta caused market harm, for example, by regurgitating verbatim excerpts of books, robbing authors of AI licensing deals, or diluting the market with AI-generated copycats. "Meta has defeated the plaintiffs' half-hearted argument that its copying causes or threatens significant market harm," said Chhabria. "That conclusion may be in significant tension with reality, but it's dictated by the choice the plaintiffs made... while failing to present meaningful evidence on the effect of training LLMs like Llama with their books on the market for [AI-generated] books." Chhabria's decision was forecasted during the oral arguments held on May 1. The judge grilled lead plaintiffs' counsel David Boies about his team's shortcomings in presenting the market harm argument. "Whether it's in the summary judgment record or not, it seems like you're asking me to speculate that the market for Sarah Silverman's memoir will be affected by the billions of things that Llama will ultimately be capable of producing," said Chhabria, "and it's just not obvious to me that that's the case."
Chhabria even pushed Boies to argue more strongly for market harms, saying, "you lose if you can't show that the market for the copyrighted works that are being used to train the models are dramatically impacted." Almost two months later, Chhabria made this decision final. "We appreciate today's decision from the Court," said a Meta spokesperson about the ruling. "Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." The ruling does contain some good news for authors and artists, just not for the 13 authors involved in this case. Judge Chhabria emphasized that his decision isn't a precedent that applies to all such cases. Chhabria explained in his ruling that his decision was less about the fair use defense of using pirated books to train AI models and more about the shortcomings of the plaintiffs' argument. "The Court had no choice but to grant summary judgment to Meta," said the judge, before adding: "This is not a class action, so the ruling only affects the rights of these thirteen authors -- not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Chhabria also said he believed "it will be illegal to copy copyright-protected works to train generative AI models without permission." On a possibly related note, this May, the U.S. Copyright Office released a pre-publication version of a highly anticipated report on copyright law and AI. The report concluded that training AI models on copyrighted works without permission is likely not fair use. However, the report came out days before President Donald Trump fired the head of the Copyright Office, so it's unclear what impact this preliminary report could have on future cases. Meta's fair use ruling is certainly a setback for authors and other creatives. But as Chhabria signaled, the fight is far from over.
[39]
The 'Kadrey v. Meta' fair use ruling is just the start of a long, complex AI copyright battle
On Wednesday, the judge in the landmark AI copyright case Kadrey, et al. v. Meta Platforms Inc. ruled in Meta's favor. And U.S. District Judge Vince Chhabria seemed to do so reluctantly, calling his own ruling "in significant tension with reality." Thirteen authors, including Sarah Silverman, Ta-Nehisi Coates, and Junot Diaz, sued Meta for its unlicensed use of their books to train its Llama AI models. The facts of the case seemed particularly egregious. Not only did Meta pirate unlicensed copies of the authors' works, but internal Meta messages revealed during discovery showed that the company's own employees expressed legal and ethical doubts about pirating those works. Other messages suggest that employees sought to eliminate traces of piracy, looking for words like "stolen" and "pirated" as part of the team's "mitigation" efforts. Instead of settling the messy copyright battle over AI training, Chhabria's ruling adds another layer of complexity to this legal issue. Just a day earlier, a judge in a similar AI copyright case ruled in favor of another AI company, Anthropic. In the same Northern District of California, U.S. District Judge William Alsup declared in Bartz v. Anthropic that Anthropic's use of pirated books in shadow libraries Books3 and LibGen (the same datasets in the Meta case) was fair use. However, Robert Brauneis, an intellectual property law professor at George Washington University Law School, said Judge Alsup and Judge Chhabria used dramatically different reasoning. Both cases hinged on the fair use legal doctrine, particularly the fourth factor in such defenses -- potential market harms. "Judge Alsup has a very narrow view: if a generative AI output does not itself infringe a particular work used to train the model, any loss in sales of the training work caused by people using the AI output instead cannot be taken into account as 'market harm' under the fourth factor," said Brauneis, who was among a group of copyright lawyers that filed an amicus brief in support of plaintiffs in Kadrey v. Meta. "Judge Chhabria says that's wrong: harm caused by 'diluting' the market for a training work can and should be taken into account, and serious market dilution harm can even outweigh a high level of transformativeness under the first factor." So while both judges sided with the fair use argument, their opposing rationales lay the groundwork for a complex and fragmented legal landscape. The plaintiffs tried, and failed, to argue against Meta's fair use defense. In a blog post written after the May 1 oral arguments, Kevin Madigan, senior VP of policy and government affairs for the Copyright Alliance, wrote that the plaintiffs' lawyer "shockingly" failed to present potential counterarguments. Of the four fair use factors, the case mostly hinged on factor one, whether the use is transformative, and factor four, whether the use harms the existing or future market for the copyrighted work. Chhabria favored Meta on factor one. "There is no serious question that Meta's use of the plaintiffs' books had a 'further purpose' and 'different character' than the books -- that it was highly transformative," said Chhabria in his ruling. The deliberation then turned to the fourth factor, or market harms, where Chhabria had much to say about the plaintiffs' counsel's argument. They simply failed to successfully argue that Meta caused market harm. In discussing market harms during oral arguments, Chhabria brought up a hypothetical -- future Taylor Swifts.
"Even if a million songs are produced by [Meta's Llama] model in the style of a Taylor Swift song, it's not going to affect the market for Taylor Swift songs. But what about the next Taylor Swift?" Chhabria asked Meta lawyer Kannon Shanmugam. "What about the up-and-coming, relatively unknown artist who is writing songs... and by feeding copyrighted works like hers into the model, it enables the model to produce a billion pop songs?" Chhabria seemed to foreshadow his eventual ruling when he questioned plaintiff counsel David Boies about evidence of market harms. "Whether it's in the summary judgment record or not, it seems like you're asking me to speculate that the market for Sarah Silverman's memoir will be affected by the billions of things that Llama will ultimately be capable of producing," said Chhabria "and it's just not obvious to me that that's the case." Chhabria told Boies, "you lose if you can't show that the market for the copyrighted works that are being used to train the models are dramatically impacted." Ultimately, Chhabria decided that Meta had the stronger argument. "Meta has defeated the plaintiffs' half-hearted argument that its copying causes or threatens significant market harm," said Chhabria. "That conclusion may be in significant tension with reality, but it's dictated by the choice the plaintiffs made... while failing to present meaningful evidence on the effect of training LLMs like Llama with their books on the market for [AI-generated] books." On the day of the ruling, a Meta spokesperson provided this statement to Mashable: "We appreciate today's decision from the Court. Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." In his decision, the district judge said his ruling was less about the fair use defense of using pirated books to train AI models and more about the shortcomings of the plaintiffs' argument. "The Court had no choice but to grant summary judgment to Meta," said Chhabria, before adding: "This is not a class action, so the ruling only affects the rights of these thirteen authors -- not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." His ruling also leaves the door open for other artists to file similar copyright suits against Meta -- and other AI companies. Chhabria even postulated that "it will be illegal to copy copyright-protected works to train generative AI models without permission." But this ruling also has symbolic meaning for artists. "If this case comes out and says that training of large language models on pirated datasets from which copyright information has been stripped is fair use, then that is a horrible, horrible outcome for millions of creative professionals around the world," said Justin Hughes, a law professor at Loyola Law School, in an interview with Mashable before the ruling. Kadrey v. Meta is one of dozens of copyright lawsuits against AI companies. At the time of publication, AI blog ChatGPT Is Eating the World counted 39 ongoing cases. But while courts deliberate, generative AI is already making a big impact on creative industries. 
Generative AI's ability to automate the creation of text, images, video, and audio is already replacing creative jobs. In 2024, researchers from the Imperial College London Business School and the Berlin School of Economics published a paper analyzing how generative AI is affecting the labor market. Since the introduction of ChatGPT, they found "nearly immediate decreases in posts for online gig workers across job types, but particularly for automation-prone jobs." The jobs most impacted were writing gigs, which decreased by 30 percent. A 2023 report commissioned by the Animation Guild to measure generative AI's impact in entertainment industries stated, "almost two-thirds of the 300 business leaders surveyed expect GenAI to play a role in consolidating or replacing existing job titles in their business division over the next three years." According to the study, which was conducted by CVL Economics, that's 203,800 missing jobs by 2026. Many artists see the existence of AI tools like Llama as an existential threat. Adding insult to injury, AI models were trained on the very human expression they're accused of replacing. In an amicus brief in support of the plaintiffs, the Association of American Publishers argued that this case was much simpler than it seemed. Meta, "a company valued at over a trillion dollars, asks this Court to declare that it is free to appropriate and commercially exploit the content of copyrighted works on a massive scale without permission or payment for that content, a ruling that would have catastrophic consequences for authors and publishers of books, journals and other textual works protected by copyright." While Meta prevailed on the fair use ruling, Madigan called Chhabria's decision a "mixed bag." "The things that are not good for copyright owners are Judge Chhabria's treatment of transformative use under the first factor, and also his unwillingness to recognize licensing markets under the fourth." Here, Madigan was referring to the plaintiffs' potential loss of licensing deals, an argument that Chhabria said he wouldn't take into account. "But why that is not necessarily the worst thing in the world, is that it's so cabined to the specifics of this case and the failure to develop a record and raise certain issues," Madigan continued. The plaintiffs will also likely appeal, he added. A spokesperson for Boies Schiller Flexner, the firm representing the plaintiffs, told Mashable, "The court ruled that AI companies that 'feed copyright-protected works into their models without getting permission from the copyright holders or paying for them' are generally violating the law. Yet, despite the undisputed record of Meta's historically unprecedented pirating of copyrighted works, the court ruled in Meta's favor. We respectfully disagree with that conclusion." They did not respond to the question of whether they would file an appeal. Kadrey v. Meta and Bartz v. Anthropic are often lumped together because they both focus on the inputs of pirated books as data to train AI models. By contrast, other high-profile AI copyright cases -- the New York Times lawsuit against OpenAI and Microsoft, another case against Anthropic from major record labels (Concord v. Anthropic), and the more recent Disney v. Midjourney -- focus on AI models' outputs. For these cases, "where they've all shown evidence of infringing output, [Kadrey v. Meta] has absolutely no bearing," said Madigan.
With cases that focus on output, "you don't have to get into sort of these more abstract doctrinal discussions about transformative use and whether training is transformative in purpose. You just have to show side-by-side verbatim copies," he continued.
[40]
AI training is 'fair use,' federal judge rules in Anthropic copyright case
A federal judge in San Francisco has ruled that training an AI model on copyrighted works without specific permission to do so was not a violation of copyright law. U.S. District Judge William Alsup said that AI company Anthropic could assert a "fair use" defense against copyright claims for training its Claude AI models on copyrighted books. But the judge also ruled that it mattered exactly how those books were obtained. Alsup supported Anthropic's claim that it was "fair use" for it to purchase millions of books and then digitize them for use in AI training. The judge said it was not OK, however, for Anthropic to have also downloaded millions of pirated copies of books from the internet and then maintained a digital library of those pirated copies. The judge ordered a separate trial on Anthropic's storage of those pirated books, which could determine the company's liability and any damages related to that potential infringement. The judge has also not yet ruled whether to grant the case class action status, which could dramatically increase the financial risks to Anthropic if it is found to have infringed on authors' rights. In finding that it was "fair use" for Anthropic to train its AI models on books written by three authors -- Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson -- who had filed a lawsuit against the AI company for copyright violations, Alsup addressed a question that has simmered since before OpenAI's ChatGPT kick-started the generative AI boom in 2022: Can copyrighted data be used to train generative AI models without the owner's consent? Dozens of AI and copyright-related lawsuits have been filed over the past three years, most of which hinge on the concept of fair use, a doctrine that allows the use of copyrighted material without permission if the use is sufficiently transformative -- meaning it must serve a new purpose or add new meaning, rather than simply copying or substituting the original work. Alsup's ruling may set a precedent for these other copyright cases -- although it is also likely that many of these rulings will be appealed, meaning it will take years until there is clarity around AI and copyright in the U.S. According to the judge's ruling, Anthropic's use of the books to train Claude was "exceedingly transformative" and constituted "fair use under Section 107 of the Copyright Act." Anthropic told the court that its AI training was not only permissible, but aligned with the spirit of U.S. copyright law, which it argued "not only allows, but encourages" such use because it promotes human creativity. The company said it copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." While training AI models with copyrighted data may be considered fair use, Anthropic's separate action of building and storing a searchable repository of pirated books is not, Alsup ruled. Alsup noted that the fact that Anthropic later bought a copy of a book it earlier stole off the internet "will not absolve it of liability for the theft but it may affect the extent of statutory damages." The judge also looked askance at Anthropic's acknowledgement that it had turned to downloading pirated books in order to save time and money in building its AI models.
"This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup said. The "transformative" nature of AI outputs is important, but it's not the only thing that matters when it comes to fair use. There are three other factors to consider: what kind of work it is (creative works get more protection than factual ones), how much of the work is used (the less, the better), and whether the new use hurts the market for the original. For example, there is the ongoing case against Meta and OpenAI by comedian Sarah Silverman and two other authors, who filed copyright infringement lawsuits in 2023 alleging that pirated versions of their works were used without permission to train AI language models. The defendants recently argued that the use falls under fair use doctrine because AI systems "study" works to "learn" and create new, transformative content. Federal district judge Vince Chhabria pointed out that even if this is true, the AI systems are "dramatically changing, you might even say obliterating, the market for that person's work." But he also took issue with the plaintiffs, saying that their lawyers had not provided enough evidence of potential market impacts. Alsup's decision differed markedly from Chhabria's on this point. Alsup said that while it was undoubtedly true that Claude could lead to increase competition for the authors' works, this kind of "competitive or creative displacement is not the kind of competitive or creative displacement that concerns the Copyright Act" Copyright's purpose was to encourage the creation of new works, not to shield authors from competition, Alsup said, and he likened the authors' objections to Claude to the fear that teaching school children to write well might also result in an explosion of competing books. Alsup also took note in his ruling that Anthropic had built "guardrails" into Claude that were meant to prevent it from producing outputs that directly plagiarized the books on which it had been trained. Neither Anthropic nor the plaintiffs' lawyers immediately responded to requests to comment on the Alsup's decision.
[41]
Courts say AI training on copyrighted material is legal
A ruling in a U.S. District Court has effectively given permission to train artificial intelligence models using copyrighted works, in a decision that's extremely problematic for creative industries. Content creators and artists have been suffering for years, with AI companies scraping their sites and scanning books to train large language models (LLMs) without permission. That data is then used for generative AI and other machine learning tasks, and then monetized by the scraping company with no compensation for the original host or author. Following a ruling by a U.S. District Court for the Northern District of California issued on Tuesday, companies are being given free rein to train with just about any published media that they want to harvest. The ruling is based on a lawsuit from Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against Anthropic dating back to 2024. The suit accused the company of using pirated material to train its Claude AI models. This included Anthropic creating digital copies of printed books for AI model training. The ruling from Judge William Alsup -- a judge very familiar to readers of AppleInsider -- favors each side in various ways. However, the weight of the ruling certainly sides with Anthropic and AI scrapers in this instance. Under the ruling, Judge Alsup says that the copies used to train specific LLMs were justifiable as fair use. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup commented. For physical copies that were converted from a print library to a digital library, this was also deemed fair use. Furthermore, using that content to train LLMs was also fair use. Alsup compared the authors' complaint to arguing against an effort to train schoolchildren how to write well. It's not clear how that applies, given that artificial intelligence models are not considered "schoolchildren" in any legal sense. In that argument, Alsup ruled that the Copyright Act is intended to advance original works of authorship, not to "protect authors against competition." Where the authors saw a small amount of success was in the usage of pirated works. Creating a library of pirated digital books, even if they are not used for the training of a model, does not constitute fair use. That also remains the case if Anthropic later bought a copy of a book after pirating it in the first place. On the matter of the piracy argument, the court will be holding a trial to determine damages against Anthropic. The ruling is terrible for artists, musicians, and writers. Other professions where machine learning models could be a danger to their livelihoods will have issues too -- like judges who said that they took a coding class once, and therefore knew what they were talking about with tech. AI models take advantage of the hard work and life experiences of media creators, and pass it off as its own. At the same time, it leaves content producers with few options to combat the phenomenon. As it stands, the ruling will clearly be precedent in other lawsuits in the AI space, especially when dealing with the producers of original works that are pillaged for training purposes. Over the years, AI companies were attacked for grabbing any data they could to feed the LLMs, even content scraped from the Internet without permission. This is a problem that manifests in quite a few ways.
The most obvious is in generative AI, as the models could be trained to create images in specific styles, which devalues the work of actual artists. An example of a fightback is a lawsuit from Disney and Universal against Midjourney, which surfaced in early June. The company behind the AI image generator is accused of mass copyright infringement, for training the models on images of the most recognizable characters from the studios. The studios unite in calling Midjourney "a bottomless pit of plagiarism," built on the unauthorized use of protected material. When you have two major media companies that are usually bitter rivals uniting for a single cause, you know it's a serious issue. It's also a growing issue for websites and publishers, like AppleInsider. Instead of using a search tool and viewing websites for information, a user can simply ask for a customized summary from an AI model, without needing to visit the site that it has sourced the information from in the first place. And, that information is often wrong, combined with data from other sources, polluting the original meaning of the content. For instance, we've seen our tips on how to do something plagiarized with sections reproduced verbatim, and mashed up out of order with that from other sites, making a procedure that doesn't work. The question of how to compensate publishers for lost revenue has still not been answered in a meaningful way. There are some companies that have been trying to stay on the more ethical side of things, with Apple among them. Apple has offered news publishers millions to license content for training its generative AI. It has also paid for licenses from Shutterstock, which helped develop its visual engines used for Apple Intelligence features. Major publishers have also taken to blocking AI services from accessing their archives, doing so via robots.txt (a sample snippet appears after this piece). However, this only stops ethical scrapers, not everyone. And, scraping an entire site takes server power and bandwidth -- which is not free for the hosting site that's getting scraped. The ruling also follows an increase in efforts from major tech companies to lobby for a block on U.S. states introducing AI regulation for a decade. Meanwhile in the EU, there have been attempts to sign tech companies up to an AI Pact, to develop AI in safe ways. Apple is apparently not involved in either effort.
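As mentioned above, that robots.txt blocking amounts to a few lines of plain text at a site's root. A representative snippet follows; the user-agent tokens shown are ones the major AI crawlers have publicly documented (GPTBot for OpenAI, ClaudeBot for Anthropic, CCBot for Common Crawl), and compliance is entirely voluntary on the crawler's part:

    # robots.txt -- ask AI-training crawlers to stay out of the whole site.
    # Honored only by crawlers that choose to respect the file.
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

Nothing here is enforced by the server; a scraper that ignores the file still receives the pages, which is exactly the gap described above.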
[42]
US judge sides with Meta in AI training copyright case
A US judge on Wednesday handed Meta a victory over authors who accused the tech giant of violating copyright law by training Llama artificial intelligence on their creations without permission. District Court Judge Vince Chhabria in San Francisco ruled that Meta's use of the works to train its AI model was "transformative" enough to constitute "fair use" under copyright law, in the second such courtroom triumph for AI firms this week. However, it came with a caveat that the authors could have pitched a winning argument that by training powerful generative AI with copyrighted works, tech firms are creating a tool that could let a sea of users compete with them in the literary marketplace. "No matter how transformative (generative AI) training may be, it's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books," Chhabria said in his ruling. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We appreciate today's decision from the court," a Meta spokesperson said in response to an AFP inquiry. "Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." In the case before Chhabria, a group of authors sued Meta for downloading pirated copies of their works and using them to train the open-source Llama generative AI, according to court documents. Books involved in the suit include Sarah Silverman's comic memoir "The Bedwetter" and Junot Diaz's Pulitzer Prize-winning novel "The Brief Wondrous Life of Oscar Wao," the documents showed. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," the judge stated. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Market harming? A different federal judge in San Francisco on Monday sided with AI firm Anthropic regarding training its models on copyrighted books without authors' permission. District Court Judge William Alsup ruled that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his decision, comparing AI training to how humans learn by reading books. The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train chatbot Claude, the company's ChatGPT rival. 
Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections.
[43]
US judge backs using copyrighted books to train AI
A US federal judge has sided with Anthropic regarding training its artificial intelligence models on copyrighted books without authors' permission, a decision with the potential to set a major legal precedent in AI deployment. District Court Judge William Alsup ruled on Monday that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his 32-page decision, comparing AI training to how humans learn by reading books. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We are pleased that the court recognized that using 'works to train LLMs was transformative,'" an Anthropic spokesperson said in response to an AFP query. The judge's decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress," the spokesperson added. Blanket protection rejected The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train Claude, the company's AI chatbot that rivals ChatGPT. However, Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections. Along with downloading of books from websites offering pirated works, Anthropic bought copyrighted books, scanned the pages and stored them in digital format, according to court documents. Anthropic's aim was to amass a library of "all the books in the world" for training AI models on content as deemed fit, the judge said in his ruling. While training AI models on the pirated content posed no legal violation, downloading pirated copies to build a general-purpose library constituted copyright infringement, regardless of eventual training use. The case will now proceed to trial on damages related to the pirated library copies, with potential penalties including financial damages. Anthropic said it disagreed with going to trial on this part of the decision and was evaluating its legal options. Valued at $61.5 billion and heavily backed by Amazon, Anthropic was founded in 2021 by former OpenAI executives. The company, known for its Claude chatbot and AI models, positions itself as focused on AI safety and responsible development.
[44]
Anthropic's landmark copyright ruling is a victory for the AI industry -- but the company is still on the hook for piracy claims
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of their use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product. Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement didn't address the piracy claims.
[45]
Anthropic Scores Partial Victory in Copyright Case Over AI Training Data - Decrypt
OpenAI and Meta face similar author-led lawsuits over the use of copyrighted works to train AI models. AI firm Anthropic has won a key legal victory in a copyright battle over how artificial intelligence companies use copyrighted material to train their models, but the fight is far from over. U.S. District Judge William Alsup found that Anthropic's use of copyrighted books to train its AI chatbot Claude qualifies as "fair use" under U.S. copyright law, in a ruling late Monday. "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said in his ruling. But the judge also faulted the Amazon and Google-backed firm for building and maintaining a massive "central library" of pirated books, calling that part of its operations a clear copyright violation. The case, brought last August by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of building Claude using millions of pirated books downloaded from notorious sites like Library Genesis and Pirate Library Mirror. The lawsuit, which seeks damages and a permanent injunction, alleges Anthropic "built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books" to train Claude, its family of AI models. Alsup said that AI training can be "exceedingly transformative," noting how Claude's outputs do not reproduce or regurgitate authors' works but generate new text "orthogonal" to the originals. Court records reveal that Anthropic downloaded at least seven million pirated books, including copies of each author's works, to assemble its library. Internal emails revealed that Anthropic co-founders sought to avoid the "legal/practice/business slog" of licensing books, while employees described the goal as creating a digital collection of "all the books in the world" to be kept "forever." "There is no carveout, however, from the Copyright Act for AI companies," Alsup said, noting that maintaining a permanent library of stolen works -- even if only some were used for training -- would "destroy the academic publishing market" if allowed. Alsup's ruling is the first substantive decision by a U.S. federal court that directly analyzes and applies the doctrine of fair use specifically to the use of copyrighted material for training generative AI models. The court distinguished between copies used directly for AI training, which were deemed fair use, and the retained pirated copies, which will now be subject to further legal proceedings, including potential damages. While several lawsuits have been filed -- including high-profile cases against OpenAI, Meta, and others -- those cases are still in early stages, with motions to dismiss pending or discovery ongoing. OpenAI and Meta both face lawsuits from groups of authors alleging their copyrighted works were exploited without consent to train large language models such as ChatGPT and LLaMA. The New York Times sued OpenAI and Microsoft in 2023, accusing them of using millions of Times articles without permission to develop AI tools. Reddit also recently sued Anthropic, alleging it scraped Reddit's platform over 100,000 times to train Claude, despite claiming to have stopped.
[46]
Meta and OpenAI Use of Copyrighted Books for Training AI Was Fair Use: Federal Judge - Decrypt
Chhabria said Meta prevailed only because the authors failed to present strong arguments and evidence. A federal judge delivered a significant blow to authors suing tech giants over AI training this week. The judge ruled that Meta's use of copyrighted books to train its artificial intelligence models constituted fair use under copyright law. U.S. District Judge Vince Chhabria in San Francisco sided with Meta Platforms on Wednesday in a case brought by 13 authors, including comedian Sarah Silverman and Pulitzer Prize winners Junot Díaz and Andrew Sean Greer. The 13 authors suing Meta failed to provide enough evidence that the company's AI would dilute the market for their work, Judge Chhabria said in the ruling. Their argument, he said, "barely gives this issue lip service" and lacked the facts needed to prove harm under U.S. copyright law. But the judge made clear that the ruling is far from a blanket endorsement of AI companies' controversial training practices. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria said. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Kunal Anand, CEO of AI chatbot service AiBaat, told Decrypt that he hopes it's a sign that courts will find a way of "balancing technological progress with creator rights." "While the decision favored Meta, it reminds us that ethical AI development demands clear licensing frameworks," he added. The authors sued Meta and OpenAI in 2023, alleging the companies misused pirated versions of their books to train their Llama AI and ChatGPT systems without permission or compensation. In January, court filings revealed that Meta CEO Mark Zuckerberg personally approved using the pirated dataset, despite warnings from his AI team that it was illegally obtained. Internal messages cited in the filings show engineers at Meta hesitated, with one employee admitting, "torrenting from a corporate laptop doesn't feel right." But the company proceeded anyway. Judge Chhabria acknowledged the potential for AI to "flood the market with endless amounts of images, songs, articles, books, and more" using "a tiny fraction of the time and creativity that would otherwise be required." He noted in the ruling this could "dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way." Chhabria expressed sympathy for authors' concerns, but it wasn't enough to make a sound legal argument. "Courts can't decide cases based on general understandings," he said. The ruling favoring Meta affects only these 13 specific authors since it was not certified as a class action. The decision marks the second major victory for AI companies this week, following a similar ruling favoring Anthropic on Monday. In that case, Judge William Alsup also found AI training to be fair use, but criticized Anthropic for building a permanent library of pirated books. Experts say the solution to disputes over AI training and copyrighted content lies in proactive market-based approaches rather than waiting for regulatory clarity. "By the time policymakers catch up to the latest AI breakthroughs, those breakthroughs will have advanced another generation," Hitesh Bhardwaj, co-founder at Capx AI, told Decrypt. 
"A more sustainable path is to reward people whose work fuels AI: create transparent marketplaces where authors and creators license their own data on fair terms." "That approach puts control back in the hands of the people whose content powers our models," he said.
[47]
US judge rules that Anthropic's use of copyrighted content to train AI was fair use, but pirating books is step too far
The UK's Data (Use and Access) Bill has now passed, without the amendment that would've required AI tools to declare the use of copyrighted material, or any provision for copyright holders to 'opt-out' of their work being used as training data. The whole thing has left me wondering if there'll ever be something that AI can't gobble up and regurgitate. Well, a legal case in the US against AI firm Anthropic has produced an absolutely perfect punchline to this bleak episode. A federal judge has ruled that Anthropic didn't break the law when it used copyrighted material to train the large language model Claude, as this counts as "fair use" under US copyright law, reports AP News. What's keeping Anthropic submerged in legal hot water, though, is how the company may have acquired that copyrighted material -- in this case, thousands of books not bought but 'found' online. Long legal story short, AI can scrape copyrighted content -- it just can't pirate it. For context, this all began last summer, when authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson first brought their lawsuit against Anthropic. That filing from August last year alleged, "Anthropic downloaded known pirated versions of Plaintiffs' works." The full complaint goes on to read, "An essential component of Anthropic's business model -- and its flagship 'Claude' family of large language models (or 'LLMs') -- is the largescale theft of copyrighted works," and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." A number of documents disclosed as part of legal proceedings unearthed concerns from Anthropic's own employees about the use of pirated books to train Claude. Though the company pivoted to buying physical books in bulk and painstakingly digitising each page for the AI model to gobble up, the judge ruled that the earlier piracy still needs to be legally addressed. As such, the ruling made by San Francisco federal court Judge William Alsup on Monday means that Claude can keep being trained on the authors' works -- but Anthropic must return to court in December to be tried based on the whole "largescale theft of copyrighted works" thing. Judge Alsup wrote in this week's ruling, "Anthropic had no entitlement to use pirated copies for its central library." I'm no legal professional, but on this point I can agree. However, Alsup also described the output of AI models trained on copyrighted material as "quintessentially transformative," and therefore not a violation of fair use under the law. He went on to add, "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different." Again, I'm not any kind of lawyer, and I'm definitely not offering legal advice, but yeah, I'm not buying this argument. I'd argue that a truly transformative, creative synthesis requires at least some understanding of whatever material you're imbibing. Large language models like Claude don't 'understand' texts as we do, instead playing an extremely complex game of word association. In other words, Claude isn't creating, it's just trying to string together enough words that its training data say go together in order to fool a human into thinking the AI output they're reading is coherent copy. But what do I know? I'm just a writer -- and Large Language Models may now enjoy the legal precedent set by this San Francisco case.
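To make the "word association" characterization above concrete, here is a toy sketch of the statistical idea in miniature: a bigram model that picks each next word purely from co-occurrence counts in its training text. This is our illustration, not anything from the ruling or from Anthropic, and real LLMs use learned weights over much longer contexts rather than raw counts -- but the underlying objective, predicting a plausible next token, is the same.

import random
from collections import defaultdict

# Toy bigram "language model": count which word follows which in the
# training text, then sample continuations from those counts.
# Purely illustrative -- not how Claude or any production LLM works.
training_text = (
    "the judge ruled that the training was fair use and the judge "
    "ruled that the library of pirated books was not fair use"
)

counts = defaultdict(lambda: defaultdict(int))
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    counts[prev][nxt] += 1

def generate(start, length=10):
    # Repeatedly pick a next word in proportion to how often it
    # followed the current word in the training text.
    out = [start]
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:
            break
        choices, weights = zip(*followers.items())
        out.append(random.choices(choices, weights=weights)[0])
    return " ".join(out)

print(generate("the"))

A run might print something like "the judge ruled that the library of pirated books was" -- fluent-looking word associations with no understanding behind them, which is the columnist's point.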
[48]
US judge sides with Meta in AI training copyright case
San Francisco (United States) (AFP) - A US judge on Wednesday handed Meta a victory over authors who accused the tech giant of violating copyright law by training Llama artificial intelligence on their creations without permission. District Court Judge Vince Chhabria in San Francisco ruled that Meta's use of the works to train its AI model was "transformative" enough to constitute "fair use" under copyright law, in the second such courtroom triumph for AI firms this week. However, it came with a caveat that the authors could have pitched a winning argument that by training powerful generative AI with copyrighted works, tech firms are creating a tool that could let a sea of users compete with them in the literary marketplace. "No matter how transformative (generative AI) training may be, it's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books," Chhabria said in his ruling. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We appreciate today's decision from the court," a Meta spokesperson said in response to an AFP inquiry. "Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." In the case before Chhabria, a group of authors sued Meta for downloading pirated copies of their works and using them to train the open-source Llama generative AI, according to court documents. Books involved in the suit include Sarah Silverman's comic memoir "The Bedwetter" and Junot Diaz's Pulitzer Prize-winning novel "The Brief Wondrous Life of Oscar Wao," the documents showed. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," the judge stated. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Market harming? A different federal judge in San Francisco on Monday sided with AI firm Anthropic regarding training its models on copyrighted books without authors' permission. District Court Judge William Alsup ruled that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his decision, comparing AI training to how humans learn by reading books. The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train chatbot Claude, the company's ChatGPT rival.
Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections.
[49]
Meta wins AI copyright suit before it could go to a jury as the 'plaintiffs made the wrong arguments'
A landmark case on use of copyright materials by AI could be on the horizon, but the arguments weren't strong enough here. This week has been a weird one for AI copyright lawsuits. Anthropic just won a suit to use copyrighted content, but pirating that content first is indeed illegal. Meta, similarly, won a case for its use of copyrighted content, but once again, there's a big asterisk. As reported by TechCrunch, Meta won its case against 13 book authors whose work was scraped by its AI in a summary judgment. Effectively, this means the case was decided before going to a jury. It's not so much that Meta won, so much as the plaintiffs lost, as specified by US federal judge Vince Chhabria: "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." The central arguments made, according to Chhabria, are that "Llama is capable of reproducing small snippets of text from their books" and that Llama using the authors' work "has diminished the authors' ability to license their works for the purpose of training large language models". Chhabria calls both of these arguments "clear losers" as Llama doesn't generate "enough text from the plaintiffs' books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data". The one argument made by the plaintiffs that Chhabria does give credence to is that scraping their work could pump the market with similar derivative work. Though Chhabria suggests this argument could have merit, the summary says the plaintiffs "barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta's models would dilute the market for their own works." Given that part of the plaintiffs' argument is about them not being able to sell their books to LLM owners, I also don't buy it, as derivative work could still come from content willingly given to these tools. There are four factors in determining fair use that judges consider. They are: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for or value of the work. Whilst clarifying that fair use is rather flexible and malleable to change, Chhabria focused on the fourth factor as the one that potential plaintiffs should attempt to win on. "Because the issue of market dilution is so important in this context, had the plaintiffs presented any evidence that a jury could use to find in their favor on the issue, factor four would have needed to go to a jury." Effectively, the claim made is that the plaintiffs not only misunderstood where they could win but also focused on arguments that weren't persuasive for the current application of copyright law. "In cases involving uses like Meta's, it seems like the plaintiffs will often win". Chhabria continues, "It's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books." Anthropic's recent case rules that it didn't break the law when using copyrighted material for its LLM Claude, but that using pirated material is still illegal. The judge in this case, Judge Alsup, argued Claude's use of that material is "quintessentially transformative."
Chhabria is critical of the way Judge Alsup focuses on the transformative nature of generative AI without acknowledging "the harm it can inflict on the market for the works it gets trained on". In a similar vein, Getty has recently dropped a claim against Stability AI for its use of Getty's copyrighted material. Multiple cases around AI's use of copyrighted material are being resolved each week, yet there appears to be room for a landmark case, according to Chhabria. In this specific case, though, "the plaintiffs presented no meaningful evidence on market dilution at all".
[50]
Meta and Anthropic win key verdicts in US AI copyright cases
Leading tech companies won a few verdicts this week in US artificial intelligence (AI) copyright lawsuits. Federal judges sided with Facebook parent Meta Platforms and AI company Anthropic in two separate verdicts. The case against Meta was brought by a group of authors who accused the company of stealing their works to train its AI technology. The Anthropic case decided that the company's AI Claude didn't break copyright rules by training on millions of copyrighted books. US District Judge Vince Chhabria found that the 13 authors who sued Meta "made the wrong arguments," so the case got thrown out - but that doesn't mean the use of copyright materials is lawful. In his 40-page ruling, Chhabria repeatedly said Meta and other AI companies have turned into serial copyright infringers as they train their technology on books and other works created by humans. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. Earlier this week, US District Judge William Alsup ruled that Anthropic didn't break the law but the company must still go to trial because it obtained those books from pirate websites instead of buying them. But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under US copyright law because it was "quintessentially transformative," Alsup wrote. Books are important sources of data needed to build large language models. In the race to outdo each other in developing the most advanced AI chatbots, a number of tech companies have turned to online repositories of stolen books that they can get for free.
[51]
Anthropic wins key AI copyright case, but remains on the hook for using pirated books
Anthropic has won a major legal victory in a case over whether the artificial intelligence company was justified in hoovering up millions of copyrighted books to train its chatbot. In a ruling that could set an important precedent for similar disputes, Judge William Alsup of the United States District Court for the Northern District of California on Monday said Anthropic's use of legally purchased books to train its AI model, Claude, did not violate U.S. copyright law. Anthropic, which was founded by former executives with ChatGPT developer OpenAI, introduced Claude in 2023. Like other generative AI bots, the tool lets users ask natural language questions and then provides neatly summarized answers using AI trained on millions of books, articles and other material. Alsup ruled that Anthropic's use of copyrighted books to train its large language model, or LLM, was "quintessentially transformative" and did not violate "fair use" doctrine under copyright law. "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different," his decision states. By contrast, Alsup also found that Anthropic may have broken the law when it separately downloaded millions of pirated books and said it will face a separate trial in December over this issue. Court documents revealed that Anthropic employees expressed concern about the legality of using pirate sites to access books. The company later shifted its approach and hired a former Google executive in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles. Anthropic cheered the ruling. "We are pleased that the Court recognized that using 'works to train LLMs (large language models) was transformative -- spectacularly so," an Anthropic spokesperson told CBS News in an email. The ruling stems from a case filed last year by three authors in federal court. After Anthropic used copies of their books to train Claude, Andrea Bartz, Charles Graeber and Kirk Wallace Johnson sued Anthropic for alleged copyright infringement, claiming the company's practices amounted to "large-scale theft." The authors also alleged that Anthropic "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." The authors' attorneys declined comment. Other AI companies have also come under fire over the material they use to build their large language models. The New York Times, for example, sued OpenAI and Microsoft in 2023, claiming that the tech companies used millions of its articles to train their automated chatbots. At the same time, some media companies and publishers are also seeking compensation by licensing their content to companies like Anthropic and OpenAI.
[52]
US judge backs using copyrighted books to train AI
San Francisco (United States) (AFP) - A US federal judge has sided with Anthropic regarding training its artificial intelligence models on copyrighted books without authors' permission, a decision with the potential to set a major legal precedent in AI deployment. District Court Judge William Alsup ruled on Monday that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his 32-page decision, comparing AI training to how humans learn by reading books. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We are pleased that the court recognized that using 'works to train LLMs was transformative,'" an Anthropic spokesperson said in response to an AFP query. The judge's decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress," the spokesperson added. Blanket protection rejected The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train Claude, the company's AI chatbot that rivals ChatGPT. However, Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections. Along with downloading of books from websites offering pirated works, Anthropic bought copyrighted books, scanned the pages and stored them in digital format, according to court documents. Anthropic's aim was to amass a library of "all the books in the world" for training AI models on content as deemed fit, the judge said in his ruling. While training AI models on the pirated content posed no legal violation, downloading pirated copies to build a general-purpose library constituted copyright infringement, regardless of eventual training use. The case will now proceed to trial on damages related to the pirated library copies, with potential penalties including financial damages. Anthropic said it disagreed with going to trial on this part of the decision and was evaluating its legal options. Valued at $61.5 billion and heavily backed by Amazon, Anthropic was founded in 2021 by former OpenAI executives. The company, known for its Claude chatbot and AI models, positions itself as focused on AI safety and responsible development.
[53]
Federal judge rules copyrighted books are fair use for AI training
[Photo: Anthropic CEO Dario Amodei, right, and Chief Product Officer Mike Krieger talk after unveiling Claude 4 during the Code with Claude conference on May 22 in San Francisco. Don Feria / AP Content Services for Anthropic]
A federal judge has sided with Anthropic in a major copyright ruling, declaring that artificial intelligence developers can train on published books without authors' consent. The decision, filed Monday in the U.S. District Court for the Northern District of California, sets a precedent that training AI systems on copyrighted works constitutes fair use. Though it doesn't guarantee other courts will follow, Judge William Alsup's ruling marks the first of dozens of ongoing copyright lawsuits to give an answer on fair use in the context of generative AI. It's a question that's been raised by creatives across various industries for years since generative AI tools exploded into the mainstream, allowing users to easily produce art from models trained on copyrighted work -- often without the human creator's knowledge or permission. AI companies have been hit with a slew of copyright lawsuits from media companies, music labels and authors since 2023. Artists have signed multiple open letters urging government officials and AI developers to constrain the unauthorized use of copyrighted works. In recent years, companies have also increasingly inked licensing deals with AI developers to dictate terms of use for their artists' works. Alsup on Monday ruled on a lawsuit filed last August by three authors -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- who claimed that Anthropic ignored copyright protections when it pirated millions of books and digitized purchased books to feed into its large language models, which helped train them to generate human-like text responses. "The copies used to train specific LLMs were justified as a fair use," Alsup wrote in the ruling. "Every factor but the nature of the copyrighted work favors this result. The technology at issue was among the most transformative many of us will see in our lifetimes." His decision stated that Anthropic's use of the books to train its models, including versions of its flagship AI model Claude, was "exceedingly transformative" enough to fall under fair use. Fair use, as defined by the Copyright Act, takes into account four factors: the purpose of the use, what kind of copyrighted work is used (creative works get stronger protection than factual works), how much of the work was used, and whether the use hurts the market value of the original work. "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so,'" Anthropic said in a statement, quoting the ruling. "Consistent with copyright's purpose in enabling creativity and fostering scientific progress, 'Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different.'" Bartz and Johnson did not immediately respond to requests for comment. Graeber declined to comment. Alsup noted, however, that all of the authors' works contained "expressive elements" earning them stronger copyright protection, which is a factor that points against fair use, although not enough to sway the overall ruling. He also added that while making digital copies of purchased books was fair use, downloading pirated copies for free did not constitute fair use.
But aside from the millions of pirated copies, Alsup wrote, copying entire works to train AI models was "especially reasonable" because the models didn't reproduce those copies for public access, and doing so "did not and will not displace demand" for the original books. His ruling stated that although AI developers can legally train AI models on copyrighted works without permission, they should obtain those works through legitimate means that don't involve pirating or other forms of theft. Despite siding with the AI company on fair use, Alsup wrote that Anthropic will still face trial for the pirated copies it used to create its massive central library of books used to train AI. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft," Alsup wrote, "but it may affect the extent of statutory damages."
[54]
Anthropic's AI copyright 'win' is more complicated than it looks
Judge William Alsup of the U.S. District Court for the Northern District of California ruled that Anthropic's use of copyrighted material for training was fair use. His decision carries weight. "Authors cannot rightly exclude anyone from using their works for training or learning as such," Alsup wrote. "Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable." Alsup called training Claude "exceedingly transformative," comparing the model to "any reader aspiring to be a writer." That language helps explain why tech lobbyists were quick to call it a major win. Experts agreed. "It's a pretty big win actually for the future of AI training," says Andres Guadamuz, an intellectual property expert at the University of Sussex who has closely followed AI copyright cases. But he adds: "It could be bad for Anthropic specifically depending on authors winning the piracy issue, but that's still very far away."
[55]
Anthropic Proves in Court Using Copyright Content for AI Training is Fair | AIM
A federal judge in San Francisco ruled that Anthropic's use of copyrighted books to train its AI system constitutes fair use under US copyright law. Judge William Alsup found that Anthropic's AI training was "exceedingly transformative", as the company used the texts to analyse writing, extract uncopyrightable information, and develop new technology, which aligns with the purpose of copyright law to foster creativity. The judge found that training a large language model (LLM) using text is "transformative" because the purpose is not to reproduce or distribute the books, but to enable the model to learn statistical relationships between words and generate new text based on that understanding. The training process was compared to a person reading books to learn how to write, not to copy them verbatim. Moreover, there was no evidence that Anthropic's model reproduced the books or created outputs that would directly substitute the original works. "No output to the public was even alleged to be infringing," the court noted. According to the order, Anthropic "downloaded for free millions of copyrighted books in digital form from pirate sites on the internet", including sources like LibGen and Pirate Library Mirror (PiLiMi). The company also "purchased copyrighted books", removed bindings, scanned them, and added them to a searchable, digital archive intended to retain everything forever. However, the ruling also distinguished between legally acquired books and pirated copies. Alsup ruled that Anthropic's downloading and storing of over seven million pirated books in a "central library" infringed the authors' copyrights and was not protected as fair use. According to him, while copies used for training models may qualify as fair use, almost any unauthorised copying for the central library, especially via pirated sources, would have been too much. The judge ordered a trial scheduled for December to determine damages related to this infringement, which could involve substantial statutory penalties. The case originated from a class action lawsuit filed by authors who alleged that Anthropic used unauthorised copies of their books to train its Claude LLM without permission or compensation. While the court recognised the legality of training AI on lawfully obtained works, it criticised Anthropic for relying on pirated materials, noting that acquiring books through piracy is not reasonably necessary for fair use and undermines copyright protections. The ruling is seen as a significant but complex precedent for the AI industry. It affirms that AI training on copyrighted works can be fair use if done with legally obtained materials, but also signals that companies must avoid piracy to limit legal risks. The decision is expected to influence ongoing and future copyright disputes involving AI, though appeals and further litigation are likely. OpenAI is facing several legal challenges over alleged unauthorised use of copyrighted material to train its LLMs. A key lawsuit in New York involves The Authors Guild and well-known writers such as George RR Martin and Jodi Picoult, who argue that OpenAI used their works without permission, jeopardising their income from original writing. In a separate case, The New York Times has taken legal action against both OpenAI and Microsoft, accusing them of using millions of its articles to develop AI systems that now act as direct competitors in delivering news and information. 
Meta has also come under legal scrutiny, with authors like Richard Kadrey and Sarah Silverman claiming the company used vast collections of copyrighted content, sourced through torrent sites like LibGen and Sci-Hub, to train its models. Meanwhile, Stability AI is being sued by Getty Images, which alleges that the company copied millions of its photos and metadata to train its AI image tools. The complaint includes claims that the system produced images featuring Getty's watermark. Moreover, Midjourney is facing legal pressure from media giants, including Disney and Universal, for allegedly generating AI-based images that replicate their copyrighted material.
[56]
Meta comes out winner in AI copyright case against authors
A U.S. judge sided with Meta Platforms Inc. today in an AI copyright lawsuit brought by 13 authors who claimed the social media firm had illegally trained its AI systems on their work without permission. This is the second major AI copyright case this week, with both decisions in favor of the AI companies. In the first, which was brought against Anthropic PBC, the judge ruled that training AI systems on written works amounted to "fair use" under U.S. copyright law. Today's case, which was brought by the comedian Sarah Silverman and other notable authors, claimed that Meta had trained its large language models on their copyrighted works. The authors said their books were available through various online libraries, which resulted in the firm plagiarizing their content. They claimed that Meta's practices hurt the book market. Federal Judge Vince Chhabria disagreed, writing in his summary judgment that it "is generally illegal to copy protected works without permission," but in this case, the plaintiffs, he said, hadn't provided a compelling argument that Meta's practices had harmed the book market. "On this record, Meta has defeated the plaintiffs' half-hearted argument that its copying causes or threatens significant market harm," he said. "That conclusion may be in significant tension with reality." Still, Chhabria contended that this doesn't mean that such practices are lawful all the time. "This is not a class action, so the ruling only affects the rights of these thirteen authors -- not the countless others whose works Meta used to train its models," he wrote. "And, as should now be clear, this ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful." He also dismissed Meta's contention that not allowing firms to train their AI models on copyrighted works would put a stop to the development of LLMs, which he said was "nonsense." "We appreciate today's decision from the Court," a Meta spokesperson said in a statement. "Open-source AI models are powering transformative innovations, productivity, and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology."
[57]
AI and copyright - Meta and Anthropic declare Pyrrhic victories, but may still lose the war
Two of the copyright lawsuits against US AI companies, Meta and Anthropic, have reached summary judgements, each of which is being presented as a win by the vendors and their online supporters. However, the victories this week are partial, flawed, and Pyrrhic, in that they do not constitute decisive legal endorsements of scraping copyrighted texts to train AI models. Meanwhile, nearly 50 other cases remain ongoing worldwide, many against ChatGPT maker OpenAI in the US. But first, Meta. Yesterday a federal judge ruled that the Facebook, Instagram, and WhatsApp parent had not broken the law when it trained its AI models on the works of 13 authors, including Richard Kadrey, Sarah Silverman, Ta-Nehisi Coates, and Mike Huckabee. However, US District Court Judge Vince Chhabria noted that the judgement solely related to those authors' claims and the details they presented. The plaintiffs had not provided sufficient evidence that Meta's scraping of their work was harmful, he said - financially or in terms of creating automated competitors. As a result, he said that the Court had "no choice" but to grant summary judgement to Meta. Despite this, the Judge was emphatic on a critical point: This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one. Hardly a resounding victory for Meta, therefore. Indeed, Chhabria's judgement - while disappointing to millions of creatives who saw Kadrey vs Meta as carrying their own hopes to Mark Zuckerberg's door - sounded a warning to AI vendors on the critical question of market dilution and harm, which he suggested could be a "potentially winning argument" in future cases. The mass scraping of copyrighted data could "flood the market with endless amounts of images, songs, articles, books, and more", he noted, adding: People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required [which would] dramatically undermine the incentive for human beings to create things the old-fashioned way. Let's explore that proposition for a moment. Last year, the scenario-tracking unit of European police organization Europol predicted that most online content will be synthetic by as early as 2026, as so-called 'AI slop' proliferates on every channel. The evidence is around us already. Online art communities are overrun with generative content, while employers complain at being inundated with thousands of similar ChatGPT-authored resumés. AI 'acts' are commonplace on streaming platforms such as Spotify - where the vendor is incentivized by not having to pay creators royalties. Meanwhile, start-up publisher Spines says it aims to release 8,000 AI 'authored' books in the months ahead. And so the list goes on. All of this - as Judge Chhabria noted - may be a dis-incentive to develop human creative skills, and perhaps even to write new books or conduct original research. Consider this: the authors of any future non-fiction text may question what the point would be of spending months researching a topic and writing tens of thousands of words, if that book were then scraped and turned into a revenue stream by a trillion-dollar company. The authors' royalties would ebb away as users turned to AI search, while their research and hard work would neither be credited nor remunerated. 
(Believe me, I am contracted to write a book by the end of this year, and have had all of those thoughts.) Meanwhile, on social platforms, some users have begun to question whether AI-powered search has effectively broken the internet, by replacing the quest for verifiable data with generated texts of questionable provenance. These may contain hallucinations, errors, synthetic data, or conflations of unrelated subjects - early signs of model collapse. This viewpoint was summed up by an X user called LanguageCrawler: AI using AI as a source of information has made Google search full of errors and therefore useless. Google eliminated the most useful tool (especially for language workers) which was the number of hits for a phrase or a group of words. Instead, the AI search engine just makes up a definition even if the term doesn't really exist. Even so, this week has handed partial victories to AI providers. US District Court Judge William Alsup reached a similar judgement to Chhabria's in a case brought by three authors against Anthropic, noting that the practice of training its Claude model on scraped texts was "exceedingly transformative". In other words, it created something new, rather than a copy. Judge Alsup wrote: Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works, not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use. That argument was actually dismissed earlier this year by the US Copyright Office, which noted that AI models do not read and learn like humans, but instead ingest at industrial scale. In a coruscating 109-page document, the Office said: Some AI companies asserted that 'there is no copy of the training data -- whether text, images, or other formats -- present in the model itself.' OpenAI characterized contrary arguments as based on 'a common and unfortunate misperception of the technology,' and argued that model weights are just 'large strings of numbers' that reflect 'statistical relationship[s]' among the training tokens. But others pointed to 'numerous examples' of models generating 'verbatim, near identical, or substantially similar outputs,' arguing that they can 'embody the expressive works they were trained on.' News/Media Alliance stated that 'regardless of the exact technical processes employed,' such behavior 'has the same effect as memorization and retention.' Many have seen countless examples of generative AIs producing images that are almost identical to scenes from blockbuster movies - featuring characters from those films. The implication that vendors have scraped proprietary work is surely inescapable? On the question of transformation, the Office noted: On one end of the spectrum, training a model is most transformative when the purpose is to deploy it for research, or in a closed system that constrains it to a non-substitutive task. For example, training a language model on a large collection of data, including social media posts, articles, and books, for deployment in systems used for content moderation does not have the same educational purpose as those papers and books. On the other end of the spectrum is training a model to generate outputs that are substantially similar to copyrighted works in the dataset. 
For example, a foundation image model might be further trained on images from a popular animated series and deployed to generate images of characters from that series. [Can this be read as a dig at OpenAI supremo Sam Altman for sharing Studio Ghibli-like memes on social platforms?] Unlike cases where copying computer programs to access their functional elements was necessary to create new, interoperable works, using images or sound recordings to train a model that generates similar expressive outputs does not merely remove a technical barrier to productive competition. In such cases, unless the original work itself is being targeted for comment or parody, it is hard to see the use as transformative. However, even Alsup's judgement was not an emphatic win for Anthropic. Had the authors claimed that the AI generated "infringing knockoffs [...] this would be a different case", he wrote, paving the way for lawsuits that focus on this aspect of the market. (Cue authors prompting an AI to produce similar work, then sending the result to their lawyer.) The judge added that there would need to be a separate trial for claims that Anthropic scraped pirated texts to train AI models. On that point, Meta is among the vendors known to have scraped the free or pirate LibGen library, which contains millions of copyrighted books, reports, and research papers - an academic tells me that his entire life's work is in that library. A report earlier this year from Denmark's Rights Alliance set out the ways in which several vendors have used pirated content to train their systems. In the Meta case this week, lawyers for the plaintiffs have signalled their determination to appeal the verdict, partly on that basis. They said: The Court ruled that AI companies that 'feed copyright-protected works into their models without getting permission from the copyright holders or paying for them' are generally violating the law. Yet, despite the undisputed record of Meta's historically unprecedented pirating of copyrighted works, the court ruled in Meta's favor. We respectfully disagree with that conclusion. But as noted above, Judge Chhabria explained that Silverman et al had presented insufficient proof of harm. However, he did not exonerate Meta for its actions, dismissing claims that creator copyright is somehow an impediment to innovation. He said of that commonplace position among AI vendors: This is nonsense. Where copying for LLM (Large Language Model) training isn't fair use, LLM developers (including Meta) won't need to stop using copyrighted works to train their models. They will only need to pay rightsholders for licences for that training. These products are expected to generate billions, even trillions of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it. And in a swipe at Meta's boss, the Judge noted that the company had abandoned its prior licensing efforts when - "following escalation to CEO Mark Zuckerberg" - it realised that most of the texts available from publishers were contained in the LibGen pirate library, and opted to use those instead. He added that all other members of the proposed class of claimants - the world's authors and rightsholders - were now free to pursue cases on the same claims. So, some battles have been lost, pending appeal. But the war for copyright? It is far from over, let alone won by vendors, despite their cries of victory.
Indeed, both judges have signalled the way ahead for other claimants, who may yet win.
[58]
Judge sides with Anthropic in landmark AI copyright case, but orders it to go on trial over piracy claims - SiliconANGLE
Anthropic PBC scored a major victory for itself and the broader artificial intelligence industry today when a federal judge ruled that it hasn't broken the law by training its chatbot Claude on hundreds of legally-purchased books that were later digitized without the authors' permission. However, the company is still on the hook for millions of pirated copies of books that it downloaded from the internet and used to train its models. U.S. District Judge William Alsup of the Northern District of California said in a ruling today that the way Anthropic's models distill information from thousands of written works and produce their own unique text meets the definition of "fair use" under U.S. copyright law. He justified this because the model's outputs are essentially new. "Like any reader aspiring to be a writer, Anthropic's models trained upon works not to race ahead and replicate or supplant them - but to turn a hard corner and create something different," Alsup wrote in his judgment. But although the judge dismissed one of the claims made in a class action lawsuit by a trio of authors last year, he ordered that Anthropic must stand trial in December for allegedly stealing thousands of copyrighted works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup said. The lawsuit, filed last summer by authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, alleges that the company's AI model training practices amount to "large-scale theft" of thousands of copyrighted books. It also alleged that the company sought to profit by "strip-mining the human expression and ingenuity behind each of those works." During the case, it was revealed in documents disclosed by Anthropic that a number of its researchers raised concerns over the legality of using online libraries of pirated books. That prompted the company to change its approach and purchase copies of hundreds of digitized works. But the judge said that although the company later purchased many copies of books legally, that doesn't absolve it of the liability for any earlier thefts. However, it "may affect the extent of statutory damages," Alsup added. Today's ruling could set a precedent for dozens of similar lawsuits that have been filed against Anthropic's competitors in the AI industry, including the ChatGPT creator OpenAI, as well as Meta Platforms Inc. and the AI search engine Perplexity AI Inc. Claims of copyright infringement have been piling up against AI companies, with dozens of cases filed by authors, media companies and music labels since 2023, when generative AI burst into the public consciousness. Creators have also signed multiple open letters calling on governments to rein in AI developers and prevent them from using copyrighted works for training their models. The furore has had a limited impact, with some AI companies responding by signing legal agreements with publishers that allow them to access their copyrighted materials. Anthropic, which was founded in 2021 by a number of ex-OpenAI employees, has positioned itself as being more responsible and safety-focused, but the lawsuit filed last year charges that its actions "made a mockery of its lofty goals" due to its practice of training its models on pirated works.
In response to today's ruling, Anthropic did not address the piracy claims, but said it was pleased that the judge had recognized AI training is "transformative and consistent with copyright's purpose in enabling creativity and fostering scientific progress."
[59]
Judge dismisses authors' copyright lawsuit against Meta over AI training
A federal judge on Wednesday sided with Facebook parent Meta Platforms in dismissing a copyright infringement lawsuit from a group of authors who accused the company of stealing their works to train its artificial intelligence technology. The ruling from U.S. District Judge Vince Chhabria was the second in a week from San Francisco's federal court to dismiss major copyright claims from book authors against the rapidly developing AI industry. Chhabria found that 13 authors who sued Meta "made the wrong arguments" and tossed the case. But the judge also said that the ruling is limited to the authors in the case and does not mean that Meta's use of copyrighted materials is lawful. Lawyers for the plaintiffs -- a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates -- didn't immediately respond to a request for comment Wednesday. Meta also didn't immediately respond to a request for comment. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." On Monday, from the same courthouse, U.S. District Judge William Alsup ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books, but the company must still go to trial for illicitly acquiring those books from pirate websites instead of buying them. But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative," Alsup wrote. Chhabria, in his Meta ruling, criticized Alsup's reasoning on the Anthropic case, arguing that "Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on." Chhabria suggested that a case for such harm can be made. In the Meta case, the authors had argued in court filings that Meta is "liable for massive copyright infringement" by taking their books from online repositories of pirated works and feeding them into Meta's flagship generative AI system Llama. Lengthy and distinctively written passages of text -- such as those found in books -- are highly useful for teaching generative AI chatbots the patterns of human language. "Meta could and should have paid" to buy and license those literary works, the authors' attorneys argued. Meta countered in court filings that U.S. copyright law "allows the unauthorized copying of a work to transform it into something new" and that the new, AI-generated expression that comes out of its chatbots is fundamentally different from the books it was trained on. "After nearly two years of litigation, there still is no evidence that anyone has ever used Llama as a substitute for reading Plaintiffs' books, or that they even could," Meta's attorneys argued. Meta says Llama won't output the actual works it has copied, even when asked to do so. "No one can use Llama to read Sarah Silverman's description of her childhood, or Junot Diaz's story of a Dominican boy growing up in New Jersey," its attorneys wrote.
Accused of pulling those books from online "shadow libraries," Meta has also argued that the methods it used have "no bearing on the nature and purpose of its use" and it would have been the same result if the company instead struck a deal with real libraries. Such deals are how Google built its online Google Books repository of more than 20 million books, though it also fought a decade of legal challenges before the U.S. Supreme Court in 2016 let stand lower court rulings that rejected copyright infringement claims. The authors' case against Meta forced CEO Mark Zuckerberg to be deposed, and has disclosed internal conversations at the company over the ethics of tapping into pirated databases that have long attracted scrutiny. "Authorities regularly shut down their domains and even prosecute the perpetrators," the authors' attorneys argued in a court filing. "That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval. Their gamble should not pay off." "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful," they argued. The named plaintiffs are Jacqueline Woodson, Richard Kadrey, Andrew Sean Greer, Rachel Louise Snyder, David Henry Hwang, Ta-Nehisi Coates, Laura Lippman, Matthew Klam, Junot Diaz, Sarah Silverman, Lysa TerKeurst, Christopher Golden and Christopher Farnsworth. Most of the plaintiffs had asked Chhabria to rule now, rather than wait for a jury trial, on the basic claim of whether Meta infringed on their copyrights. Two of the plaintiffs, Ta-Nehisi Coates and Christopher Golden, did not seek such summary judgment. Chhabria said in the ruling that while he had "no choice" but to grant Meta's summary judgment tossing the case, "in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models."
[60]
Anthropic wins ruling on AI training in copyright lawsuit but must face trial on pirated books
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of their use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product. Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement didn't address the piracy claims.
[61]
Anthropic wins ruling on AI training in copyright lawsuit but must face trial on pirated books
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the San Francisco-based company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." Books are known to be important sources of the data -- in essence, billions of words carefully strung together -- that are needed to build large language models. In the race to outdo each other in developing the most advanced AI chatbots, a number of tech companies have turned to online repositories of stolen books that they can get for free. Documents disclosed in San Francisco's federal court showed Anthropic employees' internal concerns about the legality of their use of pirate sites. The company later shifted its approach and hired Tom Turvey, the former Google executive in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles. With his help, Anthropic began buying books in bulk, tearing off the bindings and scanning each page before feeding the digitized versions into its AI model, according to court documents. But that didn't undo the earlier piracy, according to the judge. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by building its AI product on pirated writings. Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement didn't address the piracy claims.
[62]
Judge dismisses authors' copyright lawsuit against Meta over AI training
A federal judge on Wednesday sided with Facebook parent Meta Platforms in dismissing a copyright infringement lawsuit from a group of authors who accused the company of stealing their works to train its artificial intelligence technology. The ruling from U.S. District Judge Vince Chhabria was the second in a week from San Francisco's federal court to dismiss major copyright claims from book authors against the rapidly developing AI industry. Chhabria found that 13 authors who sued Meta "made the wrong arguments" and tossed the case. But the judge also said that the ruling is limited to the authors in the case and does not mean that Meta's use of copyrighted materials is lawful. Lawyers for the plaintiffs -- a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates -- didn't immediately respond to a request for comment Wednesday. Meta also didn't immediately respond to a request for comment. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." On Monday, from the same courthouse, U.S. District Judge William Alsup ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books, but the company must still go to trial for illicitly acquiring those books from pirate websites instead of buying them. But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative," Alsup wrote. Chhabria, in his Meta ruling, criticized Alsup's reasoning on the Anthropic case, arguing that "Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on." Chhabria suggested that a case for such harm can be made. In the Meta case, the authors had argued in court filings that Meta is "liable for massive copyright infringement" by taking their books from online repositories of pirated works and feeding them into Meta's flagship generative AI system Llama. Lengthy and distinctively written passages of text -- such as those found in books -- are highly useful for teaching generative AI chatbots the patterns of human language. "Meta could and should have paid" to buy and license those literary works, the authors' attorneys argued. Meta countered in court filings that U.S. copyright law "allows the unauthorized copying of a work to transform it into something new" and that the new, AI-generated expression that comes out of its chatbots is fundamentally different from the books it was trained on. "After nearly two years of litigation, there still is no evidence that anyone has ever used Llama as a substitute for reading Plaintiffs' books, or that they even could," Meta's attorneys argued. Meta says Llama won't output the actual works it has copied, even when asked to do so. "No one can use Llama to read Sarah Silverman's description of her childhood, or Junot Diaz's story of a Dominican boy growing up in New Jersey," its attorneys wrote.
Accused of pulling those books from online "shadow libraries," Meta has also argued that the methods it used have "no bearing on the nature and purpose of its use" and it would have been the same result if the company instead struck a deal with real libraries. Such deals are how Google built its online Google Books repository of more than 20 million books, though it also fought a decade of legal challenges before the U.S. Supreme Court in 2016 let stand lower court rulings that rejected copyright infringement claims. The authors' case against Meta forced CEO Mark Zuckerberg to be deposed, and has disclosed internal conversations at the company over the ethics of tapping into pirated databases that have long attracted scrutiny. "Authorities regularly shut down their domains and even prosecute the perpetrators," the authors' attorneys argued in a court filing. "That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval. Their gamble should not pay off." "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful," they argued. The named plaintiffs are Jacqueline Woodson, Richard Kadrey, Andrew Sean Greer, Rachel Louise Snyder, David Henry Hwang, Ta-Nehisi Coates, Laura Lippman, Matthew Klam, Junot Diaz, Sarah Silverman, Lysa TerKeurst, Christopher Golden and Christopher Farnsworth. Most of the plaintiffs had asked Chhabria to rule now, rather than wait for a jury trial, on the basic claim of whether Meta infringed on their copyrights. Two of the plaintiffs, Ta-Nehisi Coates and Christopher Golden, did not seek such summary judgment. Chhabria said in the ruling that while he had "no choice" but to grant Meta's summary judgment tossing the case, "in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models."
[63]
Meta wins AI copyright fight with authors
A federal judge in California ruled in favor of Meta regarding a lawsuit initiated by 13 book authors, including Sarah Silverman, concerning the alleged unauthorized use of their copyrighted works for training artificial intelligence models. Federal Judge Vince Chhabria issued a summary judgment, which allowed for a judicial decision without a jury, determining that Meta's AI model training, in this specific instance, conformed to the "fair use" doctrine of copyright law and was therefore lawful. This decision follows a recent ruling where a federal judge similarly sided with Anthropic in a comparable lawsuit. These judicial outcomes are seen as favorable to the technology sector, which has argued in ongoing legal disputes with media entities that training AI models on copyrighted materials constitutes fair use under existing legal frameworks. Judge Chhabria clarified that his decision does not universally legalize all instances of AI model training on copyrighted materials. He stated that the plaintiffs in the Meta case "made the wrong arguments" and did not provide sufficient evidence to support their claims. The judge remarked, "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful." He further elaborated, "In cases involving uses like Meta's, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant's use." The court found Meta's use of the copyrighted works to be "transformative," indicating that the company's AI models did not merely replicate the authors' original books. Additionally, the plaintiffs failed to demonstrate that Meta's copying of the books caused harm to the market for those authors' works, a crucial element in assessing copyright infringement. Judge Chhabria noted, "The plaintiffs presented no meaningful evidence on market dilution at all." Judge Chhabria emphasized that fair use defenses are highly dependent on the specific facts of each case, suggesting that certain industries might possess stronger fair use arguments than others. He indicated that "markets for certain types of works (like news articles) might be even more vulnerable to indirect competition from AI outputs."
[64]
Big Tech Wants to Take Your Work to Feed Its Bots. These Lawsuits Might Let Them.
Last week, two different federal judges in the Northern District of California made legal rulings that attempt to resolve one of the knottiest debates in the artificial intelligence world: whether it's a copyright violation for Big Tech firms to use published books for training generative bots like ChatGPT. Unfortunately for the many authors who've brought lawsuits with this argument, neither decision favors their case -- at least, not for now. And that means creators in all fields may not be able to stop A.I. companies from using their work however they please. On Monday, a U.S. district judge ruled that Amazon-backed startup Anthropic did not violate copyright law when it used the works of three authors to train the company's flagship chatbot, Claude. In Bartz v. Anthropic, writers Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson claimed that the A.I. firm had infringed upon their copyright protections when multiple books of theirs had not only been used to train Claude, but had been pirated illegally for said purpose. Anthropic's counter was that all of its practices -- from the training itself to the utilization of books its engineers had alternatively pirated and purchased for Claude training -- constituted an instance of fair use and were perfectly legal. U.S. District Judge William Alsup agreed in part with Anthropic, ruling that the training itself did not violate copyright law, but that the piracy certainly did. A trial is set to determine the damages from Anthropic's downloads of ill-gotten books. The second judgment landed two days later, concerning a case that prominent authors like Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey had brought against Meta on similar grounds, albeit more limited in scope. They merely argued, in a bid for summary judgment in Kadrey v. Meta, that automatic A.I. training with copyrighted works undercuts their ability to negotiate other deals, and that Meta's Llama sets are "capable of reproducing small snippets of text from their books." Judge Vince Chhabria sided with Meta but appeared to do so regretfully, stating that Meta's use of the writers' work to train its bots isn't necessarily legal but that the plaintiffs "made the wrong arguments." Chhabria went even further, adding, "it's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works." However, instead of presenting that "potentially winning argument," the authors had approached the bench with "clear losers." Meta's A.I. did not reproduce enough text from the authors' books to constitute any sort of plagiarism or piracy. The judge likewise decreed that these writers "are not entitled to the market for licensing their works as AI training data." But ultimately, unlike Alsup, Chhabria did appear to leave an open legal pathway for the authors to relitigate their case on different grounds if they so wish. While we don't know for sure just yet, it seems likely that the authors will try again. After all, Kadrey, Silverman, and Coates have also been fighting a case against OpenAI, alleging direct copyright infringement and unfair competition in OpenAI's use of their books to train bots like ChatGPT.
Their complaint has already gotten some results, having forced OpenAI to reveal details of its closely guarded training data in court. They may also be encouraged by a significant A.I.-copyright decision from February, when the global media-and-tech organization Thomson Reuters won a fair use lawsuit against an A.I. startup whose large language models ingested and reproduced Thomson Reuters' legal texts. Those authors are not the only creatives arguing that the fruits of their labor should not be easy fuel for advanced syntax-prediction machines. Over the past few months, several high- and low-profile court cases have been pitting everyone from authors to powerful media companies to music publishers and eminent news organizations against burgeoning and well-funded A.I. outfits -- to varying results. All these legal battles are existential for publishers and writers at a moment when A.I. is already upending long-embattled creative sectors -- and when tech executives are doubling down on their insistence that copyright limits should be brushed aside in the arms race for A.I. supremacy. Should they win out, there may be nothing stopping A.I. companies from devastating the industries that allow for humans to exercise their creative expression, and filling the void with knockoff machine generations instead. In OpenAI CEO Sam Altman's case, he's even leveraging his newly chummy relationship with President Donald Trump in the hopes that the federal government will unilaterally declare all A.I. training methods to be permissible as fair use. The Trump administration has already made some favorable moves on behalf of Altman and Co. this month, with DOGE having sacked the head of the Library of Congress and the head of its U.S. Copyright Office, right as the agency was set to publish a report recommending A.I.-training copyright standards that would be more favorable to authors. (The office currently has no one at the helm.) There's also the fear that Congress will attempt to nullify all state-level A.I. regulations via federal legislation, which would mean that the few laws that are in place to protect creators from A.I. -- like Tennessee's bill against unauthorized deepfakes of notable performers -- may soon be all but crushed. All of which is to say, there is a lot that's going to be legally murky about A.I. and copyright for a while yet. Judges are going to have to assess the copyright implications of a wide range of media -- not just text, but printed text as compared with digital text, along with illustration, video, and music. On top of that, all these federal court rulings are likely to be appealed by either party no matter the result, making it all but inevitable that appellate courts and even the Supreme Court will chime in. (The recent SCOTUS ruling on Trump's birthright citizenship executive order will make it impossible for the lower courts to effectively pause any A.I.-copyright executive actions from the White House.) However, there are certain indications from the rulings in last week's Anthropic and Meta cases that offer us a hint as to where the judicial system may ultimately land on the fair use issue. In the Anthropic case, one reason Judge Alsup gave was that the trained data sets and A.I. models that power Claude have sufficient anti-plagiarism filters, which meant that "Claude created no exact copy, nor any substantial knock-off."
(Alsup did allow that "if the outputs were ever to become infringing, Authors could bring such a case.") What's more, Anthropic stashed all the books used for training in a permanent internal set -- but never handed out those books to others or made them inappropriately public in any way. Fair use did not cover this "central library," but it did not infringe upon copyright as long as the books were purchased properly. (This is why the judge plans to bring Anthropic to trial over the books it had stolen.) For the Meta case, Judge Chhabria did not seem to agree with all of Alsup's points -- especially the contention that purchasing books for training indicates sufficient compensation -- and he all but wrote a guidebook for his plaintiffs to try again later. Specifically, he stated that the authors should come back with arguments that Meta's chatbots do produce output that's strikingly similar to their works, that A.I.'s ability to do so at a rapid pace and at scale cuts into the market for their books (especially when it comes to nonfiction and newer fiction), and that Meta's A.I. achieves all this through the utilization of pirated book copies for training (a fact that was uncovered during this very trial). "In many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission," Chhabria declared. "Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials." Both judges seem to align on a couple of key points. One, A.I. generations that significantly resemble samples from their training data are not protected by fair use, but filters that prevent chatbots from copying their sources are kosher. "The Anthropic LLM implements filters so that if you have a user who asks for basically an entire work, the LLM is not going to give them that," said Ray Seilie, an entertainment and tech lawyer who serves as counsel for the law firm KHIKS. Second, A.I. firms cannot shortcut the training process through piracy of intellectual property. Where they diverge is on how much training itself violates copyright law. We're likely to see more such disputes on that contention as other cases make their way through the courts. But for creators worried about how A.I. has appropriated their work, these rulings have offered a strategy. In Disney's new lawsuit against the image generator Midjourney and the big three record labels' lawsuits against A.I.-music tech, the plaintiffs specifically attack the respective startups for generating images/songs that easily resemble the copyrighted works used for training (e.g., Midjourney spitting out a Donald Duck replication, or the app Suno mimicking Bruce Springsteen's voice). Authors litigating with OpenAI and other text generators can point to how A.I.-generated books have taken over various Amazon bestseller lists, and how in many cases those charted "books" appear as outright clones of original works. These writers and journalists can also leverage arguments that A.I. companies tried to hasten training by mass piracy, that these generative tools are capable of replicating their work at scale and with speed, and that any stowed copyright training material that was leaked in a cyberattack or shared without permission does not fit within the bounds of fair use. What if these copyright battles are also lost? Then there will be little in the way of stopping A.I.
startups from utilizing all creative works for their own purposes, with no consideration for the artists and writers who actually put in the work. And we will be left with a world less blessed with human creativity than overrun by second-rate slop that crushes the careers of the people whose imaginations made that A.I. so potent to begin with.
[65]
Anthropic Wins Key Ruling on AI in Authors' Copyright Lawsuit
A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work.
[66]
Judge Dismisses Authors' Copyright Lawsuit Against Meta Over AI Training
A federal judge on Wednesday sided with Facebook parent Meta Platforms in dismissing a copyright infringement lawsuit from a group of authors who accused the company of stealing their works to train its artificial intelligence technology. The ruling from U.S. District Judge Vince Chhabria was the second in a week from San Francisco's federal court to dismiss major copyright claims from book authors against the rapidly developing AI industry. Chhabria found that 13 authors who sued Meta "made the wrong arguments" and tossed the case. But the judge also said that the ruling is limited to the authors in the case and does not mean that Meta's use of copyrighted materials is lawful. Lawyers for the plaintiffs -- a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates -- didn't immediately respond to a request for comment Wednesday. Meta also didn't immediately respond to a request for comment. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." On Monday, from the same courthouse, U.S. District Judge William Alsup ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books, but the company must still go to trial for illicitly acquiring those books from pirate websites instead of buying them. But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative," Alsup wrote. Chhabria, in his Meta ruling, criticized Alsup's reasoning on the Anthropic case, arguing that "Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on." Chhabria suggested that a case for such harm can be made. In the Meta case, the authors had argued in court filings that Meta is "liable for massive copyright infringement" by taking their books from online repositories of pirated works and feeding them into Meta's flagship generative AI system Llama. Lengthy and distinctively written passages of text -- such as those found in books -- are highly useful for teaching generative AI chatbots the patterns of human language. "Meta could and should have paid" to buy and license those literary works, the authors' attorneys argued. Meta countered in court filings that U.S. copyright law "allows the unauthorized copying of a work to transform it into something new" and that the new, AI-generated expression that comes out of its chatbots is fundamentally different from the books it was trained on. "After nearly two years of litigation, there still is no evidence that anyone has ever used Llama as a substitute for reading Plaintiffs' books, or that they even could," Meta's attorneys argued. Meta says Llama won't output the actual works it has copied, even when asked to do so. "No one can use Llama to read Sarah Silverman's description of her childhood, or Junot Diaz's story of a Dominican boy growing up in New Jersey," its attorneys wrote.
Accused of pulling those books from online "shadow libraries," Meta has also argued that the methods it used have "no bearing on the nature and purpose of its use" and it would have been the same result if the company instead struck a deal with real libraries. Such deals are how Google built its online Google Books repository of more than 20 million books, though it also fought a decade of legal challenges before the U.S. Supreme Court in 2016 let stand lower court rulings that rejected copyright infringement claims. The authors' case against Meta forced CEO Mark Zuckerberg to be deposed, and has disclosed internal conversations at the company over the ethics of tapping into pirated databases that have long attracted scrutiny. "Authorities regularly shut down their domains and even prosecute the perpetrators," the authors' attorneys argued in a court filing. "That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval. Their gamble should not pay off." "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful," they argued. The named plaintiffs are Jacqueline Woodson, Richard Kadrey, Andrew Sean Greer, Rachel Louise Snyder, David Henry Hwang, Ta-Nehisi Coates, Laura Lippman, Matthew Klam, Junot Diaz, Sarah Silverman, Lysa TerKeurst, Christopher Golden and Christopher Farnsworth. Most of the plaintiffs had asked Chhabria to rule now, rather than wait for a jury trial, on the basic claim of whether Meta infringed on their copyrights. Two of the plaintiffs, Ta-Nehisi Coates and Christopher Golden, did not seek such summary judgment. Chhabria said in the ruling that while he had "no choice" but to grant Meta's summary judgment tossing the case, "in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models."
[67]
Anthropic wins ruling on AI training in copyright lawsuit but must face trial on pirated books
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of their use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product. Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement didn't address the piracy claims.
[68]
Anthropic Wins Key US Ruling on AI Training in Authors' Copyright Lawsuit
The proposed class action is one of several lawsuits brought by authors. A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under US copyright law. Siding with tech companies on a pivotal question for the AI industry, US District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's copying and storage of more than seven million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. US copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 (roughly Rs. 1.28 crore) per work. An Anthropic spokesperson said the company was pleased that the court recognized its AI training was "transformative" and "consistent with copyright's purpose in enabling creativity and fostering scientific progress." The writers filed the proposed class action against Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The proposed class action is one of several lawsuits brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft, and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that US copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said. Alsup also said, however, that Anthropic violated the authors' rights by saving pirated copies of their books as part of a "central library of all the books in the world" that would not necessarily be used for AI training. Anthropic and other prominent AI companies including OpenAI and Meta Platforms have been accused of downloading pirated digital copies of millions of books to train their systems. Anthropic had told Alsup in a court filing that the source of its books was irrelevant to fair use.
"This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup said on Monday. © Thomson Reuters 2025
[69]
Judge Rules in Favor of Meta, Against Authors Sarah Silverman and Ta-Nehisi Coates, in AI Copyright Case
The judge said the authors failed to show "significant market harm" due to Meta training its AI models on their copyrighted works. A district judge sided with tech giant Meta on Wednesday in a major copyright infringement case, Richard Kadrey, et al. v. Meta Platforms Inc. It marks the second time this week that tech companies have scored major legal victories in AI copyright disputes against individuals. In the case, 13 authors, including Richard Kadrey, Sarah Silverman, Junot Diaz, and Ta-Nehisi Coates, argued that Meta violated copyright laws by training its AI models on their copyrighted works without their permission. They provided exhibits showing that Meta's Llama AI model could thoroughly summarize their books when prompted to do so, indicating that the AI had ingested their work in training. The case was filed in July 2023. During the discovery phase, it was uncovered that Meta had used 7.5 million pirated books and 81 million research papers to train its AI model. On Wednesday, U.S. District Judge Vince Chhabria of San Francisco ruled in a 40-page decision that Meta's use of books to train its AI model was protected under the fair use doctrine in U.S. copyright law. The fair use doctrine permits the use of copyrighted material without obtaining permission from the copyright holder in certain cases. What qualifies as fair use depends on factors like how different the end work is from the original and whether the use harms the existing or future market for the copyrighted work. Chhabria said that while it "is generally illegal to copy protected works without permission," the plaintiffs failed in this case to show that Meta's use of copyrighted material caused "market harm." They didn't show, for instance, that Meta's AI spits out excerpts of books verbatim, creates AI copycat books, or prevents the authors from getting AI licensing deals. "Meta has defeated the plaintiffs' half-hearted argument that its copying causes or threatens significant market harm," Chhabria stated in the ruling. Furthermore, Meta's purpose of copying books "for a transformative purpose" is protected under the fair use doctrine, the judge ruled. Earlier this week, a different judge came to the same conclusion in the class action case Bartz v. Anthropic. U.S. District Judge William Alsup of San Francisco stated in a ruling filed on Monday that $61.5 billion AI startup Anthropic was allowed to train its AI model on copyrighted books under the fair use doctrine because the end product was "exceedingly transformative." Anthropic trained its AI on books not to duplicate them or replace them, but to "create something different" in the form of AI answers, Alsup wrote. The ruling marked the first time that a federal judge has sided with tech companies over creatives in an AI copyright lawsuit. Now Chhabria's decision marks the second time that tech companies have triumphed in court against individuals in copyright cases. The judge noted that the ruling does not mean that "Meta's use of copyrighted materials to train its language models is lawful," but only means that "these plaintiffs made the wrong arguments" and that Meta's arguments won in this case. "We appreciate today's decision from the Court," a Meta spokesperson said in a statement on Wednesday, per CNBC.
"Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." Other AI copyright cases are making their way through the courts, including one filed by authors Kai Bird, Jia Tolentino, Daniel Okrent, and several others against Microsoft earlier this week. The lawsuit, filed in New York federal court on Tuesday, alleges that Microsoft violated copyright by training AI on the authors' work.
[70]
Federal Judge Rules It's Legal to Train AI on Copyrighted Books, Marking Major Win for AI Companies
A federal judge ruled for the first time that it was legal for the $61.5 billion AI startup Anthropic to train its AI model on copyrighted books without compensating or crediting the authors. U.S. District Judge William Alsup of San Francisco stated in a ruling filed on Monday that Anthropic's use of copyrighted, published books to train its AI model was "fair use" under U.S. copyright law because it was "exceedingly transformative." Alsup compared the situation to a human reader learning how to be a writer by reading books, for the purpose of creating a new work. "Like any reader aspiring to be a writer, Anthropic's [AI] trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. According to the ruling, although Anthropic's use of copyrighted books as training material for Claude was fair use, the court will hold a trial on pirated books used to create Anthropic's central library and determine the resulting damages. The ruling, the first time that a federal judge has sided with tech companies over creatives in an AI copyright lawsuit, creates a precedent for courts to favor AI companies over individuals in AI copyright disputes. These copyright lawsuits rely on how a judge interprets the fair use doctrine, a concept in copyright law that permits the use of copyrighted material without obtaining permission from the copyright holder. Fair use rulings depend on how different the end work is from the original, what the end work is being used for, and if it is being replicated for commercial gain. The plaintiffs in the class action case, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, are all authors who allege that Anthropic used their work to train its chatbot without their permission. They filed the initial complaint, Bartz v. Anthropic, in August 2024, alleging that Anthropic had violated copyright law by pirating books and replicating them to train its AI chatbot. The ruling details that Anthropic downloaded millions of copyrighted books for free from pirate sites. The startup also bought print copies of copyrighted books, some of which it already had in its pirated library. Employees tore off the bindings of these books, cut down the pages, scanned them, and stored them in digital files to add to a central digital library. From this central library, Anthropic selected different groupings of digitized books to train its AI chatbot, Claude, the company's primary revenue driver. The judge ruled that because Claude's output was "transformative," Anthropic was permitted to use the copyrighted works under the fair use doctrine. However, Anthropic still has to go to trial over the books it pirated. "Anthropic had no entitlement to use pirated copies for its central library," the ruling reads. Claude has proven to be lucrative. According to the ruling, Anthropic made over one billion dollars in annual revenue last year from corporate clients and individuals paying a subscription fee to use the AI chatbot. Paid subscriptions for Claude range from $20 per month to $100 per month. Anthropic faces another lawsuit from Reddit. In a complaint filed earlier this month in Northern California court, Reddit claimed that Anthropic used its site for AI training material without permission.
[71]
What Do Meta and Anthropic's 'Fair Use' Wins Mean for A.I. Copyright Cases?
Fair use rulings favor A.I. firms Meta and Anthropic, yet concerns over piracy and creative market impact remain unresolved. As generative A.I. tools continue to proliferate at a rapid pace, lawsuits from content creators concerned about how these systems are trained have followed just as swiftly. While two rulings this week favored Anthropic and Meta, upholding their use of copyrighted books to train large language models (LLMs), they also spotlighted unresolved issues, including the use of pirated materials and whether a new legal framework may be needed for this emerging technology. And uncertainty remains about how A.I. companies will fare in future lawsuits. "Both cases are broadly positive," Brandon Butler, executive director of Re:Create, a coalition focused on balanced copyright, told Observer. "But these are District Court decisions, so there will be more steps down the road and there's a lot of other cases out there." Judges debate over generative A.I.'s "transformative nature" On June 23, a federal judge ruled in favor of Anthropic in a lawsuit filed last year by a group of authors who claimed the company's Claude models were trained on copyrighted books without permission or compensation. Judge William Alsup found that Anthropic's use was protected under the "fair use" doctrine, citing the transformative nature of how the company used the material. "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote in his decision. However, he criticized Anthropic's decision to download millions of copyrighted books from pirate websites. A separate trial scheduled for December will determine whether the company owes damages. In a statement, Anthropic said it was pleased with the court's recognition of its transformative use, calling the ruling "consistent with copyright's purpose in enabling creativity and fostering scientific progress." In 2024, Meta was also sued by a group of authors, including comedian Sarah Silverman and writer Ta-Nehisi Coates. A ruling from Judge Vince Chhabria on June 25 sided with the tech giant -- though with some caveats. While Chhabria found that Meta's use of copyrighted books to train its Llama models qualified as fair use, he noted that the plaintiffs had made flawed arguments, failing to show that Meta's actions harmed the market for authors. Chhabria also criticized Judge Alsup's earlier ruling for focusing "heavily on the transformative nature of generative A.I. while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on." He suggested that market impact will become increasingly important in future fair use rulings. Generative A.I., he warned, has the potential to "flood the market" with an endless stream of images, songs, articles, and books created with far less effort than by humans -- undermining incentives for people to create "the old-fashioned way." Meta, for its part, said it welcomed the decision. "Open-source A.I.
models are powering transformative innovations, productivity, and creativity for individuals and companies," the company said in a statement. "Fair use of copyright material is a vital legal framework for building this transformative technology." By raising the issue of market dilution, Judge Chhabria's decision could influence the growing number of lawsuits facing a wide range of A.I. companies sued by authors, news publishers, film studios and artists. "In cases involving uses like Meta's, it seems like the plaintiffs will often win -- at least where those cases have better-developed records on the market effects of the defendant's use," Chhabria noted. Certain sectors, such as news publishing, may have stronger arguments on this front due to the direct competitive threat posed by A.I. tools, according to Butler. "I do suspect that as these cases go on, other plaintiffs are going to use that theory and see if other judges agree," he said. For now, however, there's little doubt that the recent rulings in favor of Anthropic and Meta represent early wins for tech companies. "Certainly this is not the end of the story," said Butler. "It is the very, very beginning -- but it's a very positive beginning."
[72]
Anthropic wins key ruling on AI in authors' copyright lawsuit
A federal judge in San Francisco ruled late Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under US copyright law. Siding with tech companies on a pivotal question for the AI industry, US District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's storage of the authors' books in a "central library" violated their copyrights and was not fair use. Spokespeople for Anthropic and attorneys for the authors did not immediately respond to requests for comment on the ruling on Tuesday. The writers sued Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The class action lawsuit is one of several brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that US copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them - but to turn a hard corner and create something different," Alsup said.
[73]
US judge backs using copyrighted books to train AI
A US federal judge has sided with Anthropic regarding training its artificial intelligence models on copyrighted books without authors' permission, a decision with the potential to set a major legal precedent in AI deployment. District Court Judge William Alsup ruled on Monday that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his 32-page decision, comparing AI training to how humans learn by reading books. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We are pleased that the court recognized that using 'works to train LLMs was transformative,'" an Anthropic spokesperson said in response to an AFP query. The judge's decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress," the spokesperson added. - Blanket protection rejected - The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train Claude, the company's AI chatbot that rivals ChatGPT. However, Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections. Along with downloading books from websites offering pirated works, Anthropic bought copyrighted books, scanned the pages and stored them in digital formats, according to court documents. Anthropic's aim was to amass a library of "all the books in the world" for training AI models on whatever content it deemed fit, the judge said in his ruling. While training AI models on the pirated content posed no legal violation, downloading pirated copies to build a general-purpose library constituted copyright infringement, the judge ruled, regardless of eventual training use. The case will now proceed to trial on damages related to the pirated library copies, with potential penalties including financial damages. Anthropic said it disagreed with going to trial on this part of the decision and was evaluating its legal options. "Judge Alsup's decision is a mixed bag," said Keith Kupferschmid, chief executive of US nonprofit Copyright Alliance. "In some instances AI companies should be happy with the decision and in other instances copyright owners should be happy."
Valued at $61.5 billion and heavily backed by Amazon, Anthropic was founded in 2021 by former OpenAI executives. The company, known for its Claude chatbot and AI models, positions itself as focused on AI safety and responsible development.
[74]
Judge Dismisses Authors' Copyright Lawsuit Against Meta Over AI Training
A group of authors accused Meta Platforms of stealing their works to train its artificial intelligence technology. A federal judge on Wednesday sided with Facebook parent Meta Platforms in dismissing a copyright infringement lawsuit from a group of authors who accused the company of stealing their works to train its artificial intelligence technology. The ruling from U.S. District Judge Vince Chhabria was the second in a week from San Francisco's federal court to dismiss major copyright claims from book authors against the rapidly developing AI industry. Chhabria found that 13 authors who sued Meta "made the wrong arguments" and tossed the case. But the judge also said that the ruling is limited to the authors in the case and does not mean that Meta's use of copyrighted materials is lawful. Lawyers for the plaintiffs -- a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates -- didn't immediately respond to a request for comment Wednesday. Meta also didn't immediately respond to a request for comment. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." On Monday, from the same courthouse, U.S. District Judge William Alsup ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books, but the company must still go to trial for illicitly acquiring those books from pirate websites instead of buying them. But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative," Alsup wrote. Chhabria, in his Meta ruling, criticized Alsup's reasoning on the Anthropic case, arguing that "Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on." Chhabria suggested that a case for such harm can be made. In the Meta case, the authors had argued in court filings that Meta is "liable for massive copyright infringement" by taking their books from online repositories of pirated works and feeding them into Meta's flagship generative AI system Llama. Lengthy and distinctively written passages of text -- such as those found in books -- are highly useful for teaching generative AI chatbots the patterns of human language. "Meta could and should have paid" to buy and license those literary works, the authors' attorneys argued. Meta countered in court filings that U.S. copyright law "allows the unauthorized copying of a work to transform it into something new" and that the new, AI-generated expression that comes out of its chatbots is fundamentally different from the books it was trained on. "After nearly two years of litigation, there still is no evidence that anyone has ever used Llama as a substitute for reading Plaintiffs' books, or that they even could," Meta's attorneys argued. Meta says Llama won't output the actual works it has copied, even when asked to do so. "No one can use Llama to read Sarah Silverman's description of her childhood, or Junot Diaz's story of a Dominican boy growing up in New Jersey," its attorneys wrote.
Accused of pulling those books from online "shadow libraries," Meta has also argued that the methods it used have "no bearing on the nature and purpose of its use" and it would have been the same result if the company instead struck a deal with real libraries. Such deals are how Google built its online Google Books repository of more than 20 million books, though it also fought a decade of legal challenges before the U.S. Supreme Court in 2016 let stand lower court rulings that rejected copyright infringement claims. The authors' case against Meta forced CEO Mark Zuckerberg to be deposed, and has disclosed internal conversations at the company over the ethics of tapping into pirated databases that have long attracted scrutiny. "Authorities regularly shut down their domains and even prosecute the perpetrators," the authors' attorneys argued in a court filing. "That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval. Their gamble should not pay off." "Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one's own benefit has always been unlawful," they argued. The named plaintiffs are Jacqueline Woodson, Richard Kadrey, Andrew Sean Greer, Rachel Louise Snyder, David Henry Hwang, Ta-Nehisi Coates, Laura Lippman, Matthew Klam, Junot Diaz, Sarah Silverman, Lysa TerKeurst, Christopher Golden and Christopher Farnsworth. Most of the plaintiffs had asked Chhabria to rule now, rather than wait for a jury trial, on the basic claim of whether Meta infringed on their copyrights. Two of the plaintiffs, Ta-Nehisi Coates and Christopher Golden, did not seek such summary judgment. Chhabria said in the ruling that while he had "no choice" but to grant Meta's summary judgment tossing the case, "in the grand scheme of things, the consequences of this ruling are limited.
This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models."
[75]
Judge dismisses authors' copyright lawsuit against Meta over AI training
The Wednesday ruling from US District Judge Vince Chhabria was the second in a week from San Francisco's federal court to dismiss major copyright claims from book authors against the rapidly developing AI industry. A federal judge sided with Facebook parent Meta Platforms in dismissing a copyright infringement lawsuit from a group of authors who accused the company of stealing their works to train its artificial intelligence technology. The Wednesday ruling from US District Judge Vince Chhabria was the second in a week from San Francisco's federal court to dismiss major copyright claims from book authors against the rapidly developing AI industry. Chhabria found that 13 authors who sued Meta "made the wrong arguments" and tossed the case. But the judge also said that the ruling is limited to the authors in the case and does not mean that Meta's use of copyrighted materials is lawful. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Chhabria wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." Lawyers for the plaintiffs - a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates - said in a statement that the "court ruled that AI companies that 'feed copyright-protected works into their models without getting permission from the copyright holders or paying for them' are generally violating the law. Yet, despite the undisputed record of Meta's historically unprecedented pirating of copyrighted works, the court ruled in Meta's favor. We respectfully disagree with that conclusion." Meta said it appreciates the decision. "Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology," the Menlo Park, California-based company said in a statement. Although Meta prevailed in its request to dismiss the case, it could turn out to be a pyrrhic victory. In his 40-page ruling, Chhabria repeatedly indicated reasons to believe that Meta and other AI companies have turned into serial copyright infringers as they train their technology on books and other works created by humans, and seemed to be inviting other authors to bring cases to his court presented in a manner that would allow them to proceed to trial. The judge scoffed at arguments that requiring AI companies to adhere to decades-old copyright laws would slow down advances in a crucial technology at a pivotal time. "These products are expected to generate billions, even trillions of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it." On Monday, from the same courthouse, U.S. District Judge William Alsup ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books, but the company must still go to trial for illicitly acquiring those books from pirate websites instead of buying them. But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative," Alsup wrote. 
In the Meta case, the authors had argued in court filings that Meta is "liable for massive copyright infringement" by taking their books from online repositories of pirated works and feeding them into Meta's flagship generative AI system Llama. Lengthy and distinctively written passages of text - such as those found in books - are highly useful for teaching generative AI chatbots the patterns of human language. "Meta could and should have paid" to buy and license those literary works, the authors' attorneys argued. Meta countered in court filings that U.S. copyright law "allows the unauthorized copying of a work to transform it into something new" and that the new, AI-generated expression that comes out of its chatbots is fundamentally different from the books it was trained on. "After nearly two years of litigation, there still is no evidence that anyone has ever used Llama as a substitute for reading Plaintiffs' books, or that they even could," Meta's attorneys argued. Meta says Llama won't output the actual works it has copied, even when asked to do so. "No one can use Llama to read Sarah Silverman's description of her childhood, or Junot Diaz's story of a Dominican boy growing up in New Jersey," its attorneys wrote. Accused of pulling those books from online "shadow libraries," Meta has also argued that the methods it used have "no bearing on the nature and purpose of its use" and it would have been the same result if the company instead struck a deal with real libraries. Such deals are how Google built its online Google Books repository of more than 20 million books, though it also fought a decade of legal challenges before the U.S. Supreme Court in 2016 let stand lower court rulings that rejected copyright infringement claims. The authors' case against Meta forced CEO Mark Zuckerberg to be deposed, and has disclosed internal conversations at the company over the ethics of tapping into pirated databases that have long attracted scrutiny. "Authorities regularly shut down their domains and even prosecute the perpetrators," the authors' attorneys argued in a court filing. "That Meta knew taking copyrighted works from pirated databases could expose the company to enormous risk is beyond dispute: it triggered an escalation to Mark Zuckerberg and other Meta executives for approval. Their gamble should not pay off." The named plaintiffs are Jacqueline Woodson, Richard Kadrey, Andrew Sean Greer, Rachel Louise Snyder, David Henry Hwang, Ta-Nehisi Coates, Laura Lippman, Matthew Klam, Junot Diaz, Sarah Silverman, Lysa TerKeurst, Christopher Golden and Christopher Farnsworth. Chhabria said in the ruling that while he had "no choice" but to grant Meta's summary judgment tossing the case, "in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these 13 authors -- not the countless others whose works Meta used to train its models."
[76]
US judge sides with Meta in AI training copyright case
A US judge on Wednesday handed Meta a victory over authors who accused the tech giant of violating copyright law by training Llama artificial intelligence on their creations without permission. District Court Judge Vince Chhabria in San Francisco ruled that Meta's use of the works to train its AI model was "transformative" enough to constitute "fair use" under copyright law, in the second such courtroom triumph for AI firms this week. However, it came with a caveat that the authors could have pitched a winning argument that by training powerful generative AI with copyrighted works, tech firms are creating a tool that could let a sea of users compete with them in the literary marketplace. "No matter how transformative (generative AI) training may be, it's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books," Chhabria said in his ruling. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We appreciate today's decision from the court," a Meta spokesperson said in response to an AFP inquiry. "Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology." In the case before Chhabria, a group of authors sued Meta for downloading pirated copies of their works and using them to train the open-source Llama generative AI, according to court documents. Books involved in the suit include Sarah Silverman's comic memoir "The Bedwetter" and Junot Diaz's Pulitzer Prize-winning novel "The Brief Wondrous Life of Oscar Wao," the documents showed. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," the judge stated. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one."
[77]
Judge Rules Anthropic's AI Training As Fair Use, But Authors Can Still Sue Over Pirated Books; Separate Trial Will Decide Infringement And Damages
With the rise of generative AI, the boundary between original content and reproduced content has blurred, along with growing concern about the economic threat this poses to creative professionals. The controversy is especially acute in the training of AI models like ChatGPT or Claude, where massive datasets are used to help the models learn and generate new content. Anthropic has been in hot water for quite some time, accused of training its Claude AI models on copyrighted material, but the case now seems to be shaping up in Anthropic's favor, as a judge recently ruled that its method of AI training was fair use under U.S. copyright law. Anthropic has been fighting on the legal end against accusations of using copyrighted books to train its Claude AI model. U.S. federal judge William Alsup of the Northern District of California, however, issued a major ruling on June 23, 2025, in favor of Anthropic, explicitly stating that the use of legally purchased and digitized books to train the Claude AI model is fair use under U.S. copyright law. The judge highlighted how turning text into AI knowledge, rather than copying or redistributing it, meets the criteria for fair use. While Alsup ruled that using legally obtained copyrighted content to train generative AI models to learn is distinct from mere copying, he held Anthropic accountable for using pirated books from sources such as Books3 and LibGen, showing little tolerance for illegal data sourcing even when the intention behind it is transformative. Judge Alsup further maintained that a separate trial would be held over Anthropic's use of pirated content, to decide the damages it would have to pay. Alsup maintained: This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. By splitting the case in two, the court left the door open for authors to pursue a separate piracy trial against Anthropic. This is a major turning point for how AI models may be trained going forward. By ruling that training on legally purchased books falls under fair use while acquiring data through piracy cannot be justified, the judge is setting a strong precedent for tech companies and for future AI cases.
[78]
U.S. judge rules for Meta in AI training copyright case but says it isn't lawful
A U.S. judge on Wednesday handed Meta a victory over authors who accused the technology giant of violating copyright law by training Llama artificial intelligence on their creations without permission. District Court Judge Vince Chhabria in San Francisco ruled that Meta's use of the works to train its AI model was "transformative" enough to constitute "fair use" under copyright law, in the second such courtroom triumph for AI firms this week. However, it came with a caveat that the authors could have pitched a winning argument -- that by training powerful generative AI with copyrighted works, technology firms are creating a tool that could let a sea of users compete with them in the literary marketplace.
[79]
Bad News for Movie Studios: Authors Just Lost on a Key Issue In a Major AI Lawsuit
"Transformative." That's how a federal court characterized Amazon-backed Anthropic's use of millions of books across the web to teach its artificial intelligence system. It's the first decision to consider the issue and will serve as a template for other courts overseeing similar cases. And studios, now that some have entered the fight over the industry-defining technology, should be uneasy about the ruling. The thrust of these cases will be decided by one question: Are AI companies covered by fair use, the legal doctrine in intellectual property law that allows creators to build upon copyrighted works without a license? On that issue, a court found that Anthropic is on solid legal ground, at least with respect to training. The technology is "among the most transformative many of us will see in our lifetimes," wrote U.S. District Judge William Alsup. Still, Anthropic will face a trial over illegally downloading seven million books to create a library that was used for training. That it later purchased copies of the books it had earlier stolen off the internet to cover its tracks doesn't absolve it of liability, the court concluded. The company faces potential damages of hundreds of millions of dollars stemming from the decision, which could lead to Disney and Universal getting a similar payout depending on what they unearth in discovery over how Midjourney allegedly obtained copies of thousands of films that were repurposed to teach its image generator. Last year, authors filed a lawsuit against Anthropic accusing it of illegally downloading and copying their books to power its AI chatbot Claude. The company chose not to move to dismiss the complaint and instead skipped straight to a decision on fair use. In the ruling, the court found that authors don't have the right to exclude Anthropic from using their works to train its technology, much in the same way they don't have the right to exclude any person from reading their books to learn how to write. "Everyone reads texts, too, then writes new texts," reads the order. "They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable." If someone were to read all the modern-day classics, memorize them and emulate a blend of their writing, that wouldn't constitute copyright infringement, the court concluded. Like any reader who wants to be a writer, Anthropic's technology draws upon works not to replicate or supplant them but to create something entirely different, according to the order. Those aren't the findings that Disney or Universal -- both of whom are suing Midjourney for copyright infringement -- wanted or expected. For them, there's reason to worry that Alsup's analysis will shape how the judge overseeing their case weighs arguments that a ruling against Midjourney would undermine the development of a technology that another court found to be revolutionary (or something close to it). More broadly, it could be found that AI video generators, like Sora, are simply distilling every movie ever made to create completely new works.
"This Anthropic decision will likely be cited by all creators of AI models to support the argument that fair use applies to the use of massive datasets to train foundational models," says Daniel Barsky, an intellectual property lawyer at Holland & Knight. Important to note: The authors didn't allege that responses generated by Anthropic infringed upon their works. And if they had, they would've lost that argument under the court's finding that guardrails are in place to ensure that no infringing ever reached users. Alsup compared it to Google imposing limits on how many snippets of text from any one book could be seen by a user through its Google Book service, preventing its search function from being misused as a way to access full books for free. "Here, if the outputs seen by users had been infringing, Authors would have a different case," Alsup writes. "And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case." But that could be the case for Midjourney, which returns nearly exact replicas of frames from films in some instances. When prompted with "Thanos Infinity War," Midjourney -- an AI program that translates text into hyper-realistic graphics -- replied with an image of the purple-skinned villain in a frame that appears to be taken from the Marvel movie or promotional materials, with few to no alterations made. A shot of Tom Cruise in the cockpit of a fighter jet, from Top Gun: Maverick, is produced when the tool was asked for a frame from the film. The chatbots can seemingly replicate almost any animation style, generating startlingly accurate characters from titles ranging from DreamWorks' Shrek to Pixar's Ratatouille to Warner Bros.' The Lego Movie. "The fact that Midjourney generates copies and derivatives of" films from Disney and Universal proves that the company, without their knowledge or permission, "copied plaintiffs' copyrighted works to train and develop" its technology, states the complaint. Also at play: The possibility that Midjourney pirated the studios' movies. In the June 23 ruling, Alsup found that Anthropic illegally downloading seven million books to build a library to be used for training isn't covered by fair use. He said that the company could've instead paid for the copies. Such piracy, the court concluded, is "inherently, irredeemably infringing." With statutory damages for willful copyright infringement reaching up to $150,000 per work, massive payouts are a possibility.
[80]
Sarah Silverman Loses Key Issue in AI Lawsuit Against Meta, But Creators Get a Silver Lining
In most circumstances, it's illegal for companies to use copyright-protected material to train their AI systems without permission or payment, a court said, while finding that a lawsuit from Sarah Silverman against Meta doesn't present one of those cases. U.S. District Judge Vince Chhabria on Wednesday sided with Meta on the novel legal question of whether AI companies are covered by fair use, the legal doctrine in intellectual property law that allows creators to build upon copyrighted works absent licenses. It's the second decision this week finding in favor of an AI firm on the issue, with another federal judge on Monday ruling against authors in a separate lawsuit by concluding that Anthropic is on solid legal ground over the legality of training. Still, Judge Chhabria cautioned not to extend his ruling as a defense of the practice since he was constrained by lawyers for the authors choosing not to advance certain arguments that he viewed as favorable to their side. Those arguments, in his view, relate to AI tools generating works that are so similar to creators' works that they'll compete with the originals and indirectly substitute for them. "No matter how transformative LLM training may be, it's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books," Chhabria wrote. In Wednesday's ruling, the court called Meta's utilization of the books for training "highly transformative." By learning from the authors' works, the company's large language model Llama can edit emails, write skits or provide translation services, according to the order. The logic mirrors that of U.S. District Judge William Alsup, who concluded that authors in the lawsuit he's overseeing don't have the right to exclude Anthropic from using their works to train its technology as long as they purchased the books. Like any reader who wants to be a writer, the judge said, the Amazon-backed company's AI tool draws upon works not to replicate or supplant them but to create something entirely different, according to the order. Still, a finding that a work is "transformative" doesn't automatically provide protection from infringement under copyright law. Other factors are considered in the analysis. This includes potential harm in the market for the copyrighted material. Legal maneuvering played a critical part in Meta's win. Lawyers representing the authors opted to pay barely any attention to the issue of Meta copying books to create a product that will likely flood the market with similar reproductions -- a theory the court called a "potentially winning argument." Instead, they focused on how Meta's theft of authors' books for training harms the market for licensing their works for that purpose. Consider people using AI tools to generate massive amounts of text in significantly less time than it would take to write it themselves. In that scenario, they use services like ChatGPT or Claude to create and sell books, competing with the works used for training by OpenAI or Anthropic. "It's easy to imagine that AI-generated books could successfully crowd out lesser-known works or works by up-and-coming authors," Chhabria wrote.
"While AI-generated books probably wouldn't have much of an effect on the market for the works of Agatha Christie, they could very well prevent the next Agatha Christie from getting noticed or selling enough books to keep writing." The court said that lawyers for the authors should've offered evidence that Meta allows users to create works that directly compete against Silverman's memoir or Rachel Louise Snyder's nonfiction works on domestic violence. Also a focus of the ruling: the notion that using books to teach people is not remotely similar to using them to create a product that a single person could employ to generate countless competing books using a fraction of the time and creativity it would otherwise take. Across dozens of lawsuits over the past two years, AI companies defending themselves from accusations of illegally hoovering up any and every creative work on the internet to train their systems have offered arguments that can be read as more rhetorical than legal: Don't rule against us or you'll stop the development of a groundbreaking technology. To this, the court responded, "These products are expected to generate billions, even trillions, of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it." Meta still faces a separate claim related to allegations that it illegally distributed the authors' books during the process to pirate their works. The authors, who include Silverman, Ta-Nehisi Coates and Richard Kadrey, are represented by David Boies, Joseph Saveri and Matthew Butterick, among others.
[81]
Court Rules Anthropic Doesn't Need Permission to Train AI With Books
According to a Reuters report, U.S. District Judge William Alsup found that Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson in training its Claude large language model (LLM). However, Alsup also ruled that Anthropic's copying and storage of more than 7 million pirated books in a "central library" violated the authors' copyrights and was not fair use, and ordered a trial in December to decide how much Anthropic owes for the infringement. The Reuters report noted that U.S. copyright law holds that willful copyright infringement can justify statutory damages of up to $150,000 per work. A spokesperson for Anthropic told Reuters the company was pleased that the court recognized its AI training was "transformative" and "consistent with copyright's purpose in enabling creativity and fostering scientific progress." The writers filed the proposed class action against Anthropic last year, contending the Amazon and Google-backed company used pirated versions of their books without their consent or compensation to teach Claude to reply to human prompts. The news comes days after the BBC threatened legal action against AI search engine Perplexity, alleging that the company's "default AI model" was trained using the network's material. The BBC has demanded that Perplexity end all scraping of its content, delete any copies used for AI development, and propose compensation for the alleged infringement. A report by the Financial Times noted that this is the first time the BBC has sought legal recourse over content scraping by AI firms, a sign of the mounting concerns that its freely available public sector content is being widely repurposed without authorization. The broadcaster claims that parts of its content have been reproduced verbatim by Perplexity, with links to BBC articles surfacing in search results, including material that was only recently published online. BBC executives maintain that such practices harm the BBC's reputation for impartial journalism and hurt public trust, pointing to internal research that found 17% of Perplexity responses using BBC sources had significant inaccuracies or missing context. Recent coverage by PYMNTS has spotlighted the rising friction between generative AI companies and publishers over content scraping.
[82]
Judge rules Anthropic's use of books to train AI model is fair use
June 24 (UPI) -- A judge ruled the Anthropic artificial intelligence company didn't violate copyright laws when it used millions of copyrighted books to train its AI. According to his ruling, U.S. District Judge William Alsup concluded Monday "that the training use was a fair use." However, that doesn't mean Anthropic is out of the woods legally, as it's still potentially on the hook for allegedly having pirated books. Alsup wrote in his conclusion that while it was not legally wrong for Anthropic to train its AI with the unlawfully downloaded materials, the alleged piracy itself remains at issue. "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory," he said. The owners of Anthropic claimed that they eventually started paying for downloaded books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The case document states that Anthropic offers an AI software service called "Claude," which is able to simulate human writing and reading because it was trained with books and other texts taken from a central library of materials gathered by the company. Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson are the plaintiffs in the case, as they wrote books that Anthropic allegedly "copied from pirated and purchased sources." None of the usage was authorized by the authors. The case further alleges that Anthropic knowingly downloaded at least seven million books it knew were pirated copies. It is unclear when the trial over the allegedly deliberate downloading of pirated books will take place, or whether a date has been set.
[83]
US Court Backs Anthropic in AI Training with Purchased Books
Using purchased copyrighted works to train AI models is fair use under US copyright law, a US district court in Northern California ruled on June 23. This decision came in the court case between authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson and Anthropic. As part of the AI model training process, Anthropic purchased copyrighted books and also used copyrighted works from piracy sites, with some overlap between the purchased and pirated works. From this library of books, Anthropic selected specific content to train its models on, and the books selected for the process included those of the authors. While the court sides with Anthropic in the case of the books that it purchased, it says that the books the company pirated were inherently infringing. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," it notes. The court says that it will hold a trial later on the pirated copies that the company used to create its central library, and the fact that Anthropic may have bought those books later will not absolve it of liability for theft. In 2021, Anthropic co-founder Ben Mann downloaded Books3, a database of over 196,640 books that he knew were pirated copies. Similarly, he also downloaded at least 5 million pirated books from LibGen. Then, in July 2022, Anthropic downloaded 2 million pirated book copies from Pirate Library Mirror. These databases included at least two books from each of the authors who filed the infringement lawsuit. In February 2024, after a change in the company's perspective about book piracy, the company bulk purchased books from distributors and retailers for its research library. It had originally spoken to publishers about licensing, but did not follow through with those conversations. Once Anthropic got the physical copies of books from distributors and retailers, it stripped them of their binding, cut up the pages, and scanned them into a digital form. These scanned copies include all the works that the authors have issued so far. The court notes that Anthropic may have encountered the authors' books on other occasions, too, such as while copying book reviews, academic papers, or blog posts. The company retained pirated copies of work even after it decided that it would not use a specific work for training purposes. Anthropic copied each work it trained its AI models on from the central library. Then, the company removed all the repetitive information from the copyrighted work, including headers, footers, page numbers, and multiple copies of the same book. After this, the company tokenised the book, and these tokenised copies were copied repeatedly during the training process. As per the order, the authors argued that each fully trained Anthropic model retains compressed copies of the works the company trained it on, meaning that if someone compelled the model to recite a work, it could do so. Additionally, the company has filtering processes in place in its AI models for both user inputs and the model's output. The authors do not allege that the company has ever or will ever provide an infringing copy of their work to the end user. While Claude can create work that is as well written as that of the writers, it has not created any substantial knock-offs or exact copies of their work.
Under Section 107 of the US Copyright Act, courts consider four factors when determining fair use: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the market for the work. To analyse whether or not Anthropic's use of the work constitutes fair use, the court has to analyse the purpose of said use. Anthropic claims that the only purpose it used the books for was to train LLMs. On the other hand, the authors argue that the company used the works twice -- first to build a central library and second to train specific LLMs using shifting sets and subsets of that content. Authors also contend that turning the print version of the books into a digital format itself is an infringement. The court analyses the issue, factoring in each of the four factors for determining fair use. "The 'purpose and character' of using works to train LLMs was transformative -- spectacularly so," the court says, noting that all parties agree that model training was one of the purposes for which Anthropic used the copyrighted works. The court mentions that the company placed software between the user and the LLM to ensure that no infringing output ever reached the user. The authors have also not argued that any infringing content ever reached the user. If the outputs that users saw had contained infringing content, the case would have been different, the court notes, adding that the authors could bring a case if that ever happened. For now, however, the authors focused exclusively on the infringement at the input end. The court says that everyone reads texts and then writes new texts. While they may have to pay to get their hands on the text to read it, to make them pay to use a book each time they read it, each time they recall it from memory, or each time they draw upon it when writing new things would be "unthinkable." The authors have argued that the LLM training is intended to memorise the creative elements of their work. "Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorise them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not," the court said, analysing the authors' argument. Another key argument from the authors' end is that computers should not be allowed to do what people do. They cited a legal decision to support their claim; however, the court assessed that in that case, one of the parties had used the other's proprietary system for finding court opinions in response to a given legal topic to create a competing tool. This purpose was not transformative. The court says that a better comparison for the authors' situation would be an AI tool trained on court opinions and briefs to generate fresh legal writing. In fact, a different court had heard such a case and found fair use. With regards to the books that Anthropic bought and turned into digital copies for its library, the court finds the use transformative. It also mentions that Anthropic purchased these books fair and square, and with each purchase comes the entitlement to dispose of the copy as the owner sees fit. It was entitled to store the digital versions in a central library for all ordinary uses. The company did not create new copies of the work through the digital format; it destroyed the print original and replaced it with the digital one. There is no evidence that Anthropic shared these copies with anyone outside of the organisation.
"Yes, Authors also might have wished to charge Anthropic more for digital than for print copies. And, this order takes for granted that the Authors could have succeeded if Anthropic had been barred from the format change," the court holds. While the court sides with Anthropic on the fact that changing the book from print to digital was transformative, it does not agree that creating a central library for LLM training was transformative as well. This is because before buying the books from distributors and retailers, the company had already pirated 7 million books. Using these books was infringement, even if the company immediately used them for a transformative use and discarded them. However, here Anthropic did not even do that, it downloaded the pirated works and maintained them in its library 'forever'. Anthropic retained the pirated books even after it decided which content it would train models on. "Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use -- and not a transformative one," the court says. The main function of this point in the law is to reveal the differences between the nature of the works at issue and their secondary use, and to show the relationship between how much was taken and the secondary use. Given the fact that the company chose the authors' works specifically for their expressive content (rather than just factual information), this factor weighs against finding Anthropic's use to be fair use. To analyse this, the court considers the amount of the copyrighted work used against the proposed transformative purpose. Based on a judgment from a different case, noting that what matters is the amount and substance of what Anthropic makes accessible to the public, and there is no argument that Anthropic made the works accessible to the general public. To this, the authors respond that the copying of their works, using them in training, was both extremely extensive and strictly not necessary. The court notes that while it is true that Anthropic could have used some other books or no books at all for training its LLMs, the company has shown reasonable evidence why it was necessary to use them anyway. The company needed billions of words to train its LLMs, even if it used a text that comprised a small fraction of books and a larger fraction of other texts, Anthropic still would have needed hundreds of thousands of books. While the court concludes that this factor goes in favour of Anthropic's fair use argument, it clarifies that this does not include the pirated books the company maintained in its library. These pirated books point against fair use. The copies used to train the LLMs do not displace the demand for the copies of the Authors' work in a way that counts under the Copyright Act. The authors contend that the training activity will result in an explosion of books that compete with their works. "[The] Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition," the court says in this context. 
Another argument that the authors raised was that the training activity displaced (or will displace) a potential market where authors could license their works specifically for AI training purposes. Anthropic argued in response that the transaction costs of getting licensed content would be so high that AI developers would either stop trying to license content altogether or stop developing AI technology entirely. The court holds that even if the authors are in the right here, such a market is "not one the Copyright Act entitles Authors to exploit." This factor also favours Anthropic's fair use argument. However, it points against Anthropic with regard to the pirated books.
[84]
Meta Wins Copyright Case Over AI Training Materials
The United States District Court for the Northern District of California recently ruled in favour of Meta in a copyright infringement lawsuit filed by a group of thirteen authors. The plaintiffs had alleged that Meta not only used copyrighted material to train its Large Language Models (LLMs) but also accessed it through digital piracy. Meta had argued that it operated within the fair use provisions of copyright law. While ruling in favour of Meta, the judge emphasised that the case hinged on the plaintiffs being able to prove that Meta's actions caused substantial market dilution to the detriment of the authors. However, the court noted that the plaintiffs had failed to present any substantial evidence to this effect, leading to a summary judgment in favour of Meta. The US Copyright Act lists four factors to be considered in determining whether a given use is fair: the purpose and character of the use, the nature of the copyrighted work, the amount of the work used, and the effect of the use on the market for the work. The first factor considers whether the use of copyrighted material is "transformative" -- that is, whether it went beyond mere copying and added something new with a different purpose. The judge granted this factor to Meta, stating, "There is no serious question that Meta's use of the plaintiffs' books had a 'further purpose' and 'different character' than the books -- that it was highly transformative." The judge rejected the authors' arguments that the use wasn't transformative. The authors claimed Meta's use had "no critical bearing" on their books, but the judge said criticism isn't required for fair use. The authors also argued that since Llama could mimic their writing styles, Meta was just "repackaging" their work. The judge disagreed, finding that even with adversarial prompts designed to make Llama reproduce training data, it wouldn't output more than 50 words from any of the authors' books. The court also acknowledged that Meta had developed Llama for commercial reasons and expected it to generate $460 billion to $1.4 trillion in revenue over ten years. However, while the commercial aspect could weigh against fair use, it wasn't decisive, especially when the use was highly transformative. The judge also rejected the authors' argument that Meta's downloading from shadow libraries (illegal book repositories) automatically defeated fair use, stating that the point of fair use analysis was to determine whether the copying was legitimate in the first place. However, the judge commented that Meta's use of piracy was not irrelevant, as it could show bad faith and potentially support illegal libraries if Meta's downloads benefited them financially. But the judge found no evidence that Meta's downloads actually supported these libraries financially. The second factor grants greater copyright protection to creative works or artistic expression than to non-creative works like computer code. The judge gave this factor to the plaintiffs, holding that their work -- literature -- is precisely what copyright is intended to protect. Meta had argued that this factor favoured it because it only used the books to access "functional elements," not to capitalise on creative expression. The judge rejected this argument, stating that Meta's use relied upon the creative expression present in the books. LLMs learn statistical patterns between words, syntax, grammar, and other parts of language, but even those statistical relationships are a product of creative expression. However, the judge noted that the second factor does not play a decisive role in most fair use judgments.
The third factor asks about the amount of copyrighted content copied and whether that was reasonable for its purpose. The judge gave this factor to Meta, even though the tech giant had copied the books in their entirety. The court held that feeding a whole book to an LLM did more to train it than feeding only half the book would. Therefore, it was reasonably necessary for Meta to use the entirety of the works. In addition, the judge stated that the amount of copying was not particularly relevant in this case, as Meta's AI was not reproducing copies of the authors' work verbatim. As such, there was little risk of Llama being a substitute for the books in question. The fourth factor considers the potential impact on the market for the copyrighted work. The court clarified that the key harm to consider is market substitution -- meaning whether the secondary use replaces or competes with the original work in a way that could hurt its market. The court acknowledged that the plaintiffs had argued that Meta's use of their books to train a large language model like Llama could harm the market for their works, either by directly copying text or generating works that compete with the originals. However, the court found that the plaintiffs did not provide enough evidence to show significant harm. It determined that Llama did not generate enough of the plaintiffs' books, or text similar enough, to pose a real threat to their market. The court also discussed the possibility of market dilution, where Llama might generate enough similar works to compete with the original books, but again found that the plaintiffs did not present sufficient evidence to support this claim. They failed to provide concrete proof that AI-generated books had already harmed the market for their works or that they would do so in the future. This lawsuit is only one of numerous ongoing cases against alleged copyright infringement by AI companies. In the same week, the same US district court ruled in favour of AI firm Anthropic, affirming its defense that using purchased copyrighted works to train AI models is fair use. Interestingly, while the court sided with Anthropic in the case of the books it purchased, it said that the books the company pirated were inherently infringing. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," it noted. Training AI models requires scraping vast quantities of data -- often from the open internet, but also from books, movies, YouTube videos, and social media platforms. Many publishers and content creators have alleged that such data mining practices constitute copyright infringement. OpenAI is the subject of at least nine such lawsuits in the US alone, including from the New York Times and the Authors Guild.
[85]
Is An AI Model Training On A Bought Copyright Work Fair Dealing?
A US district court recently held that using purchased copyrighted works to train AI models falls within the scope of US copyright law. The decision came in a case between Anthropic and a group of authors who alleged infringement over Anthropic's use of their copyrighted works in model training. To train its AI models, Anthropic had created a library of books, which included pirated books from SciHub and Books3, as well as books that the company bought from retailers and distributors, tore apart, and scanned into digital copies. The court limited its decision to the books that Anthropic itself turned into digital copies.

To assess whether Anthropic's use of the books was legally permissible, the court put it through the four-factor test of 'fair use' under US copyright law: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion copied, and the effect of the use on the market value of the work. While the US court found that Anthropic's use of the purchased books was fair and transformative, meeting three of the four criteria, Indian courts may take a different view based on India's Copyright Act, 1957. That law, while specifying certain cases where copyright infringement does not apply, does not contain a clear-cut 'fair use' exemption. Section 52 of the Indian Copyright Act lists specific acts that do not constitute copyright infringement, one of which is 'fair dealing' with any work. This covers purposes such as private or personal use (including research), criticism or review, and the reporting of current events and current affairs. However, there is no easy answer to whether Anthropic's use of the purchased books would fall within the scope of fair dealing in India.

Speaking to MediaNama, Sneha Jain, Partner at Saikrishna & Associates, explained that there are two schools of thought about 'transformative use' as envisioned in the US court's decision. One school argues that the fair-dealing provision under Indian law is not akin to transformative use. The other, while acknowledging that transformative use may not be statutorily available, points to judgments where Indian courts have allowed the actions of entities who, on a strict reading of the law, could have been considered infringers. "For instance, in the Syndicate Of The Press Of The University vs B.D. Bhandari case, the Delhi High Court (HC) noted that the exceptions stipulated under Section 52 did not apply to the case. However, it adopted the transformative use defence as available in US law to hold that a guidebook would be non-infringing," Jain explained, suggesting that Indian courts have previously relied on the US concept of transformative use. She added that even the Supreme Court (SC) did not interfere with the findings of the Delhi HC.

While assessing Anthropic's argument, the US court observed that everyone reads texts, learns from them, and writes their own texts. While people may have to pay for books to get their hands on them in the first place, they can then re-read them or recall them from memory, the court pointed out. "For centuries, we have read and re-read books. We have admired, memorised, and internalised their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems," the court explained. This reasoning supported the view that research and learning after lawfully acquiring a book is fair use. Under Indian copyright law, private or personal use (including research) falls within fair dealing.
However, Jain pointed out that Anthropic is not necessarily making 'personal' use of the content it trains its models on. "The law qualifies the research exemption with the words 'private or personal'. Because if you interpret all research to be exempt, then honestly, nobody would even need a subscription to any service that provides research content," she explained.

In the Anthropic case, the authors argued that the court should not allow computers to do what people do. They cited a legal decision to support their claim; however, the court found that in the cited case, one party had used the other's proprietary system for finding court opinions on a given legal topic to build a competing tool, which was not applicable here. "Like any reader aspiring to be a writer, Anthropic's LLMs (Large Language Models) trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different," the court reasoned in response. It further added that if Anthropic needed to make copies within the LLM or otherwise for the AI training process, that too constituted a transformative use.

With regard to Indian law, Intellectual Property (IP) lawyer Rahul Ajatshatru explained that, in his view, personal use or personal research should be restricted to research by natural persons, even when done for a juristic person. He argued that when the Copyright Act, 1957, was enacted (and even when it was last amended in 2012), it did not contemplate machine learning and AI-based research. For context, companies qualify as juristic persons, which the law generally treats on par with natural persons. "In the case of AI, no human is doing the research. It is the algorithm-based software that's 'learning' from the copyrighted works. According to me, research here is to be narrowly read, research by humans and not by computer programmes. A human mind should be permitted to copy in order to learn from a copyright-protected work without permission," he elaborated.

Ajatshatru explained that copyright owners, under the Indian Copyright Act, have the exclusive right to make a copy of their copyrighted work or store it in a digital medium. His comments come in the context of Anthropic having bought physical books and scanned them into a digital format. "In terms of Section 52 of the Act, making a copy of a copyrighted work (without permission) for personal use or research is permitted. But, the key question is whether the permission to make a digital copy to train an AI or make an AI learn is covered under 'private' or 'personal' use or research," he said. Ajatshatru cautioned that it is important to look at the intent of the Act while interpreting it, and that one should not forget the balancing act the statute plays between copyright owners and copyright users.

The US court's decision comes at a time when issues of AI and copyright protection are gaining momentum across the world. In India, OpenAI and Indian news agency Asia News International (ANI) are contesting a copyright dispute over OpenAI's alleged use of ANI's copyrighted content in training its AI models. The country's Central Government has also formed a multi-stakeholder committee to study how AI intersects with the Copyright Act, 1957.
"The US court's decision lends little assistance to the Delhi HC, because the law is different: the terminology, ingredients and requirements are different in both the statutes," Ajatshatru pointed out when asked about the implications of the District Court of California's Anthropic judgment on the ongoing AI and copyright case in India. He added that the concept and treatment of "fair use" is different from that of "fair dealing" under India's Copyright Act, and as such, one should not copy it blindly. Jain, however, argued that just like every other court decision, this would provide guidance. She explained that if courts were to continue to side with AI companies on fair use, she wouldn't be surprised if the lobbying to amend the Copyright Act gains more momentum.
[86]
Anthropic wins key ruling on AI in authors' copyright lawsuit
A federal judge in San Francisco ruled late Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's storage of the authors' books in a "central library" violated their copyrights and was not fair use. Spokespeople for Anthropic and attorneys for the authors did not immediately respond to requests for comment on the ruling on Tuesday. The writers sued Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The class action lawsuit is one of several brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that U.S. copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said.
[87]
Big Win for Meta: AI Copyright Case Outcome Could Shape Industry Future
Judge Rejects Plaintiffs' Copyright Claims in Meta AI Case, Citing Fair Use

Meta achieved a legal victory in a copyright lawsuit filed by 13 authors. The lawsuit claimed that Meta used their books without permission to train its artificial intelligence models, particularly LLaMA. The central issue in the case was whether Meta's practice of copying copyrighted books to train large language models (LLMs) violated U.S. copyright law. Ultimately, the judge ruled that Meta's actions fell under fair use and decided in favor of the company.
[88]
Amazon-backed Anthropic wins key ruling in AI copyright lawsuit filed...
A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under US copyright law. Siding with tech companies on a pivotal question for the AI industry, US District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. US copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work. An Anthropic spokesperson said the company was pleased that the court recognized its AI training was "transformative" and "consistent with copyright's purpose in enabling creativity and fostering scientific progress." The writers filed the proposed class action against Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The proposed class action is one of several lawsuits brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that US copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said. Alsup also said, however, that Anthropic violated the authors' rights by saving pirated copies of their books as part of a "central library of all the books in the world" that would not necessarily be used for AI training. Anthropic and other prominent AI companies including OpenAI and Meta Platforms have been accused of downloading pirated digital copies of millions of books to train their systems. Anthropic had told Alsup in a court filing that the source of its books was irrelevant to fair use. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup said on Monday.
[89]
Anthropic wins key US ruling on AI training in authors' copyright lawsuit
(Reuters) -A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work. An Anthropic spokesperson said the company was pleased that the court recognized its AI training was "transformative" and "consistent with copyright's purpose in enabling creativity and fostering scientific progress." The writers filed the proposed class action against Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The proposed class action is one of several lawsuits brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that U.S. copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said. Alsup also said, however, that Anthropic violated the authors' rights by saving pirated copies of their books as part of a "central library of all the books in the world" that would not necessarily be used for AI training. Anthropic and other prominent AI companies including OpenAI and Meta Platforms have been accused of downloading pirated digital copies of millions of books to train their systems. Anthropic had told Alsup in a court filing that the source of its books was irrelevant to fair use. 
"This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup said on Monday. (Reporting by Blake Brittain in Washington; Editing by Alexia Garamfalvi, Chizu Nomiyama, Louise Heavens and Matthew Lewis)
[90]
Fair use vs copyright: Anthropic's case and its impact on AI training
Court decision reshapes AI copyright future, separating ethical data use from unlawful content scraping.

In a groundbreaking legal decision that could redefine the boundaries of copyright law in the age of artificial intelligence, a U.S. federal court has ruled that Anthropic, the AI startup behind the Claude language model, did not infringe copyright when it used books to train its AI, as long as those books were legally acquired. The court deemed the use transformative under the doctrine of fair use. But while Anthropic scored a major win, it is far from off the hook: the company still faces serious legal trouble over millions of pirated books allegedly used in the early stages of model training.

The lawsuit, Bartz v. Anthropic, filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of unlawfully using their copyrighted books to train its Claude models. Their complaint mirrored a growing wave of legal action by writers, artists, and publishers pushing back against the unlicensed use of their creative work by generative AI firms. At the heart of the case was the question: can AI training on copyrighted text be considered fair use? U.S. District Judge William Alsup answered with a resounding, if partial, yes. He ruled that Anthropic's use of purchased books to train Claude was "exceedingly transformative," likening the model's learning process to how a human might absorb and learn from literature to write something new.

This ruling sets a major precedent. It marks the first time a U.S. court has explicitly endorsed the idea that AI model training, when done with lawfully obtained materials, can qualify as fair use. The decision could offer a protective shield to companies like OpenAI, Meta, and Google, which face similar lawsuits over the data they used to train large language models. For Anthropic and its peers, it's a legal win that affirms what many in tech have argued for years: that ingesting massive datasets to teach AI isn't the same as republishing or plagiarizing content. Rather, it's akin to how a person might read thousands of books to understand storytelling techniques, then write something original.

Yet the court's decision also came with a sharp rebuke. Judge Alsup made clear that Anthropic's alleged use of over 7 million pirated books, including titles from shadow libraries like Library Genesis, did not fall under the umbrella of fair use. He ruled that building a centralized library of stolen books was "unquestionably illegal" and ordered the case to proceed to trial on that front. If found liable, Anthropic could face damages amounting to billions of dollars.

This nuanced judgment reflects the complexity of AI copyright issues. On one hand, the court recognized that AI training is fundamentally different from copying and distributing. On the other, it held firm that fair use doesn't give tech companies a free pass to use stolen content. This balance is likely to influence future court decisions and how AI companies approach model training going forward. The case arrives at a time when AI development is racing ahead, often faster than the legal frameworks meant to govern it. Most copyright laws, including the U.S. fair use doctrine, were written before the internet, let alone generative AI, became part of daily life.
This ruling may accelerate a push toward clearer guidelines for AI training. For now, it gives AI companies firmer legal ground if they can prove their data was lawfully sourced and their models don't reproduce copyrighted works verbatim. In practical terms, AI developers are likely to audit their training data sources more thoroughly, avoid unlicensed or pirated material, and pursue formal licensing deals with publishers and authors (a brief sketch of such a provenance audit follows this article). At the same time, creators and rights holders are likely to continue pressing for compensation and more robust protection. Several authors' groups and publishers have already called for legislative updates to copyright law in light of AI's transformative nature. The court's decision in favor of Anthropic has sparked both celebration and concern. Tech advocates hail it as a win for innovation, creativity, and the open sharing of knowledge. Critics, however, worry that it opens the door to corporations profiting from creative work without compensation. With a trial on Anthropic's alleged use of pirated books scheduled for later this year, the story is far from over. But for now, the decision stands as a pivotal moment in the legal story of AI, one that sets new precedent and forces everyone, from engineers to lawmakers, to rethink the rules of creativity in the algorithmic age.
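On the point about auditing training data sources, the minimal sketch below shows what a first-pass provenance audit might look like. Everything here is an assumption for illustration: it presumes a hypothetical corpus where each document carries a provenance tag attached at ingestion, and the Document class and APPROVED_SOURCES set are invented names. A real audit would go much further, for instance hash-matching texts against known pirated collections.

from dataclasses import dataclass

# Hypothetical provenance tags an ingestion pipeline might attach.
APPROVED_SOURCES = {"publisher_license", "public_domain", "purchased_scan"}

@dataclass
class Document:
    doc_id: str
    source: str  # provenance tag recorded when the document was ingested
    text: str

def audit(corpus):
    """Partition a corpus into documents with approved provenance and
    documents flagged for legal review before any training use."""
    approved, flagged = [], []
    for doc in corpus:
        (approved if doc.source in APPROVED_SOURCES else flagged).append(doc)
    return approved, flagged

corpus = [
    Document("book-001", "purchased_scan", "..."),
    Document("book-002", "shadow_library_mirror", "..."),
]
approved, flagged = audit(corpus)
print("cleared for training:", [d.doc_id for d in approved])
print("needs review:", [d.doc_id for d in flagged])

The design choice worth noting is that the filter is allow-list based: anything without an explicitly approved provenance tag is held back, which matches the post-ruling incentive to prove data was lawfully sourced rather than merely not known to be pirated.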
[91]
Meta wins AI copyright case filed by authors over book use
However, the judge made it clear that this ruling doesn't give Meta a free pass to use copyrighted content for AI training.

Meta has won a major legal battle in a copyright lawsuit filed by 13 authors who claimed the company used their books without permission to train its artificial intelligence systems. The case centred on whether it was legal for the company to copy copyrighted works for training large language models (LLMs). On Wednesday, Judge Vince Chhabria ruled in Meta's favour, stating that the company is "entitled to summary judgment on its fair use defence to the claim that copying these plaintiffs' books for use as LLM training data was infringement." However, the judge made it clear that this ruling doesn't give Meta a free pass to use copyrighted content for AI training. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," Judge Chhabria explained, according to The Verge. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one."

In the lawsuit, the authors made several claims about how Meta's AI model used their work. They argued that Llama could reproduce sections of their books and that Meta's use of their work harmed their ability to license it for training purposes. But the judge dismissed these points as weak, calling them "clear losers." The judge stated, "Llama is not capable of generating enough text from the plaintiffs' books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data." According to Judge Chhabria, the plaintiffs failed to present a strong enough case to support the claim that Meta's copying would lead to "a product that will likely flood the market with similar works, causing market dilution." This ruling comes a day after Anthropic won a separate case, in which a federal judge ruled that using legally purchased copies of books to train its AI models qualifies as fair use.
Recent rulings in favor of Meta and Anthropic in AI copyright cases mark significant victories for tech companies, but judges caution that these decisions don't set blanket precedents for AI training on copyrighted works.
In a series of landmark rulings, federal judges have sided with artificial intelligence (AI) companies Meta and Anthropic in lawsuits concerning the use of copyrighted books for AI model training. These decisions mark significant victories for the tech industry, potentially setting the stage for how future copyright cases involving AI might be adjudicated [1][2].
U.S. District Judge William Alsup ruled in favor of Anthropic, stating that using copyrighted works to train large language models (LLMs) was "quintessentially transformative" and "necessary" for building world-class AI models [2]. Similarly, Judge Vince Chhabria granted summary judgment to Meta, finding that the plaintiffs failed to provide sufficient evidence of market harm caused by Meta's use of their copyrighted works [1][4].
However, both judges emphasized that their rulings were limited in scope and should not be interpreted as blanket approval for all AI training practices involving copyrighted materials [1][2][5].
The cases hinged on the interpretation of the fair use doctrine, a provision in copyright law that allows for limited use of copyrighted material without permission for purposes such as commentary, criticism, or research [3]. In both rulings, the judges found that the AI companies' use of copyrighted books for training purposes fell under fair use, albeit for different reasons [1][2].
Despite these victories for AI companies, the rulings leave room for future challenges. Judge Chhabria explicitly stated that his decision "does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful" [4]. He suggested that cases with better-developed evidence on market effects might yield different outcomes [4][5].
These cases are part of a broader legal landscape involving numerous lawsuits against major tech companies, including Google, OpenAI, and Microsoft [3][5]. The outcomes of these cases could significantly impact the AI industry, potentially requiring companies to establish new licensing deals or explore alternative training methods [5].
While Anthropic won on the fair use argument for AI training, the company still faces a trial over allegations of book piracy. Judge Alsup ruled that Anthropic's downloading of millions of pirated books to build a "central library" did not constitute fair use and could result in damages [2].
Both rulings emphasized the importance of demonstrating market harm in copyright cases. Judge Chhabria noted that markets for certain types of works, such as news articles, might be more vulnerable to indirect competition from AI outputs [4]. This suggests that future cases, particularly those involving different types of copyrighted works, may yield different results [5].
As the AI industry continues to evolve, these rulings provide important insights into how courts may approach the intersection of copyright law and AI technology. However, they also underscore the need for clearer legal frameworks to address the unique challenges posed by AI's use of copyrighted materials [5].