17 Sources
[1]
Key fair use ruling clarifies when books can be used for AI training
Artificial intelligence companies don't need permission from authors to train their large language models (LLMs) on legally acquired books, US District Judge William Alsup ruled Monday. The first-of-its-kind ruling that condones AI training as fair use will likely be viewed as a big win for AI companies, but it also notably put on notice all the AI companies expecting the same reasoning to apply to training on pirated copies of books -- a question that remains unsettled. In the specific case that Alsup is weighing -- which pits book authors against Anthropic -- Alsup found that "the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative" and "necessary" to build world-class AI models. Importantly, this case differs from other lawsuits where authors allege that AI models risk copying and distributing their work. Because the authors suing Anthropic did not allege that any of Anthropic's outputs reproduced their works or expressive style, Alsup found there was no threat that Anthropic's text generator, Claude, might replace authors in their markets. The absence of that argument tipped the fair use analysis in Anthropic's favor. "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. Alsup's ruling surely disappointed the authors, who had argued that Claude's reliance on their texts could generate competing summaries or alternative versions of their stories. The judge claimed these complaints were akin to arguing "that training schoolchildren to write well would result in an explosion of competing works." "This is not the kind of competitive or creative displacement that concerns the Copyright Act," Alsup wrote. "The Act seeks to advance original works of authorship, not to protect authors against competition." Alsup noted that the authors would be able to raise new claims if they found evidence of infringing Claude outputs. That could change the fair use calculus, as it might in a case where a judge recently suggested that Meta's AI products might be "obliterating" authors' markets for their works. "Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public," Alsup wrote. "If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop."

Anthropic must face trial over book piracy

Anthropic is "pleased" with the ruling, issuing a statement applauding the court for recognizing "that using 'works to train LLMs was transformative -- spectacularly so.'" But Anthropic is not off the hook: it was granted summary judgment on AI training as fair use, but it still faces a trial over piracy, which Alsup ruled did not favor a fair use finding. In the Anthropic case, the AI company is accused of downloading 7 million pirated books to build a research library where copies would be kept "forever" regardless of whether they were ever used for AI training. Seemingly realizing that piracy might trigger legal challenges, Anthropic later tried to replace pirated books with legally purchased copies. But the company also argued that even the initial copying of these pirated books was an "intermediary" step necessary to advance the transformative use of training AI.
And perhaps at its least persuasive, Anthropic also argued that because it could have borrowed the books it stole, the theft alone shouldn't "short-circuit" the fair use analysis. But Alsup was not swayed by those arguments, noting that copying books from a pirate site is copyright infringement, "full stop." He rejected "Anthropic's assumption that the use of the copies for a central library can be excused as fair use merely because some will eventually be used to train LLMs," and he cast doubt on whether any of the other AI lawsuits debating piracy could ever escape without paying damages. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup wrote. "Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded." But Alsup said that the Anthropic case may not even need to decide on that, since Anthropic's retention of pirated books for its research library alone was not transformative. Alsup wrote that Anthropic's argument that it could hold onto pirated material in case it ever decided to use it for AI training was an attempt to "fast glide over thin ice." Additionally, Alsup pointed out that Anthropic's early attempts to get permission to train on authors' works withered, as internal messages revealed that the company considered stealing books the more cost-effective path to innovation "to avoid 'legal/practice/business slog,' as cofounder and chief executive officer Dario Amodei put it." "Anthropic is wrong to suppose that so long as you create an exciting end product, every 'back-end step, invisible to the public,' is excused," Alsup wrote. "Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it." To avoid maximum damages in the event of a loss, Anthropic will likely continue arguing that replacing pirated books with purchased books should water down the authors' claim for damages, Alsup's order suggested. "That Anthropic later bought a copy of a book it earlier stole off the Internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages," Alsup noted.
[2]
A federal judge sides with Anthropic in lawsuit over training AI on books without authors' permission | TechCrunch
Federal judge William Alsup ruled that it was legal for Anthropic to train its AI models on published books without the authors' permission. This marks the first time that the courts have given credence to AI companies' claim that the fair use doctrine can absolve them of fault when they use copyrighted materials to train LLMs. This decision comes as a blow to authors, artists, and publishers who have brought dozens of lawsuits against companies like OpenAI, Meta, Midjourney, Google, and more. While the ruling is not a guarantee that other judges will follow Judge Alsup's lead, it lays the foundations for a precedent that would side with tech companies over creatives. These lawsuits often depend on how a judge interprets the fair use doctrine, a notoriously finicky carve-out of copyright law that hasn't been updated since 1976 -- a time before the internet, let alone the concept of generative AI training sets. Fair use rulings take into account what the work is being used for (parody and education can be viable), whether or not it's being reproduced for commercial gain (you can write Star Wars fan fiction, but you can't sell it), and how transformative a derivative work is from the original. Companies like Meta have made similar fair use arguments in defense of training on copyrighted works, though before this week's decision, it was less clear how the courts would sway. In this particular case of Bartz v. Anthropic, the group of plaintiff authors also brought into question the manner in which Anthropic obtained and stored their works. According to the lawsuit, Anthropic sought to create a "central library" of "all the books in the world" to keep "forever." But millions of these copyrighted books were downloaded for free from pirate sites, which is unambiguously illegal. While the judge granted that Anthropic's training on these materials was a fair use, the court will hold a trial about the nature of the "central library." "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages," Judge Alsup wrote in the decision. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages."
[3]
Anthropic Scores a Landmark AI Copyright Win -- but Will Face Trial Over Piracy Claims
While the startup has won its "fair use" argument, it potentially faces billions of dollars in damages for allegedly pirating over 7 million books to build a digital library. Anthropic has scored a major victory in an ongoing legal battle over artificial intelligence models and copyright, one that may reverberate across the dozens of other AI copyright lawsuits winding through the legal system in the United States. A court has determined that it was legal for Anthropic to train its AI tools on copyrighted works, arguing that the behavior is shielded by the "fair use" doctrine, which allows for unauthorized use of copyrighted materials under certain conditions. "The training use was a fair use," senior district judge William Alsup wrote in a summary judgment order released late Monday evening. In copyright law, one of the main ways courts determine whether using copyrighted works without permission is fair use is to examine whether the use was "transformative," which means that it is not a substitute for the original work but rather something new. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup wrote. "This is the first major ruling in a generative AI copyright case to address fair use in detail," says Chris Mammen, a managing partner at Womble Bond Dickinson who focuses on intellectual property law. "Judge Alsup found that training an LLM is transformative use -- even when there is significant memorization. He specifically rejected the argument that what humans do when reading and memorizing is different in kind from what computers do when training an LLM." The case, a class action lawsuit brought by book authors who alleged that Anthropic had violated their copyright by using their works without permission, was first filed in August 2024 in the US District Court for the Northern District of California. Anthropic is the first artificial intelligence company to win this kind of battle, but the victory comes with a large asterisk attached. While Alsup found that Anthropic's training was fair use, he ruled that the authors could take Anthropic to trial over pirating their works. While Anthropic eventually shifted to training on purchased copies of the books, it had nevertheless first collected and maintained an enormous library of pirated materials. "Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic should have paid for these pirated library copies. This order agrees," Alsup writes. "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages," the order concludes. Anthropic did not immediately respond to requests for comment. Lawyers for the plaintiffs declined to comment. The lawsuit, Bartz v. Anthropic, was first filed less than a year ago; Anthropic asked for summary judgment on the fair use issue in February. It's notable that Alsup has far more experience with fair use questions than the average federal judge, as he presided over the initial trial in Google v. Oracle, a landmark case about tech and copyright that eventually went before the Supreme Court.
[4]
Judge OKs Anthropic's Use of Copyrighted Books in AI Training. That's Bad News for Creators
Anthropic's use of copyright-protected books in its AI training process was "exceedingly transformative" and fair use, US senior district judge William Alsup ruled on Monday. It's the first time a judge has decided in favor of an AI company on the issue of fair use, in a significant win for generative AI companies and a blow for creators. Fair use is a doctrine that's part of US copyright law. It's a four-part test that, when its criteria are met, lets people and companies use protected content without the rights holder's permission for specific purposes, like when writing a term paper. Tech companies say that fair use exceptions are essential in order for them to access the massive quantities of human-generated content they need to develop the most advanced AI systems. Writers, actors and many other kinds of creators have been equally clear in arguing that the use of their content to propel AI is not fair use. Publishers, artists and content catalog owners have filed lawsuits alleging that AI companies like OpenAI, Meta and Midjourney are infringing on their protected intellectual property in an attempt to circumvent costly, but standard, licensing procedures. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) The authors suing Anthropic for copyright infringement say their books were also obtained illegally -- that is, the books were pirated. That leads to the second part of Alsup's ruling, based on his concerns about Anthropic's methods of obtaining the books. In the ruling, he writes that Anthropic co-founder Ben Mann knowingly downloaded unauthorized copies of 5 million books from LibGen and an additional 2 million from Pirate Library Mirror (PiLiMi). The ruling also outlines how Anthropic deliberately obtained print copies of the books it previously pirated in order to create "its own catalog of bibliographic metadata." Anthropic vice president Tom Turvey, the ruling says, was "tasked with obtaining 'all the books in the world'" while still avoiding as much "legal/practice/business slog." That meant buying physical books from publishers to create a digital database. In the process, the Anthropic team destroyed and discarded millions of used books, stripping them from their bindings and cutting them down to fit in order to prepare them for machine-readable scanning. Anthropic's acquisition and digitization of the print books was fair use, the ruling says. But it adds: "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy." Alsup ordered a trial regarding the pirated library. Anthropic is one of many AI companies facing copyright claims in court, so this week's ruling is likely to have massive ripple effects across the industry. We'll have to see how the piracy claims resolve before we know how much money Anthropic may be ordered to pay in damages. But if the scales tip to grant multiple AI companies fair use exceptions, the creative industry and the people who work in it will certainly suffer damages, too.
[5]
Judge: It's Fair Use to Train AI on Books You Bought, But Not Ones You Pirated
A large language model is as free to read as you and I, a federal judge held Tuesday -- unless that LLM's creators didn't pay for the books used to train that AI system. Judge William Alsup's Tuesday order turns aside part of a class-action lawsuit filed by book authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against the AI firm Anthropic but agrees with one of their key claims. That means Alsup's 32-page opinion could still prove expensive for the company behind the Claude series of AI models. The most important part of Alsup's ruling is that Anthropic has a fair-use defense for digitizing copies of the authors' books that it purchased to train the San Francisco firm's AI models. Calling that an "exceedingly transformative" use, Alsup found that the authors had no more right to demand payment for it than to charge a human reader for learning from their writing. "Everyone reads texts, too, then writes new texts," he wrote. "But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable." In a later paragraph, Alsup compared the plaintiffs' argument to a complaint that "training schoolchildren to write well would result in an explosion of competing works." He concluded: "This is not the kind of competitive or creative displacement that concerns the Copyright Act." This case, unlike many other recent lawsuits brought against the operators of AI platforms, did not involve any claims that Claude had recreated or recited any copyrighted works: "Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service." Alsup also found that Anthropic did nothing wrong in its original act of book digitization. The company purchased paperback copies of books, scanned and digitized their contents as if they were CDs being ripped to copy to an iPod, and then destroyed the printed originals. "One replaced the other," Alsup writes. "And, there is no evidence that the new, digital copy was shown, shared, or sold outside the company." (Contrast that with the ruling by a panel of judges on a different federal circuit court last September that the Internet Archive had no right to turn digital copies of books it had legally obtained and scanned into e-book loans.) But Anthropic didn't just buy books by the truckload; it also downloaded millions of unauthorized copies of books from online troves of pirated works to speed up training Claude, then kept those copies around just in case. "Every factor points against fair use," Alsup wrote. He found that the company offered no justification "except for Anthropic's pocketbook and convenience." Anthropic's comment to The Verge stuck to the positive parts of Alsup's ruling: "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so.'" In October, News Corp. sued Perplexity, alleging that its answers represented a "substitute product" for that conglomerate's own work. In February, Thomson Reuters won a suit against a now-defunct startup called Ross Intelligence that had trained its AI service on the company's Westlaw legal-research platform to offer a competing service. Earlier in June, Disney and Universal sued the generative-AI image-generation platform Midjourney for offering near-lookalike depictions of those studios' copyrighted characters.
PCMag's parent company Ziff Davis is also among the publishers pursuing litigation against AI platforms, having filed a lawsuit against OpenAI in April 2025 alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
[6]
Judge rules mostly for Anthropic in AI book training case
Anthropic scores a qualified victory in fair use case, but got slapped for using over 7 million pirated copies

One of the most tech-savvy judges in the US has ruled that Anthropic is within its rights to scan purchased books to train its Claude AI model, but that pirating content is legally out of bounds. In training its model, Anthropic bought millions of books, many second-hand, then cut them up and digitized the content. It also downloaded over 7 million pirated books from the Books3 dataset, Library Genesis (LibGen), and the Pirate Library Mirror (PiLiMi), and that was the sticking point for Judge William Alsup of the Northern District of California. On Monday, he ruled that simply digitizing a print copy counted as fair use under current US law, as there was no duplication of the copyrighted work since the printed pages were destroyed after they were scanned. However, Anthropic may have to face trial over the use of pirated material. "As Anthropic trained successive LLMs, it became convinced that using books was the most cost-effective means to achieve a world-class LLM," Alsup wrote in Monday's ruling. "During this time, however, Anthropic became 'not so gung ho about' training on pirated books 'for legal reasons.' It kept them anyway." The case was filed by three authors - Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson - who claimed that Anthropic illegally used their fiction and non-fiction works to train Claude. At least two of each author's books were included in the pirated material used by Anthropic. Alsup noted that Anthropic hired the former head of partnerships at Google's book-scanning project, Tom Turvey, who began conversations with publishers about licensing content, as other AI developers have done. But these talks were abandoned in favor of simply buying millions of books, taking the pages out, and scanning them, which the judge ruled was fair use. "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so,'" an Anthropic spokesperson told The Register. "Consistent with copyright's purpose in enabling creativity and fostering scientific progress, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different." On the matter of piracy, however, Alsup noted that in January or February 2021, Anthropic cofounder Ben Mann "downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books -- that is, pirated." In June, he downloaded "at least five million copies of books" from LibGen, and in July 2022, another two million copies were downloaded from PiLiMi, both of which Alsup classified as "pirate libraries." Alsup found that the pirated works weren't necessarily used to train Claude, but that the company had retained them. That could prove legally problematic for the startup, Alsup ruled, since they were kept for "Anthropic's pocketbook and convenience." "This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason," he wrote. "But it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies.
We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness). That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages." Alsup's ruling is mixed news for Anthropic, but he does know his onions. For the last quarter of a century, Alsup has presided over some of the biggest tech trials in history, and his rulings have been backed up by the Supreme Court in some cases. Alsup, a coder for over two decades (primarily in BASIC), presided over the Oracle-Google trial over fair use of Java code in Android, which led him to dabble in that language. More recently, he sentenced former Google self-driving car engineer Anthony Levandowski to 18 months in prison for stealing proprietary info from his work at Google and bringing it to a new startup, Otto, which he later sold to Uber. President Trump pardoned Levandowski in 2021. Bartz and Johnson had no comment at the time of going to press. Graeber declined to discuss the ruling.
[7]
Anthropic wins key ruling on AI in authors' copyright lawsuit
June 24 (Reuters) - A federal judge in San Francisco ruled late Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's storage of the authors' books in a "central library" violated their copyrights and was not fair use. Spokespeople for Anthropic and attorneys for the authors did not immediately respond to requests for comment on the ruling on Tuesday. The writers sued Anthropic last year, arguing that the company, which is backed by Amazon (AMZN.O) and Alphabet (GOOGL.O), used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The class action lawsuit is one of several brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft (MSFT.O) and Meta Platforms (META.O) over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that U.S. copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said. Reporting by Blake Brittain in Washington; Editing by Chizu Nomiyama and Louise Heavens.
[8]
Judge rules Anthropic did not violate authors' copyrights with AI book training
Anthropic's use of books to train its artificial intelligence model Claude was "fair use" and "transformative," a federal judge ruled late on Monday. Amazon-backed Anthropic's AI training did not violate the authors' copyrights since the large language models "have not reproduced to the public a given work's creative elements, nor even one author's identifiable expressive style," wrote U.S. District Judge William Alsup. "The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative," Alsup wrote. "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different." The decision is a significant win for AI companies as legal battles play out over the use and application of copyrighted works in developing and training LLMs. Alsup's ruling begins to establish the legal limits and opportunities for the industry going forward.
[9]
Judge rules Anthropic's AI training on copyrighted materials is fair use
Anthropic has a mixed result in a class action lawsuit brought by a group of authors who claimed the company used their copyrighted creations without permission. On the positive side for the artificial intelligence company, senior district judge William Alsup of the US District Court for the Northern District of California determined that Anthropic's training of its AI tools on copyrighted works was protected as fair use. Developing large language models for artificial intelligence has created a copyright law boondoggle as creators attempt to protect their works and tech companies seek to gather more training materials. Alsup's ruling is one of the first that will likely set the foundation for legal precedents around what AI tools can and cannot do. Using copyrighted materials can be deemed fair use if the output is determined to be "transformative," or not a substitute for the original work. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup wrote. Despite the fair use designation, the ruling does still provide some recourse for the writers; they can choose to take Anthropic to court for piracy. "Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again)," Alsup wrote. "Authors argue Anthropic should have paid for these pirated library copies. This order agrees."
[10]
Judge rules AI company Anthropic didn't break copyright law but must face trial over pirated books
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and could now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing the key copyright infringement claim made by the group of authors who sued the company last year, Alsup also said Anthropic must still go to trial over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic committed "large-scale theft" by allegedly training its popular chatbot Claude on pirated copies of copyrighted books, and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of its use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product.
[11]
Judge Rules AI Companies Can Use Some Copyrighted Works to Fuel Their Sludge
The legal decision sets a precedent for the pilfering of creative works for AI fuel. This week, a federal judge handed AI companies a major win, potentially setting a legal precedent for the industry to plunder copyrighted materials to train their large language models. Anthropic, the large AI company backed by Amazon, has been in a pitched legal battle with a group of writers and journalists who sued the company last summer and accused it of illegally using their works to train the company's flagship chatbot, Claude. The legality of the AI industry's entire business model has long depended on the question of whether it is kosher to hoover up large amounts of copyrighted data from all over the web and then feed it into an algorithm to produce "original" text. Anthropic has maintained that its use of the writers' work falls under fair use and is therefore legal. This week, the federal judge presiding over the case, William Alsup, partially agreed. In his ruling, Alsup claimed that, by training its LLM without the authors' permission, Anthropic did not infringe on copyrighted materials because the work it produced was, in his eyes, original. He claimed that the company's algorithms have "...not reproduced to the public a given work's creative elements, nor even one author's identifiable expressive style... Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not." Alsup's ruling departs quite a bit from the writers' litigation, which accused Anthropic of "strip-mining" human expression and ingenuity for the sake of corporate profits. This ruling is just one judge's opinion, but critics fear it could easily set a precedent for other legal decisions across the country. AI companies have been sued dozens of times by creatives on similar grounds. While Alsup's decision may signal broader victories for the AI industry, it isn't exactly what you would call a win for Anthropic. That's because Alsup also ruled that the specific way in which Anthropic nabbed some of the copyrighted materials for its LLM -- by downloading over 7 million pirated books -- could be illegal, and would require a separate trial. "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages," Alsup wrote. "That Anthropic later bought a copy of a book [that] it earlier stole off the internet will not absolve it of liability for theft, but it may affect the extent of statutory damages." When reached for comment by Gizmodo, Anthropic provided the following statement: "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so.' Consistent with copyright's purpose in enabling creativity and fostering scientific progress, 'Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different.'" Alsup has presided over several prominent cases involving large tech companies, including Uber, DoorDash, and Waymo. More recently, Alsup ordered the Trump administration to reinstate thousands of fired probationary workers who were pushed out by Elon Musk's DOGE initiative.
[12]
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
A federal judge in California ruled Monday that Anthropic likely violated copyright law when it pirated authors' books to create a giant dataset and "forever" library but that training its AI on those books without authors' permission constitutes transformative fair use under copyright law. The complex decision is one of the first of its kind in a series of high-profile copyright lawsuits brought by authors and artists against AI companies, and it's largely a very bad decision for authors, artists, writers, and web developers. This case, in which authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic, maker of the Claude family of large language models, is one of dozens of high-profile lawsuits brought against AI giants. The authors sued Anthropic because the company scraped full copies of their books for the purposes of training its AI models from a now-notorious dataset called Books3, as well as from the piracy websites LibGen and Pirate Library Mirror (PiLiMi). The suit also claims that Anthropic bought used physical copies of books and scanned them for the purposes of training AI. "From the start, Anthropic 'had many places from which' it could have purchased books, but it preferred to steal them to avoid 'legal/practice/business slog,' as cofounder and chief executive officer Dario Amodei put it. So, in January or February 2021, another Anthropic cofounder, Ben Mann, downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books -- that is, pirated," William Alsup, a federal judge for the Northern District of California, wrote in his decision Monday. "Anthropic's next pirated acquisitions involved downloading distributed, reshared copies of other pirate libraries. In June 2021, Mann downloaded in this way at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated. And, in July 2022, Anthropic likewise downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated." Notably, Anthropic also created an internal, "general-purpose library" made up partially of pirated copyrighted works for "various uses for which the company might have of them," in addition to scraping the books for the purposes of training AI. Alsup wrote that the creation of this "pirated library ... points against fair use" and must be considered at trial. At a hearing in May, Alsup signaled that he was leaning toward making this type of decision: "I'm inclined to say they did violate the Copyright Act but the subsequent uses were fair use," Alsup said. "The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use," Alsup wrote. "Anthropic employees said copies of works (pirated ones, too) would be retained 'forever' for 'general purpose' even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic's pocketbook and convenience. We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory (including for willfulness).
That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages." Still, this is not a good decision for authors, because the judge ruled that actually training AI on the works was not illegal, though it is too early to say exactly what this means in a larger context. At the moment, it suggests that training an AI on legally purchased works is sufficiently transformative, but that pirating those works in the first place is not. This case did not consider what this means for AI training on free-to-access content on the open web, on social media, from libraries, etc. It's largely a win for AI companies, who, when faced with these sorts of lawsuits, have almost universally said that their data scraping and training is legal as a transformative fair use under copyright law, arguing they do not need to ask for permission or provide compensation when they scrape the internet to build AI tools. This lawsuit does not allege that Anthropic or Claude directly recreated parts of the authors' books for its users: "When each LLM was put into a public-facing version of Claude, it was complemented by other software that filtered user inputs to the LLM and filtered outputs from the LLM back to the user," Alsup wrote in his order. "As a result, Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service. Yes, Claude could help less capable writers create works as well-written as Authors' and competing in the same categories. But Claude created no exact copy, nor any substantial knock-off. Nothing traceable to Authors' works. Such allegations are simply not part of plaintiffs' amended complaint, nor in our record." Many other copyright lawsuits against AI companies argue that not only are AI companies training on pirated copyrighted data, but that the AI tools they create then regurgitate large passages of those copyrighted works either verbatim or in a substantially similar style. Researchers found, for example, that Meta's AI has "memorized" huge portions of books and will regurgitate them. This case largely considered whether the actual training itself is a violation of copyright law. "The use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act," Alsup wrote in his order. "And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library -- without adding new copies, creating new works, or redistributing existing copies." In February, Thomson Reuters won a case against a competitor that it claimed had illegally scraped its works to train AI. There are currently dozens of similar lawsuits winding their way through the legal system, so it's likely to take a few more decisions before we get a full picture of what courts think about the legality of mass, unauthorized AI data training.
[13]
Federal Judge Gives AI Companies a Landmark 'Fair Use' Victory
American artificial intelligence (AI) company Anthropic, which develops large language models competing with platforms like OpenAI's ChatGPT and Google's Gemini, has won a key ruling in a United States federal court. A federal judge ruled this week that AI developers can train AI models on copyrighted content without obtaining permission from the content creators. As The Verge reports, U.S. Federal Judge William Alsup of the Northern District of California ruled that Anthropic has the legal right to train AI models using copyrighted work. Judge Alsup says that this use falls under fair use. In the lawsuit, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson v. Anthropic PBC, the three plaintiffs, all authors, allege that Anthropic had no right to use their protected works to train its family of Claude AI models. Judge Alsup disagrees, ruling that Anthropic's use of the plaintiffs' works, which included buying physical books, stripping them bare, and scanning the text into its training workflow, falls under fair use. Fair use has long been a crucial defense for AI companies, which require huge libraries of human-created work to sufficiently train their various AI models. Understandably, artists have resisted, and copyright infringement lawsuits have popped up left and right. Alsup's ruling is multi-faceted, however. While the federal judge has sided with Anthropic on the matter of using legally acquired, copyrighted materials to train AI models, the judge takes significant issue with some of Anthropic's other behavior, including storing more than seven million pirated books in a central library. This is not protected under the fair use doctrine, and the judge has set a separate trial later this year to determine the damages Anthropic may owe for this infringement. As Reuters reports, "U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work." (The scale of that potential exposure is sketched after this item.) However, potentially even more influential than Judge Alsup's ruling that training AI on copyrighted material can be protected under the doctrine of fair use is his additional decision that building AI models using copyrighted work can be considered sufficiently transformative to avoid violating copyright. "To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act," Judge Alsup writes. "Anthropic's LLMs have not reproduced to the public a given work's creative elements, nor even one author's identifiable expressive style (assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works," Alsup continues elsewhere in his ruling. "But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not." Alsup calls using legally acquired copyrighted works to train LLMs "quintessentially transformative," claiming that Anthropic is using existing works "not to race ahead and replicate or supplant" the creators, but to "turn a hard corner and create something different." In their lawsuit, the plaintiffs alleged that, in general, training LLMs would "result in an explosion of works competing with their works," as Alsup characterizes it. The judge strongly disagrees with this complaint.
"But Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works," Alsup writes. "This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition." As a result of Alsup's ruling, AI companies now have a proven avenue through which they can defend their training work on the grounds of fair use. The ruling also asserts that some training applications are sufficiently transformative to be legally protected. There is little doubt that this new ruling could prove to be a landmark case that influences how other judges handle copyright claims levied against AI companies. That said, Anthropic will still need to answer for its alleged piracy.
[14]
AI training is 'fair use,' federal judge rules in Anthropic copyright case
A federal judge in San Francisco has ruled that training an AI model on copyrighted works without specific permission to do so was not a violation of copyright law. U.S. District Judge William Alsup said that AI company Anthropic could assert a "fair use" defense against copyright claims for training its Claude AI models on copyrighted books. But the judge also ruled that it mattered exactly how those books were obtained. Alsup supported Anthropic's claim that it was "fair use" for it to purchase millions of books and then digitize them for use in AI training. The judge said it was not OK, however, for Anthropic to have also downloaded millions of pirated copies of books from the internet and then maintained a digital library of those pirated copies. The judge ordered a separate trial on Anthropic's storage of those pirated books, which could determine the company's liability and any damages related to that potential infringement. The judge has also not yet ruled whether to grant the case class action status, which could dramatically increase the financial risks to Anthropic if it is found to have infringed on authors' rights. In finding that it was "fair use" for Anthropic to train its AI models on books written by three authors -- Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson -- who had filed a lawsuit against the AI company for copyright violations, Alsup addressed a question that has simmered since before OpenAI's ChatGPT kick-started the generative AI boom in 2022: Can copyrighted data be used to train generative AI models without the owner's consent? Dozens of AI and copyright-related lawsuits have been filed over the past three years, most of which hinge on the concept of fair use, a doctrine that allows the use of copyrighted material without permission if the use is sufficiently transformative -- meaning it must serve a new purpose or add new meaning, rather than simply copying or substituting for the original work. Alsup's ruling may set a precedent for these other copyright cases -- although it is also likely that many of these rulings will be appealed, meaning it will take years until there is clarity around AI and copyright in the U.S. According to the judge's ruling, Anthropic's use of the books to train Claude was "exceedingly transformative" and constituted "fair use under Section 107 of the Copyright Act." Anthropic told the court that its AI training was not only permissible, but aligned with the spirit of U.S. copyright law, which it argued "not only allows, but encourages" such use because it promotes human creativity. The company said it copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." While training AI models with copyrighted data may be considered fair use, Anthropic's separate action of building and storing a searchable repository of pirated books is not, Alsup ruled. Alsup noted that the fact that Anthropic later bought a copy of a book it earlier stole off the internet "will not absolve it of liability for the theft but it may affect the extent of statutory damages." The judge also looked askance at Anthropic's acknowledgement that it had turned to downloading pirated books in order to save time and money in building its AI models.
"This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup said. The "transformative" nature of AI outputs is important, but it's not the only thing that matters when it comes to fair use. There are three other factors to consider: what kind of work it is (creative works get more protection than factual ones), how much of the work is used (the less, the better), and whether the new use hurts the market for the original. For example, there is the ongoing case against Meta and OpenAI by comedian Sarah Silverman and two other authors, who filed copyright infringement lawsuits in 2023 alleging that pirated versions of their works were used without permission to train AI language models. The defendants recently argued that the use falls under fair use doctrine because AI systems "study" works to "learn" and create new, transformative content. Federal district judge Vince Chhabria pointed out that even if this is true, the AI systems are "dramatically changing, you might even say obliterating, the market for that person's work." But he also took issue with the plaintiffs, saying that their lawyers had not provided enough evidence of potential market impacts. Alsup's decision differed markedly from Chhabria's on this point. Alsup said that while it was undoubtedly true that Claude could lead to increase competition for the authors' works, this kind of "competitive or creative displacement is not the kind of competitive or creative displacement that concerns the Copyright Act" Copyright's purpose was to encourage the creation of new works, not to shield authors from competition, Alsup said, and he likened the authors' objections to Claude to the fear that teaching school children to write well might also result in an explosion of competing books. Alsup also took note in his ruling that Anthropic had built "guardrails" into Claude that were meant to prevent it from producing outputs that directly plagiarized the books on which it had been trained. Neither Anthropic nor the plaintiffs' lawyers immediately responded to requests to comment on the Alsup's decision.
[15]
Courts say AI training on copyrighted material is legal
A ruling in a U.S. District Court has effectively given permission to train artificial intelligence models using copyrighted works, in a decision that's extremely problematic for creative industries. Content creators and artists have been suffering for years, with AI companies scraping their sites and scanning books to train large language models (LLMs) without permission. That data is then used for generative AI and other machine learning tasks, and then monetized by the scraping company with no compensation for the original host or author. Following a ruling by the U.S. District Court for the Northern District of California issued on Tuesday, companies are being given free rein to train with just about any published media that they want to harvest. The ruling is based on a lawsuit from Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against Anthropic dating back to 2024. The suit accused the company of using pirated material to train its Claude AI models. This included Anthropic creating digital copies of printed books for AI model training. The ruling from Judge William Alsup -- a judge very familiar to readers of AppleInsider -- finds in favor of each side in various ways. However, the weight of the ruling certainly sides with Anthropic and AI scrapers in this instance. Under the ruling, Judge Alsup says that the copies used to train specific LLMs were justifiable as fair use. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup commented. For physical copies that were converted from a print library to a digital library, this was also deemed fair use. Furthermore, using that content to train LLMs was also fair use. Alsup compared the authors' complaint to one objecting to an effort to train schoolchildren how to write well. It's not clear how that applies, given that artificial intelligence models are not considered "schoolchildren" in any legal sense. In that argument, Alsup ruled that the Copyright Act is intended to advance original works of authorship, not to "protect authors against competition." Where the authors saw a small amount of success was in the usage of pirated works. Creating a library of pirated digital books, even if they are not used for the training of a model, does not constitute fair use. That remains the case even if Anthropic later bought a copy of a book it had pirated in the first place. On the matter of the piracy argument, the court will be holding a trial to determine damages against Anthropic. The ruling is terrible for artists, musicians, and writers. Other professions where machine learning models could be a danger to their livelihoods will have issues too -- like judges who once said that they took a coding class and therefore knew what they were talking about with tech. AI models take advantage of the hard work and life experiences of media creators, and pass it off as their own. At the same time, it leaves content producers with few options to take to combat the phenomenon. As it stands, the ruling will clearly be precedent in other lawsuits in the AI space, especially when dealing with the producers of original works that are pillaged for training purposes. Over the years, AI companies were attacked for grabbing any data they could to feed the LLMs, even content scraped from the Internet without permission. This is a problem that manifests in quite a few ways.
The most obvious is in generative AI, as the models could be trained to create images in specific styles, which devalues the work of actual artists. An example of a fightback is a lawsuit from Disney and Universal against Midjourney, which surfaced in early June. The company behind the AI image generator is accused of mass copyright infringement, for training the models on images of the most recognizable characters from the studios. The studios unite in calling Midjourney "a bottomless pit of plagiarism," built on the unauthorized use of protected material. When you have two major media companies that are usually bitter rivals uniting for a single cause, you know it's a serious issue. It's also a growing issue for websites and publishers, like AppleInsider. Instead of using a search tool and viewing websites for information, a user can simply ask for a customized summary from an AI model, without needing to visit the site that it has sourced the information from in the first place. And, that information is often wrong, combined with data from other sources, polluting the original meaning of the content. For instance, we've seen our tips on how to do something plagiarized with sections reproduced verbatim, and mashed up out of order with content from other sites, making a procedure that doesn't work. The question of how to deal with compensating the lost revenues of publishers is still one that has not yet been answered in a meaningful way. There are some companies that have been trying to stay on the more ethical side of things, with Apple among them. Apple has offered news publishers millions to license content for training its generative AI. It has also paid for licenses from Shutterstock, which helped develop its visual engines used for Apple Intelligence features. Major publishers have also taken to blocking AI services from accessing their archives, doing so via robots.txt (a minimal example appears after this item). However, this only stops ethical scrapers, not everyone. And, scraping an entire site takes server power and bandwidth -- which is not free for the hosting site that's getting scraped. The ruling also follows after an increase in efforts from major tech companies to lobby for a block on U.S. states introducing AI regulation for a decade. Meanwhile in the EU, there have been attempts to sign tech companies up to an AI Pact, to develop AI in safe ways. Apple is apparently not involved in either effort.
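On the robots.txt mechanism mentioned above: it is a voluntary convention in which a site publishes, at a well-known URL, the crawler user-agents it wants to turn away and the paths they should skip. Compliant crawlers check the file before fetching, but nothing enforces it server-side, which is exactly why it only stops ethical scrapers. A minimal sketch follows, using Python's standard urllib.robotparser; the user-agent names and URLs are illustrative placeholders, not taken from any article above.

```python
# Minimal sketch: how a robots.txt policy is expressed and how a compliant
# crawler checks it. User-agent names and URLs are hypothetical examples.
from urllib.robotparser import RobotFileParser

# A policy a publisher might serve at https://example.com/robots.txt:
# it blocks two hypothetical AI-training crawlers from the whole site
# while leaving all other crawlers free to index everything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: AnotherTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler performs this check before every fetch; a scraper
# that ignores robots.txt simply never runs it, so there is no enforcement.
for agent in ("ExampleAIBot", "GenericSearchBot"):
    allowed = parser.can_fetch(agent, "https://example.com/archive/story.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```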
[16]
US judge backs using copyrighted books to train AI
A US federal judge has sided with Anthropic regarding training its artificial intelligence models on copyrighted books without authors' permission, a decision with the potential to set a major legal precedent in AI deployment. District Court Judge William Alsup ruled on Monday that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his 32-page decision, comparing AI training to how humans learn by reading books. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We are pleased that the court recognized that using 'works to train LLMs was transformative,'" an Anthropic spokesperson said in response to an AFP query. The judge's decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress," the spokesperson added. Blanket protection rejected The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train Claude, the company's AI chatbot that rivals ChatGPT. However, Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections. Along with downloading books from websites offering pirated works, Anthropic bought copyrighted books, scanned the pages and stored them in digital format, according to court documents. Anthropic's aim was to amass a library of "all the books in the world" for training AI models on content as it deemed fit, the judge said in his ruling. While training AI models on the pirated content posed no legal violation, downloading pirated copies to build a general-purpose library constituted copyright infringement, regardless of eventual training use. The case will now proceed to trial on damages related to the pirated library copies, with potential penalties including financial damages. Anthropic said it disagreed with going to trial on this part of the decision and was evaluating its legal options. Valued at $61.5 billion and heavily backed by Amazon, Anthropic was founded in 2021 by former OpenAI executives. The company, known for its Claude chatbot and AI models, positions itself as focused on AI safety and responsible development.
[17]
Anthropic's landmark copyright ruling is a victory for the AI industry -- but the company is still on the hook for piracy claims
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of their use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product. Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement didn't address the piracy claims.
[18]
Anthropic wins key AI copyright case, but remains on the hook for using pirated books
Anthropic has won a major legal victory in a case over whether the artificial intelligence company was justified in hoovering up millions of copyrighted books to train its chatbot. In a ruling that could set an important precedent for similar disputes, Judge William Alsup of the United States District Court for the Northern District of California on Tuesday said Anthropic's use of legally purchased books to train its AI model, Claude, did not violate U.S. copyright law. Anthropic, which was founded by former executives with ChatGPT developer OpenAI, introduced Claude in 2023. Like other generative AI bots, the tool lets users ask natural language questions and then provides neatly summarized answers using AI trained on millions of books, articles and other material. Alsup ruled that Anthropic's use of copyrighted books to train its large language model, or LLM, was "quintessentially transformative" and was permitted under the "fair use" doctrine of copyright law. "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different," his decision states. By contrast, Alsup also found that Anthropic may have broken the law when it separately downloaded millions of pirated books and said it will face a separate trial in December over this issue. Court documents revealed that Anthropic employees expressed concern about the legality of using pirate sites to access books. The company later shifted its approach and hired a former Google executive who had been in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles. Anthropic cheered the ruling. "We are pleased that the Court recognized that using 'works to train LLMs (large language models) was transformative -- spectacularly so," an Anthropic spokesperson told CBS News in an email. The ruling stems from a case filed last year by three authors in federal court. After Anthropic used copies of their books to train Claude, Andrea Bartz, Charles Graeber and Kirk Wallace Johnson sued Anthropic for alleged copyright infringement, claiming the company's practices amounted to "large-scale theft." The authors also alleged that Anthropic "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." The authors' attorneys declined to comment. Other AI companies have also come under fire over the material they use to build their large language models. The New York Times, for example, sued OpenAI and Microsoft in 2023, claiming that the tech companies used millions of its articles to train their automated chatbots. At the same time, some media companies and publishers are also seeking compensation by licensing their content to companies like Anthropic and OpenAI.
[19]
US judge backs using copyrighted books to train AI
San Francisco (United States) (AFP) - A US federal judge has sided with Anthropic regarding training its artificial intelligence models on copyrighted books without authors' permission, a decision with the potential to set a major legal precedent in AI deployment. District Court Judge William Alsup ruled on Monday that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his 32-page decision, comparing AI training to how humans learn by reading books. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We are pleased that the court recognized that using 'works to train LLMs was transformative,'" an Anthropic spokesperson said in response to an AFP query. The judge's decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress," the spokesperson added. Blanket protection rejected The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train Claude, the company's AI chatbot that rivals ChatGPT. However, Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections. Along with downloading books from websites offering pirated works, Anthropic bought copyrighted books, scanned the pages and stored them in digital format, according to court documents. Anthropic's aim was to amass a library of "all the books in the world" for training AI models on content as it deemed fit, the judge said in his ruling. While training AI models on the pirated content posed no legal violation, downloading pirated copies to build a general-purpose library constituted copyright infringement, regardless of eventual training use. The case will now proceed to trial on damages related to the pirated library copies, with potential penalties including financial damages. Anthropic said it disagreed with going to trial on this part of the decision and was evaluating its legal options. Valued at $61.5 billion and heavily backed by Amazon, Anthropic was founded in 2021 by former OpenAI executives. The company, known for its Claude chatbot and AI models, positions itself as focused on AI safety and responsible development.
[20]
Federal judge rules copyrighted books are fair use for AI training
A federal judge has sided with Anthropic in a major copyright ruling, declaring that artificial intelligence developers can train on published books without authors' consent. The decision, filed Monday in the U.S. District Court for the Northern District of California, sets a precedent that training AI systems on copyrighted works constitutes fair use. Though it doesn't guarantee other courts will follow, Judge William Alsup's ruling marks the first of dozens of ongoing copyright lawsuits to give an answer on fair use in the context of generative AI. It's a question that has been raised by creatives across various industries in the years since generative AI tools exploded into the mainstream, allowing users to easily produce art from models trained on copyrighted work -- often without the human creator's knowledge or permission. AI companies have been hit with a slew of copyright lawsuits from media companies, music labels and authors since 2023. Artists have signed multiple open letters urging government officials and AI developers to constrain the unauthorized use of copyrighted works. In recent years, companies have also increasingly inked licensing deals with AI developers to dictate terms of use for their artists' works. Alsup on Monday ruled on a lawsuit filed last August by three authors -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- who claimed that Anthropic ignored copyright protections when it pirated millions of books and digitized purchased books to feed into its large language models, which helped train them to generate human-like text responses. "The copies used to train specific LLMs were justified as a fair use," Alsup wrote in the ruling. "Every factor but the nature of the copyrighted work favors this result. The technology at issue was among the most transformative many of us will see in our lifetimes." His decision stated that Anthropic's use of the books to train its models, including versions of its flagship AI model Claude, was "exceedingly transformative" enough to fall under fair use. Fair use, as defined by the Copyright Act, takes into account four factors: the purpose of the use, what kind of copyrighted work is used (creative works get stronger protection than factual works), how much of the work was used, and whether the use hurts the market value of the original work. "We are pleased that the Court recognized that using 'works to train LLMs was transformative -- spectacularly so,'" Anthropic said in a statement, quoting the ruling. "Consistent with copyright's purpose in enabling creativity and fostering scientific progress, 'Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different.'" Bartz and Johnson did not immediately respond to requests for comment. Graeber declined to comment. Alsup noted, however, that all of the authors' works contained "expressive elements" earning them stronger copyright protection, which is a factor that points against fair use, although not enough to sway the overall ruling. He also added that while making digital copies of purchased books was fair use, downloading pirated copies for free did not constitute fair use.
But aside from the millions of pirated copies, Alsup wrote, copying entire works to train AI models was "especially reasonable" because the models didn't reproduce those copies for public access, and doing so "did not and will not displace demand" for the original books. His ruling stated that although AI developers can legally train AI models on copyrighted works without permission, they should obtain those works through legitimate means that don't involve pirating or other forms of theft. Despite siding with the AI company on fair use, Alsup wrote that Anthropic will still face trial for the pirated copies it used to create its massive central library of books used to train AI. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft," Alsup wrote, "but it may affect the extent of statutory damages."
[21]
Anthropic's AI copyright 'win' is more complicated than it looks
Judge William Alsup of the U.S. District Court for the Northern District of California ruled that Anthropic's use of copyrighted material for training was fair use. His decision carries weight. "Authors cannot rightly exclude anyone from using their works for training or learning as such," Alsup wrote. "Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable." Alsup called training Claude "exceedingly transformative," comparing the model to "any reader aspiring to be a writer." That language helps explain why tech lobbyists were quick to call it a major win. Experts agreed. "It's a pretty big win actually for the future of AI training," says Andres Guadamuz, an intellectual property expert at the University of Sussex who has closely followed AI copyright cases. But he adds: "It could be bad for Anthropic specifically depending on authors winning the piracy issue, but that's still very far away."
[22]
Judge sides with Anthropic in landmark AI copyright case, but orders it to go on trial over piracy claims
Anthropic PBC scored a major victory for itself and the broader artificial intelligence industry today when a federal judge ruled that it hasn't broken the law by training its chatbot Claude on hundreds of legally purchased books that were later digitized without the authors' permission. However, the company is still on the hook for millions of pirated copies of books that it downloaded from the internet and used to train its models. U.S. District Judge William Alsup of the Northern District of California said in a ruling today that the way Anthropic's models distill information from thousands of written works and produce their own unique text meets the definition of "fair use" under U.S. copyright law. He justified this because the model's outputs are essentially new. "Like any reader aspiring to be a writer, Anthropic's models trained upon works not to race ahead and replicate or supplant them - but to turn a hard corner and create something different," Alsup wrote in his judgment. But although the judge dismissed one of the claims made in a class action lawsuit by a trio of authors last year, he ordered that Anthropic must stand trial in December for allegedly stealing thousands of copyrighted works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup said. The lawsuit, filed last summer by authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, alleges that the company's AI model training practices amount to "large-scale theft" of thousands of copyrighted books. It also alleged that the company sought to profit by "strip-mining the human expression and ingenuity behind each of those works." During the case, it was revealed in documents disclosed by Anthropic that a number of its researchers raised concerns over the legality of using online libraries of pirated books. That prompted the company to change its approach and purchase copies of hundreds of digitized works. But the judge said that although the company later purchased many copies of books legally, that doesn't absolve it of the liability for any earlier thefts. However, it "may affect the extent of statutory damages," Alsup added. Today's ruling could set a precedent for dozens of similar lawsuits that have been filed against Anthropic's competitors in the AI industry, including the ChatGPT creator OpenAI, as well as Meta Platforms Inc. and the AI search engine Perplexity AI Inc. Claims of copyright infringement have been piling up against AI companies, with dozens of cases filed by authors, media companies and music labels since 2023, when generative AI burst into the public consciousness. Creators have also signed multiple open letters calling on governments to rein in AI developers and prevent them from using copyrighted works for training their models. The furor has had a limited impact, with some AI companies responding by signing legal agreements with publishers that allow them to access their copyrighted materials. Anthropic, which was founded in 2021 by a number of ex-OpenAI employees, has positioned itself as being more responsible and safety-focused, but the lawsuit filed last year charges that its actions "made a mockery of its lofty goals" due to its practice of training its models on pirated works.
In response to today's ruling, Anthropic did not address the piracy claims, but said it was pleased that the judge had recognized AI training is "transformative and consistent with copyright's purpose in enabling creativity and fostering scientific progress."
[23]
Anthropic wins ruling on AI training in copyright lawsuit but must face trial on pirated books
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of their use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product. Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement didn't address the piracy claims.
[24]
Anthropic Wins Key Ruling on AI in Authors' Copyright Lawsuit
A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work.
[25]
Anthropic wins ruling on AI training in copyright lawsuit but must face trial on pirated books
In a test case for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic didn't break the law by training its chatbot Claude on millions of copyrighted books. But the company is still on the hook and must now go to trial over how it acquired those books by downloading them from online "shadow libraries" of pirated copies. U.S. District Judge William Alsup of San Francisco said in a ruling filed late Monday that the AI system's distilling from thousands of written works to be able to produce its own passages of text qualified as "fair use" under U.S. copyright law because it was "quintessentially transformative." "Like any reader aspiring to be a writer, Anthropic's (AI large language models) trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. But while dismissing a key claim made by the group of authors who sued the company for copyright infringement last year, Alsup also said Anthropic must still go to trial in December over its alleged theft of their works. "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. A trio of writers -- Andrea Bartz, Charles Graeber and Kirk Wallace Johnson -- alleged in their lawsuit last summer that Anthropic's practices amounted to "large-scale theft," and that the company "seeks to profit from strip-mining the human expression and ingenuity behind each one of those works." As the case proceeded over the past year in San Francisco's federal court, documents disclosed in court showed Anthropic's internal concerns about the legality of their use of online repositories of pirated works. So the company later shifted its approach and attempted to purchase copies of digitized books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The ruling could set a precedent for similar lawsuits that have piled up against Anthropic competitor OpenAI, maker of ChatGPT, as well as against Meta Platforms, the parent company of Facebook and Instagram. Anthropic -- founded by ex-OpenAI leaders in 2021 -- has marketed itself as the more responsible and safety-focused developer of generative AI models that can compose emails, summarize documents and interact with people in a natural way. But the lawsuit filed last year alleged that Anthropic's actions "have made a mockery of its lofty goals" by tapping into repositories of pirated writings to build its AI product. Anthropic said Tuesday it was pleased that the judge recognized that AI training was transformative and consistent with "copyright's purpose in enabling creativity and fostering scientific progress." Its statement didn't address the piracy claims.
[26]
Federal Judge Rules It's Legal to Train AI on Copyrighted Books, Marking Major Win for AI Companies
A federal judge ruled for the first time that it was legal for the $61.5 billion AI startup Anthropic to train its AI model on copyrighted books without compensating or crediting the authors. U.S. District Judge William Alsup of San Francisco stated in a ruling filed on Monday that Anthropic's use of copyrighted, published books to train its AI model was "fair use" under U.S. copyright law because it was "exceedingly transformative." Alsup compared the situation to a human reader learning how to be a writer by reading books, for the purpose of creating a new work. "Like any reader aspiring to be a writer, Anthropic's [AI] trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup wrote. According to the ruling, although Anthropic's use of copyrighted books as training material for Claude was fair use, the court will hold a trial on pirated books used to create Anthropic's central library and determine the resulting damages. The ruling, the first time that a federal judge has sided with tech companies over creatives in an AI copyright lawsuit, creates a precedent for courts to favor AI companies over individuals in AI copyright disputes. These copyright lawsuits rely on how a judge interprets the fair use doctrine, a concept in copyright law that permits the use of copyrighted material without obtaining permission from the copyright holder. Fair use rulings depend on how different the end work is from the original, what the end work is being used for, and whether it is being replicated for commercial gain. The plaintiffs in the class action case, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, are all authors who allege that Anthropic used their work to train its chatbot without their permission. They filed the initial complaint, Bartz v. Anthropic, in August 2024, alleging that Anthropic had violated copyright law by pirating books and replicating them to train its AI chatbot. The ruling details that Anthropic downloaded millions of copyrighted books for free from pirate sites. The startup also bought print copies of copyrighted books, some of which it already had in its pirated library. Employees tore off the bindings of these books, cut down the pages, scanned them, and stored them in digital files to add to a central digital library. From this central library, Anthropic selected different groupings of digitized books to train its AI chatbot, Claude, the company's primary revenue driver. The judge ruled that because Claude's output was "transformative," Anthropic was permitted to use the copyrighted works under the fair use doctrine. However, Anthropic still has to go to trial over the books it pirated. "Anthropic had no entitlement to use pirated copies for its central library," the ruling reads. Claude has proven to be lucrative. According to the ruling, Anthropic made over one billion dollars in annual revenue last year from corporate clients and individuals paying a subscription fee to use the AI chatbot. Paid subscriptions for Claude range from $20 per month to $100 per month. Anthropic faces another lawsuit from Reddit. In a complaint filed earlier this month in Northern California court, Reddit claimed that Anthropic used its site for AI training material without permission.
[27]
Anthropic wins key ruling on AI in authors' copyright lawsuit
A federal judge in San Francisco ruled late Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under US copyright law. Siding with tech companies on a pivotal question for the AI industry, US District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's storage of the authors' books in a "central library" violated their copyrights and was not fair use. Spokespeople for Anthropic and attorneys for the authors did not immediately respond to requests for comment on the ruling on Tuesday. The writers sued Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The class action lawsuit is one of several brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that US copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them - but to turn a hard corner and create something different," Alsup said.
[28]
US judge backs using copyrighted books to train AI
A US federal judge has sided with Anthropic regarding training its artificial intelligence models on copyrighted books without authors' permission, a decision with the potential to set a major legal precedent in AI deployment. District Court Judge William Alsup ruled on Monday that the company's training of its Claude AI models with books bought or pirated was allowed under the "fair use" doctrine in the US Copyright Act. "Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision. "The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his 32-page decision, comparing AI training to how humans learn by reading books. Tremendous amounts of data are needed to train large language models powering generative AI. Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment. AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation. "We are pleased that the court recognized that using 'works to train LLMs was transformative,'" an Anthropic spokesperson said in response to an AFP query. The judge's decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress," the spokesperson added. - Blanket protection rejected - The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train Claude, the company's AI chatbot that rivals ChatGPT. However, Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections. Along with downloading books from websites offering pirated works, Anthropic bought copyrighted books, scanned the pages and stored them in digital formats, according to court documents. Anthropic's aim was to amass a library of "all the books in the world" for training AI models on content as it deemed fit, the judge said in his ruling. While training AI models on the pirated content posed no legal violation, downloading pirated copies to build a general-purpose library constituted copyright infringement, the judge ruled, regardless of eventual training use. The case will now proceed to trial on damages related to the pirated library copies, with potential penalties including financial damages. Anthropic said it disagreed with going to trial on this part of the decision and was evaluating its legal options. "Judge Alsup's decision is a mixed bag," said Keith Kupferschmid, chief executive of US nonprofit Copyright Alliance. "In some instances AI companies should be happy with the decision and in other instances copyright owners should be happy."
Valued at $61.5 billion and heavily backed by Amazon, Anthropic was founded in 2021 by former OpenAI executives. The company, known for its Claude chatbot and AI models, positions itself as focused on AI safety and responsible development.
[29]
Court Rules Anthropic Doesn't Need Permission to Train AI With Books | PYMNTS.com
According to a Reuters report, U.S. District Judge William Alsup found that Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson in training its Claude large language model (LLM). However, Alsup also ruled that Anthropic's copying and storage of more than 7 million pirated books in a "central library" violated the authors' copyrights and was not fair use, and ordered a trial in December to decide how much Anthropic owes for the infringement. The Reuters report noted that U.S. copyright law holds that willful copyright infringement can justify statutory damages of up to $150,000 per work. A spokesperson for Anthropic told Reuters the company was pleased that the court recognized its AI training was "transformative" and "consistent with copyright's purpose in enabling creativity and fostering scientific progress." The writers filed the proposed class action against Anthropic last year, contending the Amazon and Google-backed company used pirated versions of their books without their consent or compensation to teach Claude to reply to human prompts. The news comes days after the BBC threatened legal action against AI search engine Perplexity, alleging that the company's "default AI model" was trained using the broadcaster's material. The BBC has demanded that Perplexity end all scraping of its content, delete any copies used for AI development, and propose compensation for the alleged infringement. A report by the Financial Times noted that this is the first time the BBC has sought legal recourse over content scraping by AI firms, a sign of mounting concerns that its freely available public sector content is being widely repurposed without authorization. The broadcaster claims that parts of its content have been reproduced verbatim by Perplexity, with links to BBC articles surfacing in search results, including material that was only recently published online. BBC executives maintain that such practices harm the BBC's reputation for impartial journalism and hurt public trust, pointing to internal research that found 17% of Perplexity responses using BBC sources had significant inaccuracies or missing context. Recent coverage by PYMNTS has spotlighted the rising friction between generative AI companies and publishers over content scraping.
[30]
Bad News for Movie Studios: Authors Just Lost on a Key Issue In a Major AI Lawsuit
"Transformative" is how a federal court characterized Amazon-backed Anthropic's use of millions of books across the web to teach its artificial intelligence system. It's the first decision to consider the issue and will serve as a template for other courts overseeing similar cases. And studios, now that some have entered the fight over the industry-defining technology, should be uneasy about the ruling. The thrust of these cases will be decided by one question: Are AI companies covered by fair use, the legal doctrine in intellectual property law that allows creators to build upon copyrighted works without a license? On that issue, a court found that Anthropic is on solid legal ground, at least with respect to training. The technology is "among the most transformative many of us will see in our lifetimes," wrote U.S. District Judge William Alsup. Still, Anthropic will face a trial over illegally downloading seven million books to create a library that was used for training. That it later purchased copies of the books it stole off the internet to cover its tracks doesn't absolve it of liability, the court concluded. The company faces potential damages of hundreds of millions of dollars stemming from the decision, which could lead to Disney and Universal getting a similar payout depending on what they unearth in discovery over how Midjourney allegedly obtained copies of thousands of films that were repurposed to teach its image generator. Last year, authors filed a lawsuit against Anthropic accusing it of illegally downloading and copying their books to power its AI chatbot Claude. The company chose not to move to dismiss the complaint and instead skipped straight to a decision on fair use. In the ruling, the court found that authors don't have the right to exclude Anthropic from using their works to train its technology, much in the same way they don't have the right to exclude any person from reading their books to learn how to write. "Everyone reads texts, too, then writes new texts," reads the order. "They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable." If someone were to read all the modern-day classics, memorize them and emulate a blend of their writing, that wouldn't constitute copyright infringement, the court concluded. Like any reader who wants to be a writer, Anthropic's technology draws upon works not to replicate or supplant them but to create something entirely different, according to the order. Those aren't the findings that Disney or Universal -- both of whom are suing Midjourney for copyright infringement -- wanted or expected. For them, there's reason to worry that Alsup's analysis will shape how the judge overseeing their case weighs undermining the development of a technology that another court found to be revolutionary (or something close to it). More broadly, it could be found that AI video generators, like Sora, are simply distilling every movie ever made to create completely new works.
"This Anthropic decision will likely be cited by all creators of AI models to support the argument that fair use applies to the use of massive datasets to train foundational models," says Daniel Barsky, an intellectual property lawyer at Holland & Knight. Important to note: The authors didn't allege that responses generated by Anthropic infringed upon their works. And if they had, they would've lost that argument under the court's finding that guardrails are in place to ensure that no infringing ever reached users. Alsup compared it to Google imposing limits on how many snippets of text from any one book could be seen by a user through its Google Book service, preventing its search function from being misused as a way to access full books for free. "Here, if the outputs seen by users had been infringing, Authors would have a different case," Alsup writes. "And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case." But that could be the case for Midjourney, which returns nearly exact replicas of frames from films in some instances. When prompted with "Thanos Infinity War," Midjourney -- an AI program that translates text into hyper-realistic graphics -- replied with an image of the purple-skinned villain in a frame that appears to be taken from the Marvel movie or promotional materials, with few to no alterations made. A shot of Tom Cruise in the cockpit of a fighter jet, from Top Gun: Maverick, is produced when the tool was asked for a frame from the film. The chatbots can seemingly replicate almost any animation style, generating startlingly accurate characters from titles ranging from DreamWorks' Shrek to Pixar's Ratatouille to Warner Bros.' The Lego Movie. "The fact that Midjourney generates copies and derivatives of" films from Disney and Universal proves that the company, without their knowledge or permission, "copied plaintiffs' copyrighted works to train and develop" its technology, states the complaint. Also at play: The possibility that Midjourney pirated the studios' movies. In the June 23 ruling, Alsup found that Anthropic illegally downloading seven million books to build a library to be used for training isn't covered by fair use. He said that the company could've instead paid for the copies. Such piracy, the court concluded, is "inherently, irredeemably infringing." With statutory damages for willful copyright infringement reaching up to $150,000 per work, massive payouts are a possibility.
[31]
Judge rules Anthropic's use of books to train AI model is fair use
June 24 (UPI) -- A judge ruled the Anthropic artificial intelligence company didn't violate copyright laws when it used millions of copyrighted books to train its AI. According to his ruling, U.S. District Judge William Alsup concluded Monday "that the training use was a fair use." However, that doesn't mean Anthropic is out of the woods legally, as it's still potentially on the hook for allegedly having pirated books. Alsup wrote in his conclusion that while the training itself was fair use, the question of the unlawfully downloaded materials remains open: "We will have a trial on the pirated copies used to create Anthropic's central library and the resulting damages, actual or statutory," he said. The owners of Anthropic claimed that they eventually started paying for downloaded books. "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages," Alsup wrote. The case document states that Anthropic offers an AI software service called "Claude," which is able to simulate human writing and reading because it was trained with books and other texts that were taken from a central library of materials gathered by the company. Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson are the plaintiffs in the case, as they wrote books that Anthropic allegedly "copied from pirated and purchased sources." None of the usage was authorized by the authors. The complaint further alleges that Anthropic knowingly downloaded at least seven million books that it knew were pirated copies. It is unclear when the trial over the allegedly pirated books will take place, or whether a date has been set.
[32]
Anthropic wins key ruling on AI in authors' copyright lawsuit
A federal judge in San Francisco ruled late Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's storage of the authors' books in a "central library" violated their copyrights and was not fair use. Spokespeople for Anthropic and attorneys for the authors did not immediately respond to requests for comment on the ruling on Tuesday. The writers sued Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The class action lawsuit is one of several brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that U.S. copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said.
[33]
Amazon-backed Anthropic wins key ruling in AI copyright lawsuit filed...
A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under US copyright law. Siding with tech companies on a pivotal question for the AI industry, US District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. US copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work. An Anthropic spokesperson said the company was pleased that the court recognized its AI training was "transformative" and "consistent with copyright's purpose in enabling creativity and fostering scientific progress." The writers filed the proposed class action against Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The proposed class action is one of several lawsuits brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that US copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said. Alsup also said, however, that Anthropic violated the authors' rights by saving pirated copies of their books as part of a "central library of all the books in the world" that would not necessarily be used for AI training. Anthropic and other prominent AI companies including OpenAI and Meta Platforms have been accused of downloading pirated digital copies of millions of books to train their systems. Anthropic had told Alsup in a court filing that the source of its books was irrelevant to fair use. "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup said on Monday.
[34]
Anthropic wins key US ruling on AI training in authors' copyright lawsuit
(Reuters) -A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law. Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made "fair use" of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model. Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement. U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work. An Anthropic spokesperson said the company was pleased that the court recognized its AI training was "transformative" and "consistent with copyright's purpose in enabling creativity and fostering scientific progress." The writers filed the proposed class action against Anthropic last year, arguing that the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts. The proposed class action is one of several lawsuits brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training. The doctrine of fair use allows the use of copyrighted works without the copyright owner's permission in some circumstances. Fair use is a key legal defense for the tech companies, and Alsup's decision is the first to address it in the context of generative AI. AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the burgeoning AI industry. Anthropic told the court that it made fair use of the books and that U.S. copyright law "not only allows, but encourages" its AI training because it promotes human creativity. The company said its system copied the books to "study Plaintiffs' writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology." Copyright owners say that AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. Alsup agreed with Anthropic on Monday that its training was "exceedingly transformative." "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them -- but to turn a hard corner and create something different," Alsup said. Alsup also said, however, that Anthropic violated the authors' rights by saving pirated copies of their books as part of a "central library of all the books in the world" that would not necessarily be used for AI training. Anthropic and other prominent AI companies including OpenAI and Meta Platforms have been accused of downloading pirated digital copies of millions of books to train their systems. Anthropic had told Alsup in a court filing that the source of its books was irrelevant to fair use. 
"This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," Alsup said on Monday. (Reporting by Blake Brittain in Washington; Editing by Alexia Garamfalvi, Chizu Nomiyama, Louise Heavens and Matthew Lewis)
A federal judge rules that AI companies can train models on legally acquired books without author permission, marking a significant victory for AI firms. However, the use of pirated materials remains contentious and subject to further legal scrutiny.
In a groundbreaking decision, US District Judge William Alsup has ruled that artificial intelligence companies do not need permission from authors to train their large language models (LLMs) on legally acquired books [1]. This first-of-its-kind ruling, which condones AI training as fair use, is likely to be viewed as a significant victory for AI companies while potentially setting a precedent for similar cases in the future [2].
Judge Alsup found that "the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative" and "necessary" to build world-class AI models [1]. He likened the process to how humans learn from reading, stating, "Everyone reads texts, too, then writes new texts. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable" [5].
This ruling is particularly significant as it marks the first time a judge has decided in favor of an AI company on the issue of fair use [4]. It could have far-reaching implications for the dozens of other AI copyright lawsuits currently in the US legal system [3]. For AI companies, this decision provides a legal foundation for their training practices, potentially shielding them from copyright infringement claims when using legally acquired materials.
However, the ruling has disappointed authors and creators who argue that AI models' reliance on their texts could generate competing summaries or alternative versions of their stories [1]. Judge Alsup dismissed these concerns, comparing them to arguing "that training schoolchildren to write well would result in an explosion of competing works" [1][5].
While the fair use ruling is a win for Anthropic, the company still faces a trial over allegations of book piracy [1]. Anthropic is accused of downloading 7 million pirated books to build a research library, an action that Judge Alsup found did not favor a fair use finding [1][3]. The judge stated, "This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use" [1].
The upcoming trial on the piracy allegations could result in significant damages for Anthropic, and the extent of those damages may be affected by the company's subsequent efforts to replace pirated books with legally purchased copies [1][3]. This aspect of the case underscores how important it is for AI companies to use legally obtained materials in their training processes.
This ruling is likely to have ripple effects across the AI industry [4]. While it provides a legal basis for AI companies to train their models on legally acquired copyrighted works, it also puts them on notice regarding the use of pirated materials. The decision may influence how other judges interpret fair use in the context of AI training, potentially shaping the future of AI development and copyright law [2][3].
As the AI industry continues to evolve, this ruling marks a significant milestone in the ongoing debate over intellectual property rights in the digital age. It underscores the need for a balance between fostering innovation in AI technology and protecting the rights of content creators.
Summarized by Navi
The UK's Competition and Markets Authority (CMA) is considering designating Google with "strategic market status," which could lead to new regulations on its search engine operations, including fair ranking measures and increased publisher control over content use in AI-generated results.
22 Sources
Policy and Regulation
17 hrs ago
OpenAI is developing collaboration features for ChatGPT, potentially rivaling Google Docs and Microsoft Word, as it aims to transform the AI chatbot into a comprehensive productivity tool.
3 Sources
Technology
9 hrs ago
Google DeepMind has released a new on-device AI model for robotics that can operate without cloud connectivity, marking a significant advancement in autonomous robot control and adaptability.
5 Sources
Technology
9 hrs ago
Google has donated its Agent2Agent (A2A) protocol to the Linux Foundation, aiming to establish open standards for AI agent interoperability across platforms and vendors.
4 Sources
Technology
17 hrs ago
Amazon is building a colossal AI-focused data center complex in Indiana, part of its Project Rainier initiative, to power AI startup Anthropic. This marks a new era of supersized data centers for AI computing.
2 Sources
Technology
9 hrs ago