2 Sources
[1]
What Two Judicial Rulings Mean for the Future of Generative AI
Should tech companies have free access to copyrighted books and articles for training their AI models? Two judges recently nudged us toward an answer. More than 40 lawsuits have been filed against AI companies since 2022. The specifics vary, but they generally seek to hold these companies accountable for stealing millions of copyrighted works to develop their technology. (The Atlantic is involved in one such lawsuit, against the AI firm Cohere.)

Late last month, there were rulings on two of these cases, first in a lawsuit against Anthropic and, two days later, in one against Meta. Both cases were brought by book authors who alleged that AI companies had trained large language models on authors' work without consent or compensation. In each case, the judges decided that the tech companies were engaged in "fair use" when they trained their models with authors' books. Both judges said that the use of these books was "transformative" -- that training an LLM resulted in a fundamentally different product that does not directly compete with those books. (Fair use also protects the display of quotations from books for purposes of discussion or criticism.)

At first glance, this seems like a substantial blow to authors and publishers, who worry that chatbots threaten their business, both because of the technology's ability to summarize their work and because of its ability to produce competing work that might eat into their market. (When reached for comment, Anthropic and Meta told me they were happy with the rulings.) A number of news outlets portrayed the rulings as a victory for the tech companies; Wired described the two outcomes as "landmark" and "blockbuster." But in fact, the judgments are not straightforward. Each is specific to the particular details of its case, and neither resolves the question of whether AI training is fair use in general.
On certain key points, the two judges disagreed with each other -- so thoroughly, in fact, that one legal scholar observed that they had "totally different conceptual frames for the problem." It's worth understanding these rulings, because AI training remains a monumental and unresolved issue -- one that could define how the most powerful tech companies are able to operate in the future, and whether writing and publishing remain viable professions.

So, is it open season on books now? Can anyone pirate whatever they want to train for-profit chatbots? Not necessarily. When preparing to train its LLM, Anthropic downloaded a number of "pirate libraries," collections comprising more than 7 million stolen books, all of which the company decided to keep indefinitely. Although the judge in this case ruled that the training itself was fair use, he also ruled that keeping such a "central library" was not, and for this, the company will likely face a trial to determine whether it is liable for potentially billions of dollars in damages. In the case against Meta, the judge also ruled that the training was fair use, but Meta may face further litigation for allegedly helping to distribute pirated books in the process of downloading them -- a typical feature of BitTorrent, the file-sharing protocol that the company used for this effort. (Meta has said it "took precautions" to avoid doing so.)

Piracy is not the only relevant issue in these lawsuits. In their case against Anthropic, the authors argued that AI will cause a proliferation of machine-generated titles that compete with their books. Indeed, Amazon is already flooded with AI-generated books, some of which bear real authors' names, creating market confusion and potentially stealing revenue from writers. But in his opinion on the Anthropic case, Judge William Alsup said that copyright law should not protect authors from competition.
"Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works," he wrote.

In his ruling on the Meta case, Judge Vince Chhabria disagreed. He wrote that Alsup had used an "inapt analogy" and was "blowing off the most important factor in the fair use analysis." Because anyone can use a chatbot to bypass the process of learning to write well, he argued, AI "has the potential to exponentially multiply creative expression in a way that teaching individual people does not." In light of this, he wrote, "it's hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars" while damaging the market for authors' work.

To determine whether training is fair use, Chhabria said, we need to look at the details. For instance, famous authors might have less of a claim than up-and-coming ones. "While AI-generated books probably wouldn't have much of an effect on the market for the works of Agatha Christie, they could very well prevent the next Agatha Christie from getting noticed or selling enough books to keep writing," he wrote. Thus, in Chhabria's opinion, some plaintiffs will win cases against AI companies, but they will need to show that the market for their particular books has been damaged. Because the plaintiffs in the case against Meta didn't do this, Chhabria ruled against them.

On top of these two disagreements is the problem that nobody -- including AI developers themselves -- fully understands how LLMs work. For example, both judges seemed to underestimate the potential for AI to directly quote copyrighted material to users. Their fair-use analysis was based on the LLMs' inputs -- the text used to train the programs -- rather than on outputs that might be infringing.
Research on AI models such as Claude, Llama, GPT-4, and Google's Gemini has shown that, on average, 8 to 15 percent of chatbots' responses in normal conversation are copied directly from the web, and in some cases responses are 100 percent copied. The more text an LLM has "memorized," the more it can potentially copy and paste from its training sources without anyone realizing it's happening. OpenAI has characterized this as a "rare bug," and Anthropic, in another case, has argued that "Claude does not use its training texts as a database from which preexisting outputs are selected in response to user prompts."

But research in this area is still in its early stages. A study published this spring showed that Llama can reproduce much more of its training text than was previously thought, including near-exact copies of books such as Harry Potter and the Sorcerer's Stone and 1984. That study was co-authored by Mark Lemley, one of the most widely read legal scholars on AI and copyright, and a longtime supporter of the idea that AI training is fair use. In fact, Lemley was part of Meta's defense team in its case, but he quit earlier this year, citing in a LinkedIn post "Mark Zuckerberg and Facebook's descent into toxic masculinity and Neo-Nazi madness." (Meta did not respond to my question about this post.)

Lemley was surprised by the results of the study and told me that it "complicates the legal landscape in various ways for the defendants" in AI copyright cases. "I think it ought still to be a fair use," he told me, referring to training, but we can't entirely accept "the story that the defendants have been telling" about LLMs. For some models trained using copyrighted books, he told me, "you could make an argument that the model itself has a copy of some of these books in it," and AI companies will need to explain to the courts how that copy is also fair use, in addition to the copies made in the course of researching and training their models.
As more is learned about how LLMs memorize their training text, we could see more lawsuits from authors whose books, with the right prompting, can be fully reproduced by LLMs. Recent research shows that widely read authors, including J. K. Rowling, George R. R. Martin, and Dan Brown, may be in this category. Unfortunately, this kind of research is expensive and requires expertise that is rare outside of AI companies, and the tech industry has little incentive to support or publish such studies.

The two recent rulings are best viewed as first steps toward a more nuanced conversation about what responsible AI development could look like. The purpose of copyright is not simply to reward authors for writing but to create a culture that produces important works of art, literature, and research. AI companies claim that their software is creative, but AI can only remix the work it's been trained on; nothing in its architecture makes it capable of doing anything more. At best, it summarizes. Some writers and artists have used generative AI to interesting effect, but such experiments have arguably been insignificant next to the torrent of slop that is already drowning out human voices on the internet. There is even evidence that AI can make us less creative, and it may therefore prevent the kinds of thinking needed for cultural progress.

The goal of fair use is to balance a system of incentives so that the kind of work our culture needs is rewarded. A world in which AI training is broadly fair use is likely a culture with less human writing in it. Whether that is the kind of culture we should have is a fundamental question that the judges in the other AI cases may need to confront.
[2]
AI Copyright Battles Continue Despite Meta, Anthropic Wins, Experts Say | PYMNTS.com
"Although we have two recent decisions from the Northern District of California in which the judges found that Anthropic's and Meta's training were fair use because the training of [large language models (LLMs)] was 'transformative,' the rulings are rather limited," Yelena Ambartsumian, founding attorney at Ambart Law, told PYMNTS.

The two court cases are Bartz v. Anthropic and Kadrey v. Meta. The rulings in late June on these two lawsuits offer "the first significant judicial guidance" on the application of the fair use doctrine to model training, according to a July 3 blog post from the global law firm Reed Smith. A closer look at the rulings shows that tech companies training their artificial intelligence models on copyrighted content are not home free, the post said.

For Anthropic, the judge ruled that AI model training was fair use for its legally acquired books, but not for pirated ones. That's why the judge allowed a separate infringement case to proceed over Anthropic's download of millions of works from shadow libraries; a trial on those claims is expected later this year, VKTR reported June 27. Meta, meanwhile, downloaded copyrighted books from unauthorized online sites, so the court did not focus on fair use but rather on the 13 author plaintiffs' inability to prove they were harmed, CNBC reported June 25.

"While the Meta and Anthropic cases were definitely positive for AI developers, they should not be taken as definitive on the issue of fair use in using copyrighted materials to train AI programs," Thomas McNulty, attorney at Lando and Anastasi, told PYMNTS.
"The Anthropic court found that competition from non-infringing works is not the sort of thing that copyright law protects, so there is a split in the handling of this factor that will likely be addressed on appeal," McNulty added. As for the Meta decision, it "effectively sets forth a road map for plaintiffs in later-filed suits, particularly as the Meta court deemed this factor [of market harm] the 'most important' in the fair use analysis," McNulty said.

The rulings raise several legal questions. If an AI produced a result that was "substantially similar to a copyrighted work on which an AI program was trained, would there be liability for infringement?" McNulty said. Also, McNulty said, "Who would the liable party be? The entity that trained the AI and used the infringed work as a part of the training, the person who input the prompts that led to the infringing output, the entity that subsequently published the infringing output, or some combination of the three?"

For now, the rulings seem to be emboldening tech firms. Irina Tsukerman, an attorney and president of Scarab Rising, said the rulings signal an erosion of control for creators. "The burden increasingly falls on artists and writers to prove that an AI-generated output is a direct copy, a nearly impossible task," Tsukerman said. Shane Lucado, founder and CEO of legal search firm InPerSuit, told PYMNTS that "these decisions may mark the end of copyright as a defensive shield for solo creatives. It is becoming a tool for big fights between companies with billions at stake. Small publishers, musicians and independent authors are getting folded into the data layer of corporate training sets."

However, not all experts said the decisions present an existential threat to creators. Wyatt Mayham, CEO of Northwest AI Consulting, told PYMNTS that he believes a licensing market is beginning to take shape.
"The courts have basically said, 'Training can be fair use, but piracy is still piracy,'" Mayham said. HarperCollins offered authors $2,500 per book to let AI models train on their works, he said. "Other publishers are soon to follow suit," Mayham said. Mayham said he also does not think the rulings are the "apocalypse that people are painting it out to be." He pointed out that 1,100 authors signed a petition opposing the HarperCollins deal. "This ends up creating a ton of leverage for creators in their licensing negotiations," Mayham said. "Everyone will end up getting paid."
Recent court decisions in cases against Anthropic and Meta have significant implications for AI companies' use of copyrighted material in training large language models, sparking debates on fair use and the future of content creation.
In a significant development for the AI industry, two recent court rulings have addressed the contentious issue of using copyrighted material to train large language models (LLMs). The cases, involving tech giants Anthropic and Meta, have yielded decisions that could shape the future of AI development and content creation [1].
Both judges ruled that the use of copyrighted books for AI training constitutes "fair use," deeming it "transformative" as it results in a fundamentally different product that doesn't directly compete with the original works [1]. This interpretation has been seen as a victory for AI companies, potentially allowing them broader access to copyrighted material for training purposes.
Despite the similar outcomes, the judges' reasoning differed significantly on key points. Judge William Alsup, in the Anthropic case, argued that copyright law should not protect authors from competition, even if it comes from AI-generated content [1]. In contrast, Judge Vince Chhabria, ruling on the Meta case, warned that AI's potential to "exponentially multiply creative expression" could damage the market for authors' work [1].
While the rulings favored AI companies on the issue of training, they didn't grant carte blanche for copyright infringement. Anthropic faces potential liability for maintaining a "central library" of pirated books, which the judge deemed not protected by fair use [1]. Meta may face further litigation for allegedly aiding the distribution of pirated books during its downloading process [1].
Judge Chhabria emphasized the need to consider the specific market impact on authors, suggesting that while established authors might not be significantly affected, emerging writers could struggle to get noticed or sell enough books to sustain their careers [1]. This perspective opens the door for future lawsuits in which plaintiffs can demonstrate tangible market harm [2].
Legal experts stress that these rulings are not definitive on the issue of fair use in AI training. Thomas McNulty, an attorney at Lando and Anastasi, points out that the split in the handling of the market-harm factor will likely be addressed on appeal [2]. The rulings also raise questions about liability for AI-generated content that closely resembles copyrighted works [2].
Some experts warn that these decisions could erode creators' control, making it increasingly difficult for artists and writers to prove direct copying by AI-generated outputs [2]. Shane Lucado, CEO of InPerSuit, suggests that copyright may become a tool primarily for large-scale corporate battles, potentially sidelining independent creators [2].
Despite concerns, some industry observers see potential for a new licensing market. Wyatt Mayham, CEO of Northwest AI Consulting, notes that publishers like HarperCollins are beginning to offer compensation to authors for AI training rights, potentially creating leverage for creators in licensing negotiations [2].
As the AI industry continues to evolve, these rulings mark a significant milestone in the ongoing debate over intellectual property rights in the digital age. While they provide some clarity on the application of fair use doctrine to AI training, they also highlight the complex challenges that lie ahead for both tech companies and content creators.