6 Sources
[1]
Adobe hit with proposed class-action, accused of misusing authors' work in AI training | TechCrunch
Like pretty much every other tech company in existence, Adobe has leaned heavily into AI over the past several years. The software firm has launched a number of different AI services since 2023, including Firefly -- its AI-powered media-generation suite. Now, however, the company's full-throated embrace of the technology may have led to trouble, as a new lawsuit claims it used pirated books to train one of its AI models. A proposed class-action lawsuit filed on behalf of Elizabeth Lyon, an author from Oregon, claims that Adobe used pirated versions of numerous books -- including her own -- to train the company's SlimLM program. Adobe describes SlimLM as a small language model series that can be "optimized for document assistance tasks on mobile devices." It states that SlimLM was pre-trained on SlimPajama-627B, a "deduplicated, multi-corpora, open-source dataset" released by Cerebras in June of 2023. Lyon, who has written a number of guidebooks for non-fiction writing, says that some of her works were included in a pretraining dataset that Adobe had used. Lyon's lawsuit, which was first reported by Reuters, says that her writing was included in a processed subset of a manipulated dataset that was the basis of Adobe's program: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)," the lawsuit says. "Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members." "Books3" -- a huge collection of 191,000 books that have been used to train genAI systems -- has been an ongoing source of legal trouble for the tech community. RedPajama has also been cited in a number of litigation cases. In September, a lawsuit against Apple claimed the company had used copyrighted material to train its Apple Intelligence model. The litigation mentioned the dataset and accused the tech company of copying protected works "without consent and without credit or compensation." In October, a similar lawsuit against Salesforce also claimed the company had used RedPajama for training purposes. Unfortunately for the tech industry, such lawsuits have, by now, become somewhat commonplace. AI algorithms are trained on massive datasets and, in some cases, those datasets have allegedly included pirated materials. In September, Anthropic agreed to pay $1.5 billion to a number of authors who had sued it and accused it of using pirated versions of their work to train its chatbot, Claude. The case was considered a potential turning point in the ongoing legal battles over copyrighted material in AI training data, of which there are many.
[2]
Adobe faces class action lawsuit after allegedly misusing authors' work in AI training
The plaintiff claims sufficient financial resources to "vigorously" pursue this case Adobe is set to face an AI copyright lawsuit in the US, with a class-action case alleging that the company trained its AI models on pirated books without permission. Oregon author Elizabeth Lyon filed the case, claiming that the tech giant had trained its AI models not only on her books, but the work of others, too. The lawsuit focuses specifically on Adobe's SlimLM small language models which are used for document assistance tasks on mobile devices. The company has denied the allegations, asserting that SlimLM was trained on SlimPajama-627B - an open-source dataset that was released by Cerebras in 2023. However, the lawsuit claims that SlimPajama is a derivative of RedPajama, which allegedly includes Books3 - a dataset of nearly 200,000 pirated books. In short, Lyon argues that because SlimPajama includes RedPajama/Books3, it contains copyrighted work without consent, credit, or compensation. Adobe is also accused of having "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models." It's not the first time that RedPajama or Books3 have been involved with legal cases, previously appearing in lawsuits against Apple and Salesforce. Lyon says she's "committed to vigorously prosecuting this action on behalf of the other members of the class," and that she has the "financial resources to do so." The plaintiff is seeking "an award of statutory and other damages," the reimbursement of attorney fees and a declaration of willful infringement from Adobe. TechRadar Pro has sought a formal response from Adobe, but the company has not yet responded.
[3]
Adobe is sued for using pirated books to train AI
A proposed class-action lawsuit filed by Oregon author Elizabeth Lyon accuses Adobe of training its SlimLM AI model on pirated books, including her guidebooks, through the SlimPajama-627B dataset derived from the RedPajama collection containing Books3. Adobe has pursued extensive development in artificial intelligence over recent years. The company launched multiple AI services starting in 2023, with Firefly serving as its AI-powered media-generation suite designed for creating images, videos, and other media content from text prompts and inputs. SlimLM represents a series of small language models that Adobe has optimized specifically for document assistance tasks on mobile devices. These models enable functions such as summarizing documents, extracting key information, and providing contextual help directly within mobile applications. Adobe states that it pre-trained SlimLM using the SlimPajama-627B dataset. Cerebras released this dataset in June 2023 as a deduplicated, multi-corpora, open-source resource intended for training large language models. The dataset aggregates various text sources after removing duplicates to improve training efficiency and model performance. Elizabeth Lyon, who specializes in guidebooks for non-fiction writing, initiated the lawsuit claiming that Adobe incorporated pirated versions of numerous books, including her own works, into the training process for SlimLM. The legal action seeks class-action status to represent other affected authors. The lawsuit details how the SlimPajama dataset originated from the RedPajama dataset, which includes the Books3 collection comprising 191,000 books. Reuters first reported on the filing. The complaint states verbatim: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)." It continues: "Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members." Lyon argues that her copyrighted materials appeared in this pre-training data without her consent or compensation. Books3 has emerged repeatedly in legal disputes within the AI sector, as developers have utilized it to train generative AI systems. The collection contains digitized texts from various genres and authors, making it a comprehensive but contentious training corpus. RedPajama, which incorporates Books3, has also faced mentions in multiple court cases.
[4]
Adobe Faces Lawsuit For Misusing Pirated Books For AI Training
On December 16, 2025, Elizabeth Lyon filed a proposed class action lawsuit against Adobe Inc. in the US District Court for the Northern District of California, alleging that the company used unauthorised copies of her books and those of other authors to train its SlimLM AI models without permission or payment. Lyon brings the action on behalf of herself and all similarly affected US copyright holders, accusing Adobe of copying and using copyrighted literary works in training data for its small language model (SLM) designed for document-related tasks on mobile devices. The lawsuit seeks class certification, unspecified monetary damages, injunctive and declaratory relief, and an order requiring Adobe to stop using, and to destroy or dispose of, allegedly infringing copies of copyrighted works. Notably, this lawsuit marks Adobe's first major copyright challenge over AI training data, and arrives as legal scrutiny of how generative AI systems source and ingest copyrighted material intensifies. Earlier this year, Anthropic agreed to a $1.5 billion settlement in a class action brought by authors who claimed the company used pirated books to train its large language models (LLMs), in what became the largest copyright recovery on record. Other companies, including Apple, OpenAI, and Meta, have also faced claims tied to their AI training practices. The lawsuit alleges that Adobe infringed authors' copyrights while developing and training its SlimLM series of SLMs, which the company designed to run directly on devices such as smartphones, tablets, and laptops. According to the complaint, SlimLM models perform document-assistance tasks and form part of Adobe's broader AI product offerings. At the centre of the allegations is the SlimPajama training dataset, which the plaintiff says Adobe used to pre-train its SlimLM models. The complaint states that SlimPajama is a cleaned and deduplicated derivative of the RedPajama dataset, which in turn reproduces large portions of earlier datasets used to train Meta's LLaMA models. Crucially, RedPajama includes a subsection known as Books, or RedPajama-Books, which the complaint identifies as a copy of the Books3 dataset. The filing further explains that Books3 originates from the Bibliotik private tracker, a source that contains a mix of fiction and non-fiction books, many of which remain under copyright. The plaintiff alleges that Adobe copied these works without authorisation during multiple stages of AI development. Furthermore, the complaint claims that Adobe continued to retain copies of these datasets on its servers and embedded the extracted information into SlimLM model parameters, thereby continuing the alleged infringement even after initial training concluded. The Books3 dataset has repeatedly emerged as a flashpoint in 2025 AI copyright litigation, underscoring intensifying disputes over how generative AI systems are trained. Books3, a notorious "shadow library" dataset composed of nearly 200,000 pirated electronic books obtained from private trackers such as Bibliotik, has featured in several high-profile cases this year. For instance, in a proposed class action against Apple Inc., authors allege that Apple used Books3 material to train its OpenELM model, which underpins Apple Intelligence, arguing that the inclusion of pirated titles in training data constitutes widespread copyright infringement.
Previously, Books3 was a central element in the legal battles against Anthropic, where a class of authors secured a $1.5 billion settlement over claims the AI company used pirated books to train its LLMs, marking a landmark recovery and prompting orders to destroy the infringing datasets. Most recently, a lawsuit against Meta by Entrepreneur Media over its LLaMA models claims that Meta's training datasets included Books3 material obtained without permission, along with other shadow libraries such as Library Genesis (LibGen). This case brings renewed attention to how widely deployed AI systems are built and commercialised, particularly when they rely on large text corpora assembled from third-party sources. By focusing on Adobe's SlimLM models, the lawsuit shifts scrutiny to SLM features embedded directly into mainstream productivity software used by millions of people, rather than standalone or experimental tools. Moreover, the dispute underscores the legal risks surrounding derivative datasets such as SlimPajama, which are marketed as cleaned or deduplicated but still trace back to repositories containing copyrighted books. If courts accept the plaintiffs' arguments, AI companies could face exposure not only for directly copying works, but also for incorporating datasets that inherit infringement through earlier sources. The remedies sought also carry broader implications. The plaintiff is asking the court to certify a nationwide class, declare Adobe's conduct unlawful, and order the destruction or disposal of infringing copies, alongside statutory damages and injunctive relief. While the complaint does not specify a monetary figure, statutory damages under US copyright law can scale rapidly when applied across multiple works and repeated acts of copying. Finally, as AI copyright litigation accelerates, the case adds pressure on developers and regulators to establish clearer data provenance and licensing practices before integrating generative AI capabilities into consumer-facing products.
[5]
Adobe Sued Over AI Training Data: Pirated Books Allegations Explained
The lawsuit, filed on behalf of Oregon-based author Elizabeth Lyon, claims Adobe relied on copyrighted works, without permission, to develop its SlimLM program. Adobe's SlimLM models are a series of small language models intended to help with document-related tasks on mobile devices. According to the company, these models were pre-trained on the SlimPajama-627B dataset, an open-source, multi-corpora dataset published by artificial intelligence hardware company Cerebras in June 2023. Adobe stated that SlimPajama was built as a deduplicated dataset for large-scale language modeling tasks. But the case at hand claims that the SlimPajama dataset is, in turn, built from a different problematic dataset named RedPajama.
[6]
Adobe sued over alleged use of creators' work to train AI models
Adobe is facing a lawsuit that accuses the company of using writers' copyrighted books without permission to train its artificial intelligence tools. The case was filed this week in a federal court in California and adds to a growing list of legal battles over how tech giants train their AI systems. The lawsuit was brought by author Elizabeth Lyon, who writes instructional books on how to market novels, reports Reuters. She claims Adobe used pirated copies of her books, along with many others, to train its AI models without asking for approval. According to the complaint, Adobe's AI systems were trained using pirated versions of books. Lyon says these books were fed into Adobe's SlimLM models, which are small language models designed to help users with document-related tasks on mobile devices. The complaint seeks financial compensation, although it does not name a specific amount, according to the report. This case is important because it is the first major copyright lawsuit targeting Adobe over AI training. However, it is part of a much larger trend. In recent years, many authors, artists, and publishers have sued technology companies, claiming their creative work was used to train AI systems without consent. Several well-known AI companies, including OpenAI and Anthropic, are already facing similar lawsuits. Anthropic agreed to settle a class action lawsuit for $1.5 billion earlier this year. That settlement became the largest ever recorded in a copyright-related case.
Oregon author Elizabeth Lyon filed a proposed class-action lawsuit against Adobe, claiming the company used pirated versions of her books and thousands of others to train its SlimLM AI model. The case centers on the Books3 dataset, which contains 191,000 copyrighted works allegedly incorporated without permission or compensation through the SlimPajama training dataset.

Adobe faces a proposed class-action lawsuit filed by Elizabeth Lyon, an Oregon-based author, who accuses the software giant of using pirated books for AI training without authorization or compensation [1]. The lawsuit, filed on December 16, 2025, in the US District Court for the Northern District of California, alleges that Adobe trained its SlimLM AI models on copyrighted literary works, including Lyon's guidebooks for non-fiction writing [4]. This marks Adobe's first major copyright challenge over AI training data, arriving as legal scrutiny intensifies across the tech industry.

SlimLM is a series of small language models that Adobe optimized specifically for document assistance tasks on mobile devices, including smartphones, tablets, and laptops [3]. Adobe states that it pre-trained SlimLM using the SlimPajama-627B dataset, an open-source dataset released by Cerebras in June 2023 [1]. The company describes this as a "deduplicated, multi-corpora" resource designed for large-scale language modeling tasks [5]. However, Lyon's complaint challenges this characterization, arguing that the SlimPajama dataset is a derivative copy of the RedPajama dataset, which allegedly contains the controversial Books3 dataset [2].

The Books3 dataset has become a recurring source of legal trouble for the tech community. The collection comprises 191,000 books sourced from Bibliotik, a private tracker containing a mix of fiction and non-fiction works, many of which remain under copyright [1]. The lawsuit states verbatim: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3). Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members" [3]. The complaint further alleges that Adobe "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models" [2].
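The complaint's provenance argument turns on SlimPajama's composition, which can in principle be inspected directly, since the dataset is publicly distributed and labels each record with the RedPajama source set it came from. The sketch below is a minimal illustration of that kind of check, assuming the Hugging Face datasets library and the published cerebras/SlimPajama-627B schema (a meta.redpajama_set_name field with values such as "RedPajamaBook"); it streams a small sample and tallies the dataset's self-reported sources, and says nothing about what Adobe's internal training pipeline actually ingested.

```python
# Sketch: tally the self-reported source sets in a small streamed sample of
# SlimPajama-627B. Assumes the Hugging Face `datasets` library and the
# published schema of cerebras/SlimPajama-627B, in which each record's
# `meta` field carries a `redpajama_set_name` label (e.g. "RedPajamaBook").
# This only inspects the public dataset's own labels, not Adobe's pipeline.
from collections import Counter

from datasets import load_dataset

ds = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

counts = Counter()
for i, record in enumerate(ds):
    counts[record["meta"]["redpajama_set_name"]] += 1
    if i + 1 >= 10_000:  # small sample; the full corpus is ~627B tokens
        break

for source, n in counts.most_common():
    print(f"{source}: {n}")
```

A sample of this size only indicates whether book-tagged records appear at all; any estimate of their overall share would require a much larger, properly shuffled sample.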
The RedPajama dataset has been cited in multiple litigation cases throughout 2025. In September, a lawsuit against Apple claimed the company used copyrighted material from the same dataset to train its Apple Intelligence model, accusing the tech giant of copying protected works "without consent and without credit or compensation" [1]. A similar lawsuit against Salesforce in October also alleged the company used RedPajama for training purposes [1]. Most notably, Anthropic agreed to pay $1.5 billion to authors who sued over the use of pirated versions of their work to train its chatbot, Claude, marking the largest copyright recovery on record and a potential turning point in ongoing legal battles over intellectual property in generative AI systems [1][4].

Lyon brings the action on behalf of herself and all similarly affected US copyright holders. She states that she is "committed to vigorously prosecuting this action on behalf of the other members of the class" and possesses the "financial resources to do so" [2]. The lawsuit seeks class certification, unspecified monetary damages, injunctive and declaratory relief, and an order requiring Adobe to destroy or dispose of allegedly infringing copies of copyrighted works [4]. Lyon is seeking "an award of statutory and other damages," reimbursement of attorney fees, and a declaration of willful infringement from Adobe [2].

This case brings renewed attention to how widely deployed AI systems are built and commercialized, particularly when they rely on large text corpora assembled from third-party sources [4]. By focusing on Adobe's SlimLM models, the lawsuit shifts scrutiny to small language models embedded directly into mainstream productivity software used by millions, rather than standalone or experimental tools. The dispute underscores legal risks surrounding derivative datasets such as SlimPajama, which are marketed as cleaned or deduplicated but still trace back to repositories containing copyrighted books [4]. If courts accept the plaintiff's arguments, AI companies could face exposure not only for directly copying works but also for incorporating datasets that inherit infringement through earlier sources. The remedies sought carry broader implications for how tech companies approach training data, potentially forcing a fundamental shift in data sourcing practices across the industry as misusing authors' work becomes increasingly untenable from both legal and reputational standpoints.