5 Sources
[1]
Adobe hit with proposed class-action, accused of misusing authors' work in AI training | TechCrunch
Like pretty much every other tech company in existence, Adobe has leaned heavily into AI over the past several years. The software firm has launched a number of different AI services since 2023, including Firefly -- its AI-powered media-generation suite. Now, however, the company's full-throated embrace of the technology may have led to trouble, as a new lawsuit claims it used pirated books to train one of its AI models.

A proposed class-action lawsuit filed on behalf of Elizabeth Lyon, an author from Oregon, claims that Adobe used pirated versions of numerous books -- including her own -- to train the company's SlimLM program. Adobe describes SlimLM as a small language model series that can be "optimized for document assistance tasks on mobile devices." It states that SlimLM was pre-trained on SlimPajama-627B, a "deduplicated, multi-corpora, open-source dataset" released by Cerebras in June of 2023. Lyon, who has written a number of guidebooks for non-fiction writing, says that some of her works were included in a pretraining dataset that Adobe had used.

Lyon's lawsuit, first reported by Reuters, says that her writing was included in a processed subset of a manipulated dataset that was the basis of Adobe's program: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)," the lawsuit says. "Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members."

"Books3" -- a huge collection of 191,000 books that have been used to train genAI systems -- has been an ongoing source of legal trouble for the tech community. RedPajama has also been cited in a number of litigation cases. In September, a lawsuit against Apple claimed the company had used copyrighted material to train its Apple Intelligence model.
The litigation mentioned the dataset and accused the tech company of copying protected works "without consent and without credit or compensation." In October, a similar lawsuit against Salesforce also claimed the company had used RedPajama for training purposes.

Unfortunately for the tech industry, such lawsuits have, by now, become somewhat commonplace. AI algorithms are trained on massive datasets and, in some cases, those datasets have allegedly included pirated materials. In September, Anthropic agreed to pay $1.5 billion to a number of authors who had sued it and accused it of using pirated versions of their work to train its chatbot, Claude. The case was considered a potential turning point in the ongoing legal battles over copyrighted material in AI training data, of which there are many.
[2]
Adobe faces class-action lawsuit after allegedly misusing authors' work in AI training
The plaintiff claims sufficient financial resources to "vigorously" pursue this case.

Adobe is set to face an AI copyright lawsuit in the US, with a class-action case alleging that the company trained its AI models on pirated books without permission. Oregon author Elizabeth Lyon filed the case, claiming that the tech giant had trained its AI models not only on her books, but the work of others, too.

The lawsuit focuses specifically on Adobe's SlimLM small language models, which are used for document assistance tasks on mobile devices. The company has denied the allegations, asserting that SlimLM was trained on SlimPajama-627B, an open-source dataset that was released by Cerebras in 2023. However, the lawsuit claims that SlimPajama is a derivative of RedPajama, which allegedly includes Books3, a dataset of nearly 200,000 pirated books. In short, Lyon argues that because SlimPajama includes RedPajama/Books3, it contains copyrighted work without consent, credit, or compensation. Adobe is also accused of having "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models."

It's not the first time that RedPajama or Books3 have been involved in legal cases, previously appearing in lawsuits against Apple and Salesforce. Lyon says she's "committed to vigorously prosecuting this action on behalf of the other members of the class," and that she has the "financial resources to do so." The plaintiff is seeking "an award of statutory and other damages," the reimbursement of attorney fees, and a declaration of willful infringement from Adobe. TechRadar Pro has sought a formal response from Adobe, but the company has not yet responded.
[3]
Adobe is sued for using pirated books to train AI
A proposed class-action lawsuit filed by Oregon author Elizabeth Lyon accuses Adobe of training its SlimLM AI model on pirated books, including her guidebooks, through the SlimPajama-627B dataset derived from the RedPajama collection containing Books3.

Adobe has pursued extensive development in artificial intelligence over recent years. The company launched multiple AI services starting in 2023, with Firefly serving as its AI-powered media-generation suite designed for creating images, videos, and other media content from text prompts and inputs. SlimLM represents a series of small language models that Adobe has optimized specifically for document assistance tasks on mobile devices. These models enable functions such as summarizing documents, extracting key information, and providing contextual help directly within mobile applications.

Adobe states that it pre-trained SlimLM using the SlimPajama-627B dataset. Cerebras released this dataset in June 2023 as a deduplicated, multi-corpora, open-source resource intended for training large language models. The dataset aggregates various text sources after removing duplicates to improve training efficiency and model performance.

Elizabeth Lyon, who specializes in guidebooks for non-fiction writing, initiated the lawsuit claiming that Adobe incorporated pirated versions of numerous books, including her own works, into the training process for SlimLM. The legal action seeks class-action status to represent other affected authors.

The lawsuit details how the SlimPajama dataset originated from the RedPajama dataset, which includes the Books3 collection comprising 191,000 books. Reuters first reported on the filing. The complaint states verbatim: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)."
It continues: "Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members." Lyon argues that her copyrighted materials appeared in this pre-training data without her consent or compensation.

Books3 has emerged repeatedly in legal disputes within the AI sector, as developers have used it to train generative AI systems. The collection contains digitized texts from various genres and authors, making it a comprehensive but contentious training corpus. RedPajama, which incorporates Books3, has also been cited in multiple court cases.
[4]
Adobe Sued Over AI Training Data: Pirated Books Allegations Explained
The lawsuit, filed on behalf of Oregon-based author Elizabeth Lyon, claims Adobe relied on copyrighted works, without permission, to develop its SlimLM program. Adobe's SlimLM models are a series of lightweight language models intended to help with document assistance tasks on mobile phones. According to the company, these models were pre-trained on SlimPajama-627B, an open-source, multi-corpora dataset published by artificial intelligence hardware company Cerebras in June 2023. Adobe stated that SlimPajama was built as a deduplicated dataset for large-scale language modeling tasks. But the case at hand claims that the SlimPajama dataset is, in turn, built from a different, problematic dataset named RedPajama.
[5]
Adobe sued over alleged use of creators' work to train AI models
The lawsuit was brought by author Elizabeth Lyon, who writes instructional books on how to market novels.

Adobe is facing a lawsuit that accuses the company of using writers' copyrighted books without permission to train its artificial intelligence tools. The case was filed this week in a federal court in California and adds to a growing list of legal battles over how the tech giants train their AI systems.

Lyon claims Adobe used pirated copies of her books, along with many others, to train its AI models without asking for approval, reports Reuters. According to the complaint, Adobe's AI systems were trained using pirated versions of books. Lyon says these books were fed into Adobe's SlimLM models, which are small language models designed to help users with document-related tasks on mobile devices. The complaint seeks financial compensation, although it does not name a specific amount, according to the report.

This case is important because it is the first major copyright lawsuit targeting Adobe over AI training. However, it is part of a much larger trend. In recent years, many authors, artists, and publishers have sued technology companies, claiming their creative work was used to train AI systems without consent. Several well-known AI companies, including OpenAI and Anthropic, are already facing similar lawsuits. Anthropic agreed to settle a class action lawsuit for $1.5 billion earlier this year. That settlement became the largest ever recorded in a copyright-related case.
Oregon author Elizabeth Lyon filed a proposed class-action lawsuit accusing Adobe of training its SlimLM AI model on pirated books without permission. The case centers on the Books3 dataset containing 191,000 copyrighted works, allegedly incorporated through the SlimPajama-627B training data. This marks the first major copyright infringement case against Adobe, joining similar lawsuits targeting Apple, Salesforce, and other tech companies over unauthorized data use in AI development.
Adobe is facing a proposed class-action lawsuit filed by Elizabeth Lyon, an Oregon-based author who specializes in guidebooks for non-fiction writing [1]. The complaint accuses the software giant of misusing authors' work by training its SlimLM AI model on pirated books without consent, credit, or compensation [2]. This case represents the first major copyright infringement litigation targeting Adobe's AI training practices, adding the company to a growing list of tech industry defendants facing similar allegations [5].
The lawsuit centers on Adobe's SlimLM, a series of small language models optimized for document assistance tasks on mobile devices [1]. Lyon claims her copyrighted materials were included in the training data used to develop these language models, representing unauthorized data use that violates intellectual property rights [3].

At the heart of the complaint lies a controversial chain of data sourcing. Adobe states that SlimLM was pre-trained on SlimPajama-627B, a deduplicated, multi-corpora, open-source dataset released by Cerebras in June 2023 [1]. However, Lyon's lawsuit argues that SlimPajama-627B is a derivative of the RedPajama dataset, which allegedly contains Books3, a massive collection of 191,000 pirated books widely used to train genAI systems [1].

The complaint explicitly states: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3). Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members." The lawsuit further alleges that Adobe "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models" [2].

This Adobe lawsuit reflects a broader crisis facing the tech industry as creators' work becomes central to AI development disputes. Books3 and RedPajama have emerged as recurring elements in multiple legal battles. In September, Apple faced litigation claiming the company used copyrighted material to train its Apple Intelligence model through the RedPajama dataset "without consent and without credit or compensation." Salesforce encountered similar accusations in October regarding its use of RedPajama for training purposes.

The most significant precedent came when Anthropic agreed to pay $1.5 billion to settle claims from authors who accused the company of using pirated versions of their work to train AI models, including its chatbot Claude. This settlement became the largest ever recorded in a copyright-related case and is viewed as a potential turning point in ongoing legal battles over training data [5]. OpenAI also faces similar lawsuits from authors, artists, and publishers challenging unauthorized data use [5].

Lyon states she is "committed to vigorously prosecuting this action on behalf of the other members of the class" and possesses the "financial resources to do so" [2]. The plaintiff seeks statutory and other damages, reimbursement of attorney fees, and a declaration of willful infringement from Adobe [2]. While the complaint does not specify an exact compensation amount, the case could have significant financial implications given the Anthropic precedent [5].
For the tech industry, these lawsuits signal that using pirated books to train AI models carries substantial legal and financial risks. As AI algorithms require massive datasets for training, companies must navigate the complex intersection of open-source resources, data sourcing transparency, and copyright law. The outcome of this class-action lawsuit could influence how companies document their training data provenance and whether they implement more rigorous vetting processes to avoid use of copyrighted content. Adobe has denied the allegations but has not yet provided a formal public response to the complaint [2]. The case will test whether companies can rely on open-source datasets without liability when those datasets allegedly contain derivative copies of protected works.