Adobe faces class-action lawsuit over alleged use of pirated books in AI training data

Oregon author Elizabeth Lyon filed a proposed class-action lawsuit accusing Adobe of training its SlimLM AI model on pirated books without permission. The case centers on the Books3 dataset containing 191,000 copyrighted works, allegedly incorporated through the SlimPajama-627B training data. This marks the first major copyright infringement case against Adobe, joining similar lawsuits targeting Apple, Salesforce, and other tech companies over unauthorized data use in AI development.

Adobe Lawsuit Alleges Copyright Infringement Through Pirated Training Data

Adobe is facing a proposed class-action lawsuit filed by Elizabeth Lyon, an Oregon-based author who specializes in guidebooks for non-fiction writing [1]. The complaint accuses the software giant of misusing authors' work by training its SlimLM AI model on pirated books without consent, credit, or compensation [2]. This case represents the first major copyright infringement litigation targeting Adobe's AI training practices, adding the company to a growing list of tech industry defendants facing similar allegations [5].

Source: Analytics Insight

The lawsuit centers on Adobe's SlimLM, a series of small language models optimized for document assistance tasks on mobile devices [1]. Lyon claims her copyrighted materials were included in the training data used to develop these language models, representing unauthorized data use that violates intellectual property rights [3].

The Books3 Dataset Connection in AI Training

At the heart of the complaint lies a controversial chain of data sourcing. Adobe states that SlimLM was pre-trained on SlimPajama-627B, a deduplicated, multi-corpora, open-source dataset released by Cerebras in June 2023 [1]. However, Lyon's lawsuit argues that SlimPajama-627B is a derivative of the RedPajama dataset, which allegedly contains Books3, a collection of 191,000 pirated books widely used to train generative AI systems [1].

The complaint explicitly states: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3). Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members". The lawsuit further alleges that Adobe "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models" [2].

Growing Pattern of Legal Challenges Over Use of Copyrighted Content

This Adobe lawsuit reflects a broader crisis facing the tech industry as creators' work becomes central to AI development disputes. Books3 and RedPajama have emerged as recurring elements in multiple legal battles. In September, Apple faced litigation claiming the company used copyrighted material to train its Apple Intelligence model through the RedPajama dataset "without consent and without credit or compensation". Salesforce encountered similar accusations in October regarding its use of RedPajama for training purposes.

Source: Digit

The most significant precedent came when Anthropic agreed to pay $1.5 billion to settle claims from authors who accused the company of using pirated versions of their work to train AI models, including its chatbot Claude. This settlement became the largest ever recorded in a copyright-related case and is viewed as a potential turning point in ongoing legal battles over training data [5]. OpenAI also faces similar lawsuits from authors, artists, and publishers challenging unauthorized data use [5].

What This Means for AI Development and Intellectual Property

Lyon states she is "committed to vigorously prosecuting this action on behalf of the other members of the class" and possesses the "financial resources to do so" [2]. The plaintiff seeks statutory and other damages, reimbursement of attorney fees, and a declaration of willful infringement from Adobe [2]. While the complaint does not specify an exact compensation amount, the case could have significant financial implications given the Anthropic precedent [5].

Source: TechRadar

For the tech industry, these lawsuits signal that using pirated books to train AI models carries substantial legal and financial risks. Because AI models require massive datasets for training, companies must navigate the complex intersection of open-source resources, data sourcing transparency, and copyright law. The outcome of this class-action lawsuit could influence how companies document their training data provenance and whether they implement more rigorous vetting processes to avoid using copyrighted content. Adobe has denied the allegations but has not yet provided a formal public response to the complaint [2]. The case will test whether companies can rely on open-source datasets without liability when those datasets allegedly contain derivative copies of protected works.

TheOutpost.ai

© 2025 Triveous Technologies Private Limited