Adobe faces class-action lawsuit over alleged use of pirated books in AI training

Reviewed by Nidhi Govil

Oregon author Elizabeth Lyon filed a proposed class-action lawsuit against Adobe, claiming the company used pirated versions of her books and thousands of others to train its SlimLM AI model. The case centers on the Books3 dataset, a collection of 191,000 books, many of them still under copyright, that was allegedly incorporated without permission or compensation through the SlimPajama training dataset.

News article

Adobe Accused of Copyright Infringement in AI Training

Adobe faces a proposed class-action lawsuit filed by Elizabeth Lyon, an Oregon-based author, who accuses the software giant of using pirated books for AI training without authorization or compensation [1]. The lawsuit, filed on December 16, 2025, in the US District Court for the Northern District of California, alleges that Adobe trained its SlimLM AI model on copyrighted literary works, including Lyon's guidebooks for non-fiction writing [4]. This marks Adobe's first major copyright challenge over AI training data, arriving as legal scrutiny intensifies across the tech industry.

SlimLM AI Model at Center of Legal Battle

The SlimLM AI model represents a series of small language models that Adobe optimized specifically for document assistance tasks on mobile devices, including smartphones, tablets, and laptops [3]. Adobe states that it pre-trained SlimLM using the SlimPajama-627B dataset, an open-source dataset released by Cerebras in June 2023 [1]. The company describes this as a "deduplicated, multi-corpora" resource designed for large-scale language modeling tasks [5]. However, Lyon's complaint challenges this characterization, arguing that the SlimPajama dataset is a derivative copy of the RedPajama dataset, which allegedly contains the controversial Books3 dataset [2].

Books3 Dataset Emerges as Persistent Legal Flashpoint

The Books3 dataset has become a recurring source of legal trouble for the tech community. This collection comprises 191,000 books sourced from Bibliotik, a private tracker containing a mix of fiction and non-fiction works, many of which remain under copyright [1]. The lawsuit states verbatim: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3). Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members" [3]. The complaint further alleges that Adobe "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models" [2].

Broader Pattern of AI Copyright Litigation

The RedPajama dataset has been cited in multiple litigation cases throughout 2025. In September, a lawsuit against Apple claimed the company used copyrighted material from the same dataset to train its Apple Intelligence model, accusing the tech giant of copying protected works "without consent and without credit or compensation" [1]. A similar lawsuit against Salesforce in October also alleged the company used RedPajama for training purposes [1]. Most notably, Anthropic agreed to pay $1.5 billion to authors who sued over the use of pirated versions of their work to train its chatbot, Claude, marking the largest copyright recovery on record and a potential turning point in ongoing legal battles over intellectual property in generative AI systems [1][4].

Plaintiff Seeks Damages and Injunctive Relief

Lyon brings the action on behalf of herself and all similarly affected US copyright holders. She states that she is "committed to vigorously prosecuting this action on behalf of the other members of the class" and possesses the "financial resources to do so" [2]. The lawsuit seeks class certification, unspecified monetary damages, injunctive and declaratory relief, and an order requiring Adobe to destroy or dispose of allegedly infringing copies of copyrighted works [4]. Lyon is seeking "an award of statutory and other damages," reimbursement of attorney fees, and a declaration that Adobe's infringement was willful [2].

Implications for Data Sourcing and Legal Risks

This case brings renewed attention to how widely deployed AI systems are built and commercialized, particularly when they rely on large text corpora assembled from third-party sources [4]. By focusing on Adobe's SlimLM models, the lawsuit shifts scrutiny to small language models embedded directly into mainstream productivity software used by millions, rather than standalone or experimental tools. The dispute underscores legal risks surrounding derivative datasets such as SlimPajama, which are marketed as cleaned or deduplicated but still trace back to repositories containing copyrighted books [4]. If courts accept the plaintiffs' arguments, AI companies could face exposure not only for directly copying works but also for incorporating datasets that inherit infringement through earlier sources. The remedies sought carry broader implications for how tech companies approach training data, potentially forcing a fundamental shift in data sourcing practices across the industry as misusing authors' work becomes increasingly untenable from both legal and reputational standpoints.

TheOutpost.ai

© 2026 Triveous Technologies Private Limited