5 Sources
[1]
Adobe hit with proposed class-action, accused of misusing authors' work in AI training | TechCrunch
Like pretty much every other tech company in existence, Adobe has leaned heavily into AI over the past several years. The software firm has launched a number of different AI services since 2023, including Firefly -- its AI-powered media-generation suite. Now, however, the company's full-throated embrace of the technology may have led to trouble, as a new lawsuit claims it used pirated books to train one of its AI models.

A proposed class-action lawsuit filed on behalf of Elizabeth Lyon, an author from Oregon, claims that Adobe used pirated versions of numerous books -- including her own -- to train the company's SlimLM program. Adobe describes SlimLM as a small language model series that can be "optimized for document assistance tasks on mobile devices." It states that SlimLM was pre-trained on SlimPajama-627B, a "deduplicated, multi-corpora, open-source dataset" released by Cerebras in June of 2023. Lyon, who has written a number of guidebooks for non-fiction writing, says that some of her works were included in a pretraining dataset that Adobe had used.

Lyon's lawsuit, first reported by Reuters, says that her writing was included in a processed subset of a manipulated dataset that was the basis of Adobe's program: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)," the lawsuit says. "Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members."

"Books3" -- a huge collection of 191,000 books that have been used to train genAI systems -- has been an ongoing source of legal trouble for the tech community. RedPajama has also been cited in a number of litigation cases. In September, a lawsuit against Apple claimed the company had used copyrighted material to train its Apple Intelligence model.
The litigation mentioned the dataset and accused the tech company of copying protected works "without consent and without credit or compensation." In October, a similar lawsuit against Salesforce also claimed the company had used RedPajama for training purposes.

Unfortunately for the tech industry, such lawsuits have, by now, become somewhat commonplace. AI algorithms are trained on massive datasets and, in some cases, those datasets have allegedly included pirated materials. In September, Anthropic agreed to pay $1.5 billion to a number of authors who had sued it and accused it of using pirated versions of their work to train its chatbot, Claude. The case was considered a potential turning point in the ongoing legal battles over copyrighted material in AI training data, of which there are many.
[2]
Adobe faces class-action lawsuit after allegedly misusing authors' work in AI training
The plaintiff claims sufficient financial resources to "vigorously" pursue this case.

Adobe is set to face an AI copyright lawsuit in the US, with a class-action case alleging that the company trained its AI models on pirated books without permission. Oregon author Elizabeth Lyon filed the case, claiming that the tech giant had trained its AI models not only on her books, but the work of others, too.

The lawsuit focuses specifically on Adobe's SlimLM small language models, which are used for document assistance tasks on mobile devices. The company has denied the allegations, asserting that SlimLM was trained on SlimPajama-627B, an open-source dataset that was released by Cerebras in 2023. However, the lawsuit claims that SlimPajama is a derivative of RedPajama, which allegedly includes Books3, a dataset of nearly 200,000 pirated books. In short, Lyon argues that because SlimPajama includes RedPajama/Books3, it contains copyrighted work without consent, credit, or compensation. Adobe is also accused of having "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models."

It's not the first time that RedPajama or Books3 have been involved in legal cases, previously appearing in lawsuits against Apple and Salesforce. Lyon says she's "committed to vigorously prosecuting this action on behalf of the other members of the class," and that she has the "financial resources to do so." The plaintiff is seeking "an award of statutory and other damages," the reimbursement of attorney fees, and a declaration of willful infringement from Adobe. TechRadar Pro has sought a formal response from Adobe, but the company has not yet responded.
[3]
Adobe is sued for using pirated books to train AI
A proposed class-action lawsuit filed by Oregon author Elizabeth Lyon accuses Adobe of training its SlimLM AI model on pirated books, including her guidebooks, through the SlimPajama-627B dataset derived from the RedPajama collection containing Books3.

Adobe has pursued extensive development in artificial intelligence over recent years. The company launched multiple AI services starting in 2023, with Firefly serving as its AI-powered media-generation suite designed for creating images, videos, and other media content from text prompts and inputs. SlimLM represents a series of small language models that Adobe has optimized specifically for document assistance tasks on mobile devices. These models enable functions such as summarizing documents, extracting key information, and providing contextual help directly within mobile applications.

Adobe states that it pre-trained SlimLM using the SlimPajama-627B dataset. Cerebras released this dataset in June 2023 as a deduplicated, multi-corpora, open-source resource intended for training large language models. The dataset aggregates various text sources after removing duplicates to improve training efficiency and model performance.

Elizabeth Lyon, who specializes in guidebooks for non-fiction writing, initiated the lawsuit claiming that Adobe incorporated pirated versions of numerous books, including her own works, into the training process for SlimLM. The legal action seeks class-action status to represent other affected authors.

The lawsuit details how the SlimPajama dataset originated from the RedPajama dataset, which includes the Books3 collection comprising 191,000 books. Reuters first reported on the filing. The complaint states verbatim: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3)."
It continues: "Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members." Lyon argues that her copyrighted materials appeared in this pre-training data without her consent or compensation.

Books3 has emerged repeatedly in legal disputes within the AI sector, as developers have used it to train generative AI systems. The collection contains digitized texts from various genres and authors, making it a comprehensive but contentious training corpus. RedPajama, which incorporates Books3, has also been cited in multiple court cases.
[4]
Adobe Sued Over AI Training Data: Pirated Books Allegations Explained
The lawsuit, filed on behalf of Oregon-based author Elizabeth Lyon, claims Adobe relied on copyrighted works, without permission, to develop its SlimLM program. Adobe's SlimLM models are a series of lightweight language models intended to help with document assistance tasks on mobile phones. According to the company, these models were pre-trained on SlimPajama-627B, an open-source, multi-corpora dataset published by artificial intelligence hardware company Cerebras in June 2023. Adobe stated that SlimPajama was built as a deduplicated dataset for large-scale language modeling tasks. But the case at hand claims that the SlimPajama dataset is, in turn, built from a different, problematic dataset named RedPajama.
[5]
Adobe sued over alleged use of creators' work to train AI models
The lawsuit was brought by author Elizabeth Lyon, who writes instructional books on how to market novels.

Adobe is facing a lawsuit that accuses the company of using writers' copyrighted books without permission to train its artificial intelligence tools. The case was filed this week in a federal court in California and adds to a growing list of legal battles over how the tech giants train their AI systems.

Lyon claims Adobe used pirated copies of her books, along with many others, to train its AI models without asking for approval, reports Reuters. According to the complaint, Adobe's AI systems were trained using pirated versions of books. Lyon says these books were fed into Adobe's SlimLM models, which are small language models designed to help users with document-related tasks on mobile devices. The complaint seeks financial compensation, although it does not name a specific amount, according to the report.

This case is important because it is the first major copyright lawsuit targeting Adobe over AI training. However, it is part of a much larger trend. In recent years, many authors, artists, and publishers have sued technology companies, claiming their creative work was used to train AI systems without consent. Several well-known AI companies, including OpenAI and Anthropic, are already facing similar lawsuits. Anthropic agreed to settle a class action lawsuit for $1.5 billion earlier this year. That settlement became the largest ever recorded in a copyright-related case.
Oregon author Elizabeth Lyon filed a proposed class-action lawsuit accusing Adobe of training its SlimLM AI model on pirated books without permission. The case centers on the Books3 dataset containing 191,000 copyrighted works, allegedly incorporated through the SlimPajama-627B training data. This marks the first major copyright infringement case against Adobe, joining similar lawsuits targeting Apple, Salesforce, and other tech companies over unauthorized data use in AI development.
Adobe is facing a proposed class-action lawsuit filed by Elizabeth Lyon, an Oregon-based author who specializes in guidebooks for non-fiction writing [1]. The complaint accuses the software giant of misusing authors' work by training its SlimLM AI model on pirated books without consent, credit, or compensation [2]. This case represents the first major copyright infringement litigation targeting Adobe's AI training practices, adding the company to a growing list of tech industry defendants facing similar allegations [5].
The lawsuit centers on Adobe's SlimLM, a series of small language models optimized for document assistance tasks on mobile devices [1]. Lyon claims her copyrighted materials were included in the training data used to develop these language models, representing unauthorized data use that violates intellectual property rights [3].

At the heart of the complaint lies a controversial chain of data sourcing. Adobe states that SlimLM was pre-trained on SlimPajama-627B, a deduplicated, multi-corpora, open-source dataset released by Cerebras in June 2023 [1]. However, Lyon's lawsuit argues that SlimPajama-627B is a derivative of the RedPajama dataset, which allegedly contains Books3, a massive collection of 191,000 pirated books widely used to train genAI systems [1].

The complaint explicitly states: "The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3). Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members." The lawsuit further alleges that Adobe "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models" [2].

This Adobe lawsuit reflects a broader crisis facing the tech industry as creators' work becomes central to AI development disputes. Books3 and RedPajama have emerged as recurring elements in multiple legal battles. In September, Apple faced litigation claiming the company used copyrighted material to train its Apple Intelligence model through the RedPajama dataset "without consent and without credit or compensation." Salesforce encountered similar accusations in October regarding its use of RedPajama for training purposes.

The most significant precedent came when Anthropic agreed to pay $1.5 billion to settle claims from authors who accused the company of using pirated versions of their work to train AI models, including its chatbot Claude. This settlement became the largest ever recorded in a copyright-related case and is viewed as a potential turning point in ongoing legal battles over training data [5]. OpenAI also faces similar lawsuits from authors, artists, and publishers challenging unauthorized data use [5].

Lyon states she is "committed to vigorously prosecuting this action on behalf of the other members of the class" and possesses the "financial resources to do so" [2]. The plaintiff seeks statutory and other damages, reimbursement of attorney fees, and a declaration of willful infringement from Adobe [2]. While the complaint does not specify an exact compensation amount, the case could have significant financial implications given the Anthropic precedent [5].
For the tech industry, these lawsuits signal that using pirated books to train AI models carries substantial legal and financial risks. As AI algorithms require massive datasets for training, companies must navigate the complex intersection of open-source resources, data sourcing transparency, and copyright law. The outcome of this class-action lawsuit could influence how companies document their training data provenance and whether they implement more rigorous vetting processes to avoid use of copyrighted content. Adobe has denied the allegations but has not yet provided a formal public response to the complaint [2]. The case will test whether companies can rely on open-source datasets without liability when those datasets allegedly contain derivative copies of protected works.