Salesforce Sued by Authors Over Alleged Copyright Infringement in AI Training Data

Reviewed by Nidhi Govil

Authors Molly Tanzer and Jennifer Gilmore have filed a proposed class action lawsuit against Salesforce, accusing the company of using thousands of copyrighted books without permission to train its AI models. This case highlights the escalating legal challenges surrounding intellectual property rights in the rapidly evolving AI development landscape.

Authors File Class Action Against Salesforce for Copyright Infringement in AI Training

Cloud software giant Salesforce (CRM.N) is facing a proposed class action lawsuit, with authors Molly Tanzer and Jennifer Gilmore alleging that the company utilized thousands of copyrighted books without authorization to train its artificial intelligence software [1]. Filed in a San Francisco federal court, the lawsuit asserts that Salesforce engaged in copyright infringement under the Copyright Act, intensifying the ongoing debate surrounding intellectual property rights and AI development [2].

Source: Reuters


Allegations of Unauthorized Content Use

Tanzer and Gilmore claim that Salesforce integrated pirated books, including their own literary works, into its xGen AI models designed for advanced language processing. The complaint specifically points to the use of datasets like RedPajama and The Pile, which reportedly include a book corpus known as Books3. This collection, comprising over 196,000 books, is alleged to have been illicitly copied from the private tracker Bibliotik [2].

Salesforce's Shifting Stance on Training Data

According to the lawsuit, Salesforce initially acknowledged using "RedPajama-Books" as a training source when it launched xGen in June 2023. However, by September, the company purportedly removed these explicit references from its website, substituting them with more ambiguous descriptions of "natural language data" sourced from "publicly available sources" [2].

Source: Seeking Alpha


CEO's Past Remarks Surface in Complaint

Adding an ironic twist, the lawsuit cites earlier statements made by Salesforce CEO Marc Benioff. In a January 2024 Bloomberg interview, Benioff openly criticized AI companies for leveraging "stolen" training data, remarking that "all the training data has been stolen" and suggesting that compensating content creators would be "very easy to do" [2][3]. These comments are now being used against the company in the legal proceedings.

Wider Industry Implications and Precedent

This legal action is not isolated; it is part of a growing wave of lawsuits targeting technology companies over their use of copyrighted material for AI training. Similar cases have been brought against industry giants like OpenAI, Microsoft, and Meta Platforms [1]. A notable development occurred in August when Anthropic reached a significant $1.5 billion settlement with a group of authors in a comparable copyright infringement dispute, setting a potential precedent for future cases [3].

Legal Experts Weigh In

Ishita Sharma, managing partner at Fathom Legal, commented on the challenges authors face, stating they must "prove real financial harm, not just that their books were used for training." While some recent rulings have favored AI companies due to authors' inability to demonstrate market harm, Sharma emphasized that "using public datasets like RedPajama or The Pile doesn't automatically erase willful infringement" [2].

Potential Outcomes and Demands

Should the class action lawsuit prove successful, Salesforce could face substantial financial penalties or be ordered to compensate authors. The plaintiffs are seeking class certification for all U.S. copyright holders whose works have allegedly been used since October 2022. Their demands include statutory damages, the destruction of infringing copies, profit disgorgement, a declaration of willful infringement, and attorneys' fees [2].

Salesforce has so far declined to comment on the lawsuit [5]. The case further highlights the need for clear legal frameworks and ethical guidelines at the intersection of AI advancement and intellectual property rights.

TheOutpost.ai

© 2025 Triveous Technologies Private Limited