5 Sources
[1]
Salesforce sued by authors over artificial intelligence software
Oct 16 (Reuters) - Cloud-computing firm Salesforce (CRM.N) was hit with a proposed class action lawsuit by two authors who alleged the company used thousands of books without permission to train its artificial intelligence software. Novelists Molly Tanzer and Jennifer Gilmore said in the complaint filed on Wednesday that Salesforce infringed copyrights by using their work to train its xGen AI models to process language. A Salesforce spokesperson declined to comment on the lawsuit on Thursday. "It's important that companies that use copyrighted material for ... AI products are transparent," attorney Joseph Saveri, who represents the authors and has brought similar lawsuits on behalf of copyright owners against tech companies, said on Thursday. "It's also only fair that our clients are fairly compensated when this happens." Authors, news outlets and other content owners have filed dozens of lawsuits against tech companies including OpenAI, Microsoft and Meta Platforms for allegedly misusing their material in AI training. Anthropic agreed to a landmark $1.5 billion settlement with a separate group of authors suing it for copyright infringement in August. Tanzer and Gilmore said in their lawsuit that Salesforce used thousands of pirated books written by them and others to train xGen. The lawsuit said that Salesforce CEO Marc Benioff has previously criticized AI companies for using "stolen" training data to build their models and said that paying content creators for their work would be "very easy to do." "Benioff is right -- technology companies like Benioff's own Salesforce that use the intellectual property of copyright holders like Plaintiffs and Class members should fairly compensate them," the complaint said. 
(Reporting by Blake Brittain in Washington; Editing by Cynthia Osterman)
[2]
Salesforce Faces Class Action Over Alleged Illegal AI Training Data - Decrypt
Salesforce CEO Marc Benioff previously said AI companies "ripped off" training data and "all the training data has been stolen," in an interview with Bloomberg. A new class action lawsuit in San Francisco federal court has accused software giant Salesforce of building its XGen AI models on a pirated library of books and then scrubbing references to those sources once questions arose. Filed on Wednesday by authors E. Molly Tanzer and Jennifer Gilmore, the suit is brought under the Copyright Act, alleging ongoing infringement, saying Salesforce "continues to do so by continuing to store, copy, use, and process the datasets containing copies of Plaintiffs' ... copyrighted books." The complaint says Salesforce Inc. "pirated hundreds of thousands of copyrighted books to develop its XGen series of large language models," relying on the "notorious RedPajama and The Pile datasets" that include a books corpus known as Books3, a collection of over 196,000 books copied from the private tracker Bibliotik. The filing says Salesforce initially listed "RedPajama-Books" among its training sources when it launched XGen in June 2023, with a company engineer linking GitHub users directly to both datasets. By September, however, Salesforce allegedly deleted those references from its website and replaced them with vague descriptions of "natural language data" drawn from "publicly available sources." Hugging Face, the platform hosting Books3, removed the dataset the following month, citing copyright complaints, the lawsuit says. The lawsuit alleges that Salesforce used The Pile to train its CodeGen models in 2022, then commercialized the technology through its Agentforce AI platform, including the XGen-Sales model released in October 2024. 
Two months later, Salesforce allegedly scrubbed its disclosures, deleting charts and references to "RedPajama-Books" and replacing them with vague language about a "mixture of publicly available data," before claiming by December 2023 that its models used a "legally compliant dataset" with no mention of RedPajama. Ishita Sharma, managing partner at Fathom Legal, told Decrypt that authors must "prove real financial harm, not just that their books were used for training," noting how Judge Vince Chhabria recently dismissed similar claims against Meta, ruling that "simply claiming 'our work was used' isn't enough." Recent rulings favored OpenAI and Anthropic in similar cases, with judges finding authors failed to prove market harm, though one criticized Anthropic for maintaining "a permanent library of pirated books." "Using public datasets like RedPajama or The Pile doesn't automatically erase willful infringement," Sharma said, adding, "if they knew or ignored that copyrighted works were included, courts could still find reckless disregard." "Unless the AI can reproduce parts of the original work, the model weights themselves aren't considered copyright infringement," she added. The complaint cites statements from Salesforce CEO Marc Benioff, who told a Bloomberg interviewer in January 2024 that AI companies "ripped off" training data and that "all the training data has been stolen." The authors seek class certification for all U.S. copyright holders whose works were used since October 2022, demanding statutory damages, destruction of infringing copies, profit disgorgement, a willful infringement declaration, and attorneys' fees.
[3]
Salesforce sued by authors over artificial intelligence software
Salesforce faces a proposed class action lawsuit from two authors alleging the company used thousands of copyrighted books without permission to train its AI models. Novelists Molly Tanzer and Jennifer Gilmore claim their works were infringed upon to develop Salesforce's xGen AI. Attorney Joseph Saveri emphasized the need for transparency and fair compensation for content creators. Cloud-computing firm Salesforce was hit with a proposed class action lawsuit by two authors who alleged the company used thousands of books without permission to train its artificial intelligence software. Novelists Molly Tanzer and Jennifer Gilmore said in the complaint filed on Wednesday that Salesforce infringed copyrights by using their work to train its xGen AI models to process language. A Salesforce spokesperson declined to comment on the lawsuit on Thursday. "It's important that companies that use copyrighted material for ... AI products are transparent," attorney Joseph Saveri, who represents the authors and has brought similar lawsuits on behalf of copyright owners against tech companies, said on Thursday. "It's also only fair that our clients are fairly compensated when this happens." Authors, news outlets and other content owners have filed dozens of lawsuits against tech companies including OpenAI, Microsoft and Meta Platforms for allegedly misusing their material in AI training. Anthropic agreed to a landmark $1.5 billion settlement with a separate group of authors suing it for copyright infringement in August. Tanzer and Gilmore said in their lawsuit that Salesforce used thousands of pirated books written by them and others to train xGen. The lawsuit said that Salesforce CEO Marc Benioff has previously criticized AI companies for using "stolen" training data to build their models and said that paying content creators for their work would be "very easy to do." 
"Benioff is right - technology companies like Benioff's own Salesforce that use the intellectual property of copyright holders like Plaintiffs and Class members should fairly compensate them," the complaint said.
[4]
Salesforce faces copyright lawsuit from authors over training AI models
Salesforce (NYSE:CRM) is facing a proposed class action lawsuit by two authors who allege that the company used thousands of books without permission to train its AI software. Authors E. Molly Tanzer and Jennifer Gilmore said in the complaint that Salesforce ... If the class action lawsuit succeeds, Salesforce could face financial penalties or compensation claims by authors. Salesforce faces allegations of copyright infringement for using datasets that allegedly included unauthorized copies of copyrighted books to train its AI models. Like OpenAI, Microsoft, and Meta, Salesforce is now involved in lawsuits over alleged misuse of copyrighted content for AI model training.
[5]
Salesforce sued by authors over artificial intelligence software
(Reuters) - Cloud-computing firm Salesforce was hit with a proposed class action lawsuit by two authors who alleged the company used thousands of books without permission to train its artificial intelligence software. Novelists Molly Tanzer and Jennifer Gilmore said in the complaint filed on Wednesday that Salesforce infringed copyrights by using their work to train its xGen AI models to process language. A Salesforce spokesperson declined to comment on the lawsuit on Thursday. "It's important that companies that use copyrighted material for ... AI products are transparent," attorney Joseph Saveri, who represents the authors and has brought similar lawsuits on behalf of copyright owners against tech companies, said on Thursday. "It's also only fair that our clients are fairly compensated when this happens." Authors, news outlets and other content owners have filed dozens of lawsuits against tech companies including OpenAI, Microsoft and Meta Platforms for allegedly misusing their material in AI training. Anthropic agreed to a landmark $1.5 billion settlement with a separate group of authors suing it for copyright infringement in August. Tanzer and Gilmore said in their lawsuit that Salesforce used thousands of pirated books written by them and others to train xGen. The lawsuit said that Salesforce CEO Marc Benioff has previously criticized AI companies for using "stolen" training data to build their models and said that paying content creators for their work would be "very easy to do." "Benioff is right -- technology companies like Benioff's own Salesforce that use the intellectual property of copyright holders like Plaintiffs and Class members should fairly compensate them," the complaint said. (Reporting by Blake Brittain in Washington; Editing by Cynthia Osterman)
Authors Molly Tanzer and Jennifer Gilmore have filed a proposed class action lawsuit against Salesforce, accusing the company of using thousands of copyrighted books without permission to train its AI models. The case adds to a growing number of legal challenges over intellectual property rights in AI development.
Cloud software giant Salesforce (CRM.N) is facing a proposed class action lawsuit, with authors Molly Tanzer and Jennifer Gilmore alleging that the company used thousands of copyrighted books without authorization to train its artificial intelligence software [1]. Filed in San Francisco federal court, the lawsuit asserts copyright infringement claims under the Copyright Act, adding to the ongoing debate over intellectual property rights and AI development [2].
Tanzer and Gilmore claim that Salesforce used pirated books, including their own works, to train its xGen AI models for language processing. The complaint specifically points to datasets such as RedPajama and The Pile, which reportedly include a book corpus known as Books3. This collection of more than 196,000 books is alleged to have been copied without authorization from the private tracker Bibliotik [2].

According to the lawsuit, Salesforce initially listed "RedPajama-Books" as a training source when it launched xGen in June 2023. By September, however, the company allegedly removed these references from its website and replaced them with vaguer descriptions of "natural language data" drawn from "publicly available sources" [2].
Adding an ironic twist, the lawsuit cites earlier statements by Salesforce CEO Marc Benioff. In a January 2024 Bloomberg interview, Benioff said AI companies had "ripped off" training data and that "all the training data has been stolen" [2]. The complaint also says he described paying content creators for their work as "very easy to do" [3]. These comments are now being used against the company in the litigation.

The lawsuit is part of a growing wave of cases targeting technology companies over their use of copyrighted material for AI training, including suits against OpenAI, Microsoft, and Meta Platforms [1]. In August, Anthropic agreed to a $1.5 billion settlement with a separate group of authors in a copyright infringement case [3].
Ishita Sharma, managing partner at Fathom Legal, said authors must "prove real financial harm, not just that their books were used for training." While some recent rulings have favored AI companies because authors failed to demonstrate market harm, Sharma emphasized that "using public datasets like RedPajama or The Pile doesn't automatically erase willful infringement" [2].
If the class action succeeds, Salesforce could face substantial financial penalties or be ordered to compensate authors. The plaintiffs are seeking class certification for all U.S. copyright holders whose works were used since October 2022. Their demands include statutory damages, destruction of infringing copies, disgorgement of profits, a declaration of willful infringement, and attorneys' fees [2].
A Salesforce spokesperson declined to comment on the lawsuit [5]. The case underscores the need for clearer legal frameworks and ethical guidelines at the intersection of AI development and intellectual property rights.

Summarized by Navi