Particle.news

Salesforce Sued by Authors Over Alleged Use of Pirated Books to Train XGen AI

The complaint points to Salesforce materials that initially listed Books3‑linked RedPajama‑Books as training data.

Overview

  • Novelists E. Molly Tanzer and Jennifer Gilmore filed a proposed class action on Wednesday in San Francisco federal court alleging Salesforce trained its XGen models on copyrighted books without permission.
  • The suit claims Salesforce used The Pile’s Books3 corpus to develop CodeGen in 2022 and cited RedPajama‑Books for XGen in 2023, then scrubbed those references in favor of vague descriptions of “publicly available” data.
  • Plaintiffs allege ongoing infringement, saying Salesforce continues to store and process datasets containing copies of their books, and they seek class certification, damages, disgorgement, and destruction of infringing copies.
  • A Salesforce spokesperson declined to comment, while the filing quotes CEO Marc Benioff’s past remarks that AI companies used “stolen” training data and should compensate creators.
  • The case arrives amid a broader wave of copyright lawsuits over AI training data, with Reuters noting a $1.5 billion author settlement with Anthropic and one outlet reporting this as the first such suit targeting Salesforce.