Zuckerberg Approved Use of Pirated Data for Meta’s AI Training, Court Filings Reveal

Newly unredacted documents allege Meta used a dataset from LibGen, a site known for hosting pirated works, to train its Llama AI models despite internal concerns.

Overview

Meta is accused of training its Llama AI models, without proper authorization, on a dataset from LibGen, a platform that hosts pirated books and articles.
Court filings allege that CEO Mark Zuckerberg personally approved the use of the dataset, despite internal warnings about its legality.
Meta employees reportedly stripped copyright information from the dataset to conceal its use and mitigate potential legal and public relations risks.
The lawsuit, filed by authors including Sarah Silverman and Ta-Nehisi Coates, seeks damages and alleges that Meta also distributed the pirated materials over torrent networks.
The case is part of a larger legal battle over whether using copyrighted materials for AI training constitutes fair use under U.S. law.