Meta Accused of Using Pirated Library to Train AI Models

Unsealed court documents reveal that Meta executives approved using copyrighted data from LibGen to develop the company's Llama AI, raising legal and ethical concerns.

Overview

Internal communications show Meta executives, including CEO Mark Zuckerberg, approved the use of LibGen, a known source of pirated content, to train the Llama AI model.
Meta employees discussed methods to conceal the use of LibGen data, including removing copyright markers and metadata from the training materials.
The decision was reportedly driven by pressure to compete with rivals such as OpenAI and Mistral AI, with LibGen deemed essential to reaching state-of-the-art benchmark performance.
The revelations are part of a class-action lawsuit filed by authors and creators, including Sarah Silverman, accusing Meta of violating copyright laws in AI training.
The case highlights broader industry challenges as AI companies face increasing scrutiny over the use of copyrighted material in training datasets.