Particle News: Meta’s Llama 3.1 Is Reproducing Copyrighted Books Verbatim

Overview

Researchers from Stanford, Cornell and West Virginia University found that Llama 3.1 70B has memorised roughly 42% of Harry Potter and the Sorcerer’s Stone so thoroughly that it can reproduce 50-token excerpts from that portion of the book at least half the time.
Court filings by authors including Sarah Silverman allege that Meta trained its models on Books3, a dataset of nearly 200,000 copyrighted books obtained via torrent.
Stanford law professor Mark Lemley estimates that if just 3% of the works in Books3 were found to be infringed, Meta could face close to $1 billion in statutory damages.
Llama 3.1’s ability to output verbatim passages from works such as The Great Gatsby and 1984 highlights gaps in how copyright law addresses AI training.
Authors and publishers are calling for clear compensation frameworks and updated regulations to protect creative rights in AI development.