Overview
- Researchers from Stanford, Cornell and West Virginia universities found that Meta's Llama 3.1 memorised roughly 42% of Harry Potter and can replicate 50-word excerpts about half the time.
- According to court filings by authors including Sarah Silverman, the model was trained on Books3, a dataset of nearly 200,000 copyrighted works obtained via torrent.
- Stanford tech law expert Mark Lemley estimates that if just 3% of Books3 content is infringing, Meta could face close to $1 billion in statutory damages.
- Llama 3.1’s ability to output verbatim passages from works such as The Great Gatsby and 1984 highlights regulatory gaps in how copyright law addresses AI training.
- Authors and publishers are calling for clear compensation frameworks and updated regulations to protect creative rights in AI development.
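
The damages estimate above can be checked with simple arithmetic. The sketch below is illustrative only: the dataset size and the 3% share come from the summary, while the $150,000 per-work figure is an assumption, taken from the US statutory maximum for willful infringement (17 U.S.C. § 504(c)(2)). The function and constant names are hypothetical, and actual awards would depend on what a court finds.

```python
# Figures from the summary above.
BOOKS3_WORKS = 200_000          # "nearly 200,000" works, rounded up
INFRINGING_SHARE_PERCENT = 3    # the 3% scenario in the estimate

# Assumption: the US statutory maximum per work for willful infringement.
MAX_STATUTORY_DAMAGES_PER_WORK = 150_000


def estimate_statutory_damages(total_works, share_percent, damages_per_work):
    """Return the upper-bound exposure in dollars for a given infringing share."""
    infringing_works = total_works * share_percent // 100
    return infringing_works * damages_per_work


exposure = estimate_statutory_damages(
    BOOKS3_WORKS, INFRINGING_SHARE_PERCENT, MAX_STATUTORY_DAMAGES_PER_WORK
)
print(f"${exposure:,}")  # $900,000,000
```

Under these assumptions, 6,000 infringing works at the maximum rate give $900 million, which is consistent with the "close to $1 billion" figure.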