Particle News: OpenAI Accused of Training GPT-4o on Copyrighted O'Reilly Media Books

Overview

The AI Disclosures Project claims OpenAI's GPT-4o model was trained on copyrighted O'Reilly Media books without authorization, using a detection method called DE-COP.
The study found GPT-4o demonstrated strong recognition of O'Reilly Media content, achieving an AUROC score of 82%, while older models like GPT-3.5 Turbo showed lower but still significant recognition.
Researchers tested 13,962 paragraph excerpts from 34 O'Reilly books, using paraphrased content generated by Claude 3.5 Sonnet to evaluate model familiarity with copyrighted material.
Tim O'Reilly, CEO of O'Reilly Media, co-authored the study, which highlights systemic challenges in AI training transparency and the need for formal licensing frameworks.
OpenAI has not yet responded to the allegations, which add to ongoing debates over intellectual property and ethical AI development practices.
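
For readers unfamiliar with the metric, the sketch below shows one way an AUROC figure like the one reported can be computed. It is not the study's code. It assumes a DE-COP-style quiz in which the model is asked to pick the verbatim passage from among paraphrases, and it treats each book's rate of correct picks as a score. The function name `auroc` and all the numbers are hypothetical, chosen only for illustration.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a randomly chosen positive outscores a randomly
    chosen negative, counting ties as half a win."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical per-book rates at which a model picks the verbatim passage.
# "seen": books published before the model's training cutoff.
# "unseen": books published after it, which the model cannot have trained on.
seen = [0.9, 0.8, 0.7, 0.6]
unseen = [0.5, 0.65, 0.3, 0.4]

print(f"AUROC = {auroc(seen, unseen):.4f}")  # prints "AUROC = 0.9375"
```

An AUROC of 50% means the scores cannot tell the two groups apart, and 100% means they separate them perfectly, so a value of 82% indicates that the model recognizes the tested content well above chance.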