Overview
- The AI Disclosures Project claims OpenAI's GPT-4o model was trained on copyrighted O'Reilly Media books without authorization, using a detection method called DE-COP.
- The study found that GPT-4o showed strong recognition of O'Reilly Media content, achieving an AUROC (area under the receiver operating characteristic curve) score of 82%, while older models like GPT-3.5 Turbo showed lower but still significant recognition.
- Researchers tested 13,962 paragraph excerpts from 34 O'Reilly books, using paraphrased versions generated by Claude 3.5 Sonnet to evaluate model familiarity with the copyrighted material.
- Tim O'Reilly, CEO of O'Reilly Media, co-authored the study, which highlights systemic challenges in AI training transparency and the need for formal licensing frameworks.
- OpenAI has not yet responded to the allegations, which add to ongoing debates over intellectual property and ethical AI development practices.
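
To make the AUROC figure above more concrete, here is a minimal sketch of how such a score could be computed from quiz-style results. It assumes, as a simplification, that each book receives a "guess rate" (the fraction of multiple-choice questions in which the model picked the original passage over paraphrases) and that books are split into a group the model might have seen and a group it could not have seen. The function names `guess_rate` and `auroc` and all numbers are hypothetical illustrations, not the study's code or data.

```python
def guess_rate(model_choices, correct_choices):
    """Fraction of quiz questions where the model picked the original passage."""
    if len(model_choices) != len(correct_choices) or not correct_choices:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for m, c in zip(model_choices, correct_choices) if m == c)
    return hits / len(correct_choices)


def auroc(suspect_scores, control_scores):
    """Probability that a random suspect score beats a random control score.

    Ties count as half a win. 0.5 means the two groups are
    indistinguishable; 1.0 means perfect separation.
    """
    if not suspect_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for s in suspect_scores:
        for c in control_scores:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(suspect_scores) * len(control_scores))


# Hypothetical per-book guess rates (illustrative only, not from the study).
possibly_seen = [0.55, 0.40, 0.62, 0.48]  # books the model might have seen
not_seen = [0.30, 0.45, 0.27, 0.33]       # books it could not have seen

print(guess_rate([0, 2, 1, 0], [0, 2, 3, 0]))  # 0.75
print(auroc(possibly_seen, not_seen))          # 0.9375
```

A score near 50% would mean the model recognizes the two groups of books about equally well, so an 82% score indicates a clear separation between them.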