Particle.news

OpenAI Accused of Training GPT-4o on Copyrighted O'Reilly Media Books

A study by the AI Disclosures Project alleges OpenAI used copyrighted material without permission, reigniting calls for transparency and licensing in AI development.

Overview

  • Using a detection method called DE-COP, the AI Disclosures Project claims that OpenAI's GPT-4o model was trained on copyrighted O'Reilly Media books without authorization.
  • The study found GPT-4o demonstrated strong recognition of O'Reilly Media content, achieving an AUROC score of 82%, while older models like GPT-3.5 Turbo showed lower but still significant recognition.
  • Researchers tested 3,962 paragraph excerpts from 34 O'Reilly books, using paraphrased content generated by Claude 3.5 Sonnet to evaluate model familiarity with copyrighted material.
  • Tim O'Reilly, CEO of O'Reilly Media, co-authored the study, which highlights systemic challenges in AI training transparency and the need for formal licensing frameworks.
  • OpenAI has not yet responded to the allegations, which add to ongoing debates over intellectual property and ethical AI development practices.
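
For readers unfamiliar with the AUROC figure cited above, the sketch below shows how such a score can be computed in general. It is not the study's code: the function name `auroc` and the two score lists are hypothetical, and the per-excerpt "familiarity" scores are invented for illustration. AUROC is the probability that a randomly chosen item from the suspected-training group scores higher than a randomly chosen item from the comparison group, so 50% means no better than chance and 100% means perfect separation.

```python
def auroc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical familiarity scores (illustrative values, not from the study).
suspected_in_training = [0.9, 0.8, 0.7, 0.4]
comparison_group = [0.6, 0.3, 0.2, 0.5]

print(auroc(suspected_in_training, comparison_group))  # 0.875
```

Here 14 of the 16 possible pairs rank the suspected-training excerpt higher, which gives 0.875, or 87.5%.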