Particle.news

Download on the App Store

Reddit Blocks Wayback Machine From Crawling Posts After Detecting AI Scraping

Reddit says the technical limit will force companies toward paid data licenses after detecting AI firms harvesting content from archives.

Image
Image
Image

Overview

  • Reddit has updated its crawling rules to prevent the Wayback Machine from indexing post detail pages, comments, user profiles and other content, leaving only the homepage accessible
  • The company cites instances of AI firms using archived Reddit pages to scrape data in ways that violate its user privacy and deleted-content policies
  • Reddit informed the Internet Archive in advance of the new restrictions and the two organizations are in ongoing talks over potential compliance or exemptions
  • This policy shift reverses Reddit’s earlier assurance that it would continue granting “good faith” access to archiving services such as the Wayback Machine
  • The move aligns with Reddit’s broader strategy to monetize AI training data through licensing deals with Google and OpenAI and pursue legal action against unauthorized scrapers