Particle.news

Cloudflare De-Lists Perplexity Bots After Unmasking Stealth Crawling

The company has deployed managed rules to detect crawlers that disguised themselves as standard browsers to evade robots.txt directives

SAN FRANCISCO, CALIFORNIA - OCTOBER 30: (L-R) Devin Coldewey and Aravind Srinivas, Co-Founder & CEO of Perplexity, speak onstage during TechCrunch Disrupt 2024 Day 3 at Moscone Center on October 30, 2024 in San Francisco, California. (Photo by Kimberly White/Getty Images for TechCrunch)
Perplexity accused of sneaky web scraping

Overview

  • Cloudflare research found undisclosed Perplexity crawlers rotated user agents, IP addresses and ASNs to evade robots.txt blocks across tens of thousands of domains
  • Controlled tests on hidden domains confirmed these stealth bots continued to retrieve restricted content, prompting Cloudflare to remove Perplexity from its verified bot program
  • Cloudflare has added heuristics to its bot management system to fingerprint and block the evasive crawlers network-wide
  • Perplexity denied any unauthorized scraping, calling Cloudflare’s evidence a misattribution and labeling the blog post a publicity stunt
  • The dispute highlights industry moves to enforce site directives and develop licensing frameworks as AI systems ingest web content at massive scale
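
To see why rotating user agents matters here, it helps to recall that robots.txt rules are matched against the name a crawler declares for itself. The sketch below uses Python's standard urllib.robotparser to show this. The robots.txt content, the example.com URL, and the browser-style user agent string are illustrative assumptions, not taken from Cloudflare's tests or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the declared Perplexity crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative browser-style user agent string (an assumption for this sketch).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"

# A crawler that identifies itself honestly is matched by the Disallow rule.
declared = is_allowed(ROBOTS_TXT, "PerplexityBot", url)

# The same request under a browser-style user agent falls through to the "*" rule.
disguised = is_allowed(ROBOTS_TXT, BROWSER_UA, url)

print(f"declared crawler allowed:  {declared}")
print(f"disguised crawler allowed: {disguised}")
```

Because robots.txt is advisory and keyed on self-reported identity, a site cannot rely on it alone against a crawler that misreports its user agent. That gap is what network-level fingerprinting of the kind described above is meant to close.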