Overview
- Since January 2024, bandwidth usage for multimedia downloads on Wikimedia Commons has surged by 50%, driven by AI crawlers scraping content for model training.
- Bots now account for 65% of Wikimedia's most resource-intensive traffic, disproportionately accessing obscure, non-cached pages that are costly to serve.
- Wikimedia's Site Reliability team is frequently forced to block AI crawlers to prevent disruptions for human users, diverting resources from core operations.
- Many AI crawlers evade detection by ignoring robots.txt directives, spoofing user agents, and rotating IP addresses, complicating mitigation efforts (see the sketch after this list for the robots.txt check a compliant crawler would perform).
- The Wikimedia Foundation is actively exploring systemic solutions, such as developer guidelines and sustainable access methods, to address escalating costs and preserve its commitment to open knowledge sharing.
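
For context, here is a minimal sketch of the robots.txt check a well-behaved crawler performs before fetching a page, using Python's standard-library `urllib.robotparser`; the evasive crawlers described above simply skip this step or misreport their identity. The user agent string `ExampleBot` and the target URL are illustrative placeholders, not taken from the report.

```python
# Minimal sketch: how a compliant crawler consults robots.txt before fetching.
# "ExampleBot" is a hypothetical user agent used only for illustration.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://commons.wikimedia.org/robots.txt")
parser.read()  # download and parse the site's live robots.txt

url = "https://commons.wikimedia.org/wiki/Special:Random"
if parser.can_fetch("ExampleBot", url):
    print(f"Allowed: a compliant ExampleBot may fetch {url}")
else:
    print(f"Disallowed: a compliant crawler stops here for {url}")
```

A crawler that skips this check, or that sends a browser-like user agent on the wire, is exactly the kind of traffic the Site Reliability team must identify and block by other means.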