Overview
- Since January 2024, bandwidth usage for multimedia downloads on Wikimedia Commons has surged by 50%, driven by AI crawlers scraping content for model training.
- Bots now account for 65% of Wikimedia's most resource-intensive traffic, disproportionately accessing obscure, non-cached pages that are costly to serve.
- Wikimedia's Site Reliability team is frequently forced to block AI crawlers to prevent disruptions for human users, diverting resources from core operations.
- Many AI crawlers evade detection by ignoring robots.txt directives, spoofing user agents, and rotating IP addresses, complicating mitigation efforts (see the sketch after this list for the robots.txt check a compliant crawler would perform).
- The Wikimedia Foundation is actively exploring systemic solutions, such as developer guidelines and sustainable access methods, to address escalating costs and preserve its commitment to open knowledge sharing.
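
For context, here is a minimal sketch of the robots.txt check a well-behaved crawler performs before fetching a page, using Python's standard-library `urllib.robotparser`; the evasive crawlers described above simply skip this step or misreport their identity. The user agent string `ExampleBot` and the target URL are illustrative placeholders, not taken from the report.

```python
# Minimal sketch: how a compliant crawler consults robots.txt before fetching.
# "ExampleBot" is a hypothetical user agent used only for illustration.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://commons.wikimedia.org/robots.txt")
parser.read()  # download and parse the site's live robots.txt

url = "https://commons.wikimedia.org/wiki/Special:Random"
if parser.can_fetch("ExampleBot", url):
    print(f"Allowed: a compliant ExampleBot may fetch {url}")
else:
    print(f"Disallowed: a compliant crawler stops here for {url}")
```

A crawler that skips this check, or that sends a browser-like user agent on the wire, is exactly the kind of traffic the Site Reliability team must identify and block by other means.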