Overview
- A ClickHouse permissions change led to a flawed query whose output doubled the size of a Bot Management “feature file”; the oversized file exceeded a size limit and crashed the edge routing software that loaded it.
- Because the file was regenerated every five minutes, the network alternated between valid and bad versions of it from around 11:20 UTC, until a persistent failure set in shortly before 13:00 UTC.
- Engineers stopped generation and propagation of the oversized files, inserted a known‑good version into distribution, and forced core proxy restarts to reload valid data.
- Cloudflare declared the incident resolved by 17:06 UTC and issued an apology, with CEO Matthew Prince stressing the cause was internal rather than a DDoS attack.
- Reported outages affected major services that use Cloudflare, including X, ChatGPT, Canva, and Discord, as well as Cloudflare's own dashboard and Turnstile.
- The company committed to mitigations such as hardened file ingestion, global kill switches, resource limits on error reports, and failure‑mode reviews.
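
To make the "hardened file ingestion" and "kill switch" ideas concrete, here is a minimal Python sketch of a loader that validates each regenerated file and keeps the last known‑good version when validation fails, rather than crashing. It is an illustration under stated assumptions, not Cloudflare's implementation: the names FeatureStore, validate_features, MAX_FEATURES, and kill_switch are hypothetical, and the limit of 200 is an arbitrary illustrative value.

```python
from dataclasses import dataclass, field

MAX_FEATURES = 200  # hypothetical limit, chosen only for illustration


class FeatureFileError(ValueError):
    """Raised when a candidate feature file fails validation."""


def validate_features(names, max_features=MAX_FEATURES):
    """Return a copy of the feature names if they pass basic sanity checks."""
    if len(names) > max_features:
        raise FeatureFileError(
            f"{len(names)} features exceeds limit of {max_features}"
        )
    if len(set(names)) != len(names):
        raise FeatureFileError("duplicate feature names")
    return list(names)


@dataclass
class FeatureStore:
    """Holds the active feature list (hypothetical consumer-side state)."""

    active: list = field(default_factory=list)
    kill_switch: bool = False  # when True, all updates are ignored
    rejected: int = 0          # count of files refused by validation

    def ingest(self, candidate):
        """Adopt a new file if valid; otherwise keep the current one."""
        if self.kill_switch:
            return False
        try:
            self.active = validate_features(candidate)
        except FeatureFileError:
            # Fail safe: record the rejection and keep serving the
            # last known-good list instead of raising further.
            self.rejected += 1
            return False
        return True


store = FeatureStore()
store.ingest(["score_a", "score_b"])        # accepted
store.ingest(["score_a", "score_b"] * 2)    # rejected: duplicated entries
print(store.active, store.rejected)
```

The design choice in this sketch is that a bad input degrades to "keep the previous configuration" instead of a process crash, and the kill switch lets an operator freeze updates entirely while the upstream generator is being fixed.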