Overview
- In a post‑event summary and apology published Oct. 23, AWS said a latent race condition in DynamoDB’s automated DNS management produced an incorrect endpoint record that cascaded to roughly 141 services.
- The incident triggered EC2 instance‑launch failures and Network Load Balancer health check errors in US‑East‑1, with AWS reporting services returned to normal operations by about 6 p.m. ET on Oct. 20.
- The outage disrupted thousands of applications worldwide, affecting platforms such as Snapchat, Roblox, Fortnite and Signal, Amazon retail and devices, financial services including UK banks and Coinbase, and UK government services like HMRC.
- AWS said it has disabled the DynamoDB DNS Planner and DNS Enactor globally pending a fix, is adding velocity controls to NLB failover behavior, and is expanding EC2 test suites to better exercise recovery workflows.
- Regulatory and industry reactions focus on concentration risk and resilience, with UK authorities facing calls to scrutinize AWS under the Critical Third Parties regime and experts urging greater transparency, stress testing and contingency planning.