Overview
- Initial Maia 200 systems are live in Azure's Central US region in Iowa, with planned expansion to Phoenix and other U.S. sites.
- The accelerator delivers 10.1 PFLOPS at FP4 and about 5 PFLOPS at FP8, with 272 MB of inline SRAM, 216 GB of HBM3e, and 7 TB/s of memory bandwidth (see the back-of-envelope sketch after this list).
- Microsoft says Maia 200 is its most efficient inference deployment to date and promises up to 30% better performance per dollar than its current hardware.
- Early units are assigned to the Microsoft Superintelligence team, Microsoft Foundry, and Copilot, and will run the latest OpenAI models on Azure.
- Fabricated by TSMC on a 3 nm process, Maia 200 advances Microsoft's in-house silicon strategy; the company is already designing a successor, Maia 300, and positions the lineup as an alternative to Nvidia GPUs and rival cloud providers' custom chips.
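For context on what the quoted specs imply, here is a minimal back-of-envelope sketch (Python) using only the numbers listed above. It estimates the arithmetic intensity a workload would need before the chip is compute-bound rather than limited by HBM bandwidth; the inputs are vendor peak figures, so treat the results as rough upper bounds rather than measured behavior.

```python
# Back-of-envelope roofline check from the listed Maia 200 specs.
# All figures come from the bullet list above; they are vendor-quoted
# peaks, not measured throughput.

FP4_PFLOPS = 10.1          # peak FP4 compute, PFLOPS
FP8_PFLOPS = 5.0           # approximate peak FP8 compute, PFLOPS
HBM_BANDWIDTH_TBS = 7.0    # HBM3e bandwidth, TB/s

def breakeven_intensity(pflops: float, tb_per_s: float) -> float:
    """FLOPs that must be performed per byte moved from HBM before
    compute, rather than memory bandwidth, becomes the bottleneck."""
    flops_per_s = pflops * 1e15
    bytes_per_s = tb_per_s * 1e12
    return flops_per_s / bytes_per_s

if __name__ == "__main__":
    for label, pflops in (("FP4", FP4_PFLOPS), ("FP8", FP8_PFLOPS)):
        ai = breakeven_intensity(pflops, HBM_BANDWIDTH_TBS)
        print(f"{label}: ~{ai:.0f} FLOPs/byte to saturate compute")
    # Roughly 1,400 FLOPs/byte at FP4 and 700 at FP8 -- ratios typical
    # of inference-oriented parts, where low-precision matmuls dominate.
```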