Overview
- Maia 200 is live in Azure's Central US (Iowa) region, with West US 3 (Phoenix) next, and a preview SDK is open to developers, academics, and frontier labs.
- Built on TSMC’s 3nm process with 216GB of HBM3e, the chip delivers over 10 PFLOPS at FP4 and about 5 PFLOPS at FP8, with four chips per server and Ethernet-based scaling to as many as 6,144 chips.
- Microsoft says Maia 200 offers roughly 30% better performance per dollar than alternative hardware in its fleet, claiming about 3× the FP4 performance of AWS Trainium3 and FP8 throughput exceeding Google's seventh‑generation TPU.
- Early deployment will power Microsoft 365 Copilot, Microsoft Foundry and the Superintelligence team, serving multiple models including OpenAI’s GPT‑5.2.
- The company casts Maia 200 as an inference‑optimized option to reduce reliance on Nvidia and lower per‑token serving costs, with broader customer availability to follow.
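Taking the published per-chip figures at face value, the cluster-level ceiling can be sketched with simple arithmetic. This is an idealized back-of-envelope calculation, not a Microsoft-published figure: it assumes linear scaling across all 6,144 chips and ignores interconnect overhead and utilization losses.

```python
# Back-of-envelope scale figures for a maximal Maia 200 deployment,
# derived from the per-chip specs above. Idealized: assumes linear
# scaling; real-world efficiency would be lower.

CHIPS_MAX = 6144           # Ethernet-based scaling ceiling
CHIPS_PER_SERVER = 4
HBM_GB_PER_CHIP = 216      # HBM3e capacity per chip
FP4_PFLOPS_PER_CHIP = 10   # "over 10 PFLOPS" -> treated as a lower bound

servers = CHIPS_MAX // CHIPS_PER_SERVER
total_hbm_tb = CHIPS_MAX * HBM_GB_PER_CHIP / 1000
total_fp4_eflops = CHIPS_MAX * FP4_PFLOPS_PER_CHIP / 1000

print(servers)           # 1536 servers
print(total_hbm_tb)      # ~1327 TB of pooled HBM3e
print(total_fp4_eflops)  # ~61 EFLOPS of FP4 compute (lower bound)
```

At full scale that works out to roughly 1,536 servers, about 1.3 PB of HBM3e, and on the order of 61 EFLOPS of FP4 compute before any efficiency losses.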