Overview
- Azure confirmed a live cluster linking more than 4,600 NVIDIA Blackwell Ultra GPUs, marking the first at-scale GB300 NVL72 deployment in production.
- Microsoft says the system is purpose-built for OpenAI’s most demanding inference workloads and future multitrillion‑parameter reasoning models.
- Each NVL72 rack integrates 72 Blackwell Ultra GPUs and 36 Grace CPUs with 130 TB/s NVLink bandwidth, 37 TB of fast memory, and up to 1,440 PFLOPS of FP4 Tensor performance.
- A full fat‑tree architecture using NVIDIA Quantum‑X800 InfiniBand provides 800 Gb/s per GPU to scale across racks for training and serving large, long‑context models.
- NVIDIA cites MLPerf Inference v5.1 records for GB300; Microsoft calls this the first of many such deployments, with plans to scale to hundreds of thousands of Blackwell Ultra GPUs while tackling power and cooling at data-center scale.
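The per-rack figures above imply rough cluster-level totals. As a back-of-envelope sketch (assuming exactly 64 racks, i.e. 4,608 GPUs — the source says only "more than 4,600"):

```python
# Hypothetical back-of-envelope aggregates from the published per-rack specs.
# Rack count of 64 (4,608 GPUs) is an assumption, not stated in the source.
GPUS_PER_RACK = 72            # Blackwell Ultra GPUs per NVL72 rack
RACK_FP4_PFLOPS = 1_440       # per-rack FP4 Tensor performance (PFLOPS)
RACK_FAST_MEMORY_TB = 37      # per-rack fast memory (TB)

assumed_gpus = 4_608
racks = assumed_gpus // GPUS_PER_RACK                      # 64 racks
total_fp4_eflops = racks * RACK_FP4_PFLOPS / 1_000         # PFLOPS -> EFLOPS
total_fast_memory_pb = racks * RACK_FAST_MEMORY_TB / 1_000 # TB -> PB

print(f"{racks} racks, ~{total_fp4_eflops:.0f} EFLOPS FP4, "
      f"~{total_fast_memory_pb:.2f} PB fast memory")
# → 64 racks, ~92 EFLOPS FP4, ~2.37 PB fast memory
```

These are peak, low-precision headline numbers; sustained throughput on real inference workloads would be far lower.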