Particle.news

Microsoft Azure Launches First Production-Scale NVIDIA GB300 NVL72 Cluster for OpenAI

The deployment targets OpenAI’s most demanding inference workloads using a co-designed NVLink–Quantum‑X800 fabric.

Overview

  • Azure introduced the NDv6 GB300 VM series built on liquid-cooled GB300 NVL72 racks optimized for reasoning models and complex multimodal AI.
  • The initial supercomputer links 4,608 NVIDIA Blackwell Ultra GPUs via the Quantum‑X800 InfiniBand platform, which provides 800 Gb/s of bandwidth per GPU.
  • Within each rack, 72 GPUs and 36 NVIDIA Grace CPUs share 37 TB of fast memory over a fifth‑generation NVLink Switch fabric that delivers 130 TB/s of intra-rack bandwidth.
  • NVIDIA reports MLPerf Inference v5.1 leadership for GB300 NVL72 systems, including up to 5x higher per‑GPU throughput on the 671‑billion‑parameter DeepSeek‑R1 model compared with Hopper.
  • Microsoft says the rollout required custom liquid cooling, new power distribution, and a reengineered software stack, with plans to scale to hundreds of thousands of Blackwell Ultra GPUs.
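The figures above imply some simple derived numbers, such as how many NVL72 racks make up the initial cluster and the aggregate scale-out bandwidth. A minimal back-of-the-envelope sketch, using only the specs stated in this article (the constants are the article's numbers, not official Azure configuration data):

```python
# Derived topology figures from the specs stated in the article.
# These are illustrative calculations, not official Azure numbers.

GPUS_TOTAL = 4608              # Blackwell Ultra GPUs in the initial cluster
GPUS_PER_RACK = 72             # GB300 NVL72 rack size
BANDWIDTH_PER_GPU_GBPS = 800   # Quantum-X800 InfiniBand, per GPU

racks = GPUS_TOTAL // GPUS_PER_RACK
aggregate_gbps = GPUS_TOTAL * BANDWIDTH_PER_GPU_GBPS

print(f"NVL72 racks in the cluster: {racks}")                              # 64
print(f"Aggregate scale-out bandwidth: {aggregate_gbps / 1e6:.2f} Pb/s")   # 3.69 Pb/s
```

So the 4,608-GPU cluster corresponds to 64 NVL72 racks, with roughly 3.7 Pb/s of aggregate InfiniBand injection bandwidth across the fabric.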