Particle.news

NVIDIA Sweeps MLPerf Training v5.1 With Blackwell Ultra, Sets 10‑Minute Llama 405B Record

FP4 training plus 800 Gb/s InfiniBand power large generational gains across rack‑scale GB300 NVL72 systems.

Overview

  • NVIDIA posted the fastest time to train on all seven MLPerf Training v5.1 tests and was the only platform to submit results on every benchmark.
  • The debuting GB300 NVL72 with Blackwell Ultra delivered over 4x faster Llama 3.1 405B pretraining and nearly 5x faster Llama 2 70B LoRA fine‑tuning versus Hopper using the same GPU counts.
  • NVIDIA set a new record for Llama 3.1 405B, training it in 10 minutes using more than 5,000 Blackwell GPUs; a 2,560‑GPU run finished in 18.79 minutes, a 45% improvement over a similar prior submission.
  • NVIDIA set records on new tests, training Llama 3.1 8B in 5.2 minutes with up to 512 Blackwell Ultra GPUs and FLUX.1 in 12.5 minutes with 1,152 GPUs, and it was the sole FLUX.1 submitter.
  • This round delivered the first MLPerf Training results to use FP4 while meeting accuracy targets, debuted the Quantum‑X800 InfiniBand platform with double the scale‑out bandwidth, and drew broad partner participation across 15 organizations.