Overview
- NVIDIA delivered the fastest time to train on all seven MLPerf Training v5.1 benchmarks and was the only platform to submit results on every test.
- Making its MLPerf debut, the GB300 NVL72 system with Blackwell Ultra GPUs delivered more than 4x faster Llama 3.1 405B pretraining and nearly 5x faster Llama 2 70B LoRA fine-tuning than Hopper at the same GPU counts.
- NVIDIA set a new record for Llama 3.1 405B, training it in 10 minutes with more than 5,000 Blackwell GPUs; a 2,560-GPU run finished in 18.79 minutes, 45% faster than a similar-scale submission in the prior round.
- NVIDIA also set records on the two new benchmarks, training Llama 3.1 8B in 5.2 minutes with up to 512 Blackwell Ultra GPUs and FLUX.1 in 12.5 minutes with 1,152 Blackwell GPUs, and it was the only platform to submit FLUX.1 results.
- This round also delivered the first MLPerf Training results to use FP4 precision while meeting accuracy targets (a minimal illustration of FP4 block quantization follows this list), debuted the Quantum-X800 InfiniBand platform, which doubled scale-out network bandwidth, and drew broad partner participation across 15 organizations.
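FP4 here denotes a 4-bit floating-point format (E2M1: one sign bit, two exponent bits, one mantissa bit), which can represent only eight magnitudes. NVIDIA's NVFP4 recipe pairs these 4-bit values with fine-grained block scaling to preserve dynamic range. The sketch below is a minimal illustration of that idea, not NVIDIA's production method: it assumes 16-element blocks (matching public NVFP4 descriptions) but simplifies the block scales to float32, whereas NVFP4 stores them in FP8 E4M3. All function names are hypothetical.

```python
# Illustrative sketch of FP4 (E2M1) quantization with per-block scaling.
# Assumptions: 16-element blocks, float32 scales, nearest-magnitude rounding.
import numpy as np

# The eight non-negative magnitudes representable in E2M1 (sign stored separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_blocks(x: np.ndarray, block: int = 16):
    """Quantize a 1-D tensor (length divisible by `block`) to E2M1 values,
    with one float32 scale per block."""
    x = x.reshape(-1, block)
    # Scale each block so its largest magnitude maps onto the FP4 maximum (6.0).
    scales = np.abs(x).max(axis=1, keepdims=True) / E2M1_GRID[-1]
    scales[scales == 0] = 1.0                          # avoid divide-by-zero
    scaled = x / scales
    # Snap each scaled value to the nearest representable E2M1 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_GRID[idx]
    return q, scales

def dequantize_fp4_blocks(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct an approximation of the original tensor."""
    return (q * scales).reshape(-1)

x = np.random.randn(64).astype(np.float32)
q, s = quantize_fp4_blocks(x)
err = np.abs(x - dequantize_fp4_blocks(q, s)).mean()
print(f"mean absolute round-trip error: {err:.4f}")
```

The per-block scale is the key design point: a grid with only eight magnitudes cannot track a whole tensor on its own, but rescaling every 16-element block independently keeps each block's values within the grid's usable range.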