Particle.news

NVIDIA Details GB10 ‘Grace Blackwell’ and GB300 ‘Blackwell Ultra’ at Hot Chips 2025

NVIDIA emphasizes larger memory pools and faster interconnects, which keep bigger models resident on each device and reduce the overhead of splitting them across devices.

Overview

  • GB10 is a 3nm, dual‑dielet SoC for DGX Spark and AI PC platforms with a 20‑core Arm v9.2 CPU, a Blackwell iGPU, and up to 128GB coherent LPDDR5X UMA in a 140W package.
  • The integrated Blackwell graphics block delivers up to 31 TFLOPS FP32 and 1,000 TOPS NVFP4, with a chip‑to‑chip NVLink‑based interface exposing up to 600 GB/s aggregate bandwidth to the GPU.
  • NVIDIA says GB10’s S‑die uses MediaTek CPU IP, with the collaboration relying on extensive co‑simulation and a custom NVLink C2C link to unify the CPU, memory subsystem, and GPU.
  • GB300 is presented as a dual‑die accelerator with 208 billion transistors and 20,480 CUDA cores. The NV‑HBI interconnect between the dies is rated at 10 TB/s, and up to 288GB of HBM3E delivers 8 TB/s per GPU. NVIDIA cites roughly 50% faster AI performance versus the prior generation.
  • Rack‑scale GB300 NVL72 systems are described at up to 1.1 exaFLOPS dense FP4, and CoreWeave reports a DeepSeek R1 run needing 4 GB300s versus 16 H100s with about 6× higher raw throughput per GB300 GPU.
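
The rack-level and per-GPU figures above can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation on the numbers quoted in this summary, not an NVIDIA or CoreWeave methodology. It assumes that NVL72 denotes 72 GPUs per rack, which the summary does not state explicitly, and the function name `rack_figures` is made up for illustration.

```python
# Figures quoted in the overview bullets.
RACK_FP4_DENSE_FLOPS = 1.1e18  # GB300 NVL72: "up to 1.1 exaFLOPS dense FP4"
HBM_PER_GPU_GB = 288           # "up to 288GB HBM3E"
HBM_BW_PER_GPU_TBS = 8         # "8 TB/s per GPU"
GB300_COUNT = 4                # CoreWeave DeepSeek R1 run on GB300
H100_COUNT = 16                # same run on H100

# Assumption (not stated in the summary): NVL72 means 72 GPUs per rack.
GPUS_PER_RACK = 72


def rack_figures():
    """Derive per-GPU and per-rack numbers from the quoted peak figures."""
    return {
        # Each GPU's share of the rack's peak dense FP4 rate, in PFLOPS.
        "fp4_pflops_per_gpu": RACK_FP4_DENSE_FLOPS / GPUS_PER_RACK / 1e15,
        # Total HBM3E capacity across the rack, in TB (decimal units).
        "rack_hbm_tb": GPUS_PER_RACK * HBM_PER_GPU_GB / 1000,
        # Sum of per-GPU HBM bandwidth across the rack, in TB/s.
        "rack_hbm_bw_tbs": GPUS_PER_RACK * HBM_BW_PER_GPU_TBS,
        # Time to read one GPU's full HBM once at peak bandwidth, in ms.
        "full_hbm_read_ms": HBM_PER_GPU_GB / (HBM_BW_PER_GPU_TBS * 1000) * 1000,
        # How many times fewer GPUs the GB300 run used than the H100 run.
        "gpu_count_reduction": H100_COUNT / GB300_COUNT,
    }


if __name__ == "__main__":
    for name, value in rack_figures().items():
        print(f"{name}: {value:.2f}")
```

All inputs are "up to" peak figures, so the derived values are upper bounds rather than measured results. The 4× reduction in GPU count is a separate quantity from the roughly 6× per-GPU throughput that CoreWeave reports, and the sketch does not combine them.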