Overview
- GB10 pairs a MediaTek‑designed 20‑core Arm v9.2 CPU die with a Blackwell GPU die on TSMC 3nm. NVLink C2C links the two dies coherently at 600 GB/s bidirectional, and the chip delivers about 1 petaFLOP of FP4 compute at roughly 140W.
- DGX Spark ships with 128GB of coherent unified memory, enough for local inference on models of up to about 200B parameters and fine‑tuning of models up to 70B. ConnectX‑7 networking lets two linked systems handle workloads of roughly 405B parameters.
- Blackwell Ultra GB300 uses a dual‑die design with 20,480 CUDA cores and around 208B transistors. It carries up to 288GB of HBM3E at 8 TB/s, links its two dies over NV‑HBI at 10 TB/s, officially supports PCIe 6.x, and has a 1,400W TDP.
- Nvidia’s NVFP4 4‑bit floating‑point format is integrated across the stack, and the GB300’s tensor cores are optimized for it. Nvidia cites accuracy within 1% of FP8, along with substantial memory and throughput gains.
- CoreWeave reports that a GB300 NVL72 setup achieved about 6× higher raw throughput per GPU than H100 on the DeepSeek R1 model. Separate reporting indicates that GB300 systems are moving into production and face export limits in China.
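
The DGX Spark model-size figures above can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not Nvidia's sizing method: it assumes decimal gigabytes and a uniform number of bits per weight, and it counts weights only, ignoring quantization scale factors, KV cache, activations, and system overhead. The function names are illustrative.

```python
# Back-of-the-envelope check: do the weights alone fit in unified memory?
# Assumptions (not from the source): decimal GB, uniform bits per weight,
# and no allowance for scale factors, KV cache, activations, or the OS.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a model of the given size."""
    # 1e9 params * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billions * bits_per_weight / 8


def weights_fit(params_billions: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_footprint_gb(params_billions, bits_per_weight) <= memory_gb


SPARK_MEMORY_GB = 128  # unified memory per DGX Spark, from the overview

if __name__ == "__main__":
    # (model size in billions of parameters, number of linked systems)
    for params, systems in [(200, 1), (405, 2)]:
        budget = SPARK_MEMORY_GB * systems
        for bits in (4, 8, 16):
            need = weight_footprint_gb(params, bits)
            verdict = "fits" if weights_fit(params, bits, budget) else "does not fit"
            print(f"{params}B @ {bits}-bit: {need:.1f} GB vs {budget} GB -> {verdict}")
```

Under these assumptions, 200B parameters at 4 bits need about 100 GB and 405B parameters need about 202.5 GB. Both fit the 128GB and 2 × 128GB budgets only at 4‑bit precision, which suggests the quoted limits rely on a 4‑bit format such as NVFP4.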