Overview
- Nvidia’s Rubin CPX is a context/prefill accelerator designed for very long inputs, offloading the compute-bound prefill phase so HBM-equipped GPUs can focus on the bandwidth-bound generation (decode) phase.
- Within the Vera Rubin NVL144 CPX platform, 144 CPX GPUs work with 144 Rubin GPUs and 36 Vera CPUs to deliver about 8 exaflops (NVFP4), 100 TB of fast memory, and 1.7 PB/s bandwidth in a single rack.
- Each CPX device provides roughly 30 petaFLOPS (NVFP4) and up to 128 GB of GDDR7, with integrated video encode/decode and a reported 3x attention speedup versus GB300-class systems.
- Nvidia says CPX trays can ship with new racks or be added to existing Vera Rubin NVL144 deployments to scale long‑context inference without relying solely on costly HBM.
- The company projects major gains and returns, citing up to 7.5x rack-level AI performance versus GB300 NVL72 and estimating $5 billion in token revenue per $100 million invested (a 50x return), though real‑world results remain to be proven.
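The split described above is an instance of disaggregated inference: prefill over a long prompt is compute-bound, while token-by-token decoding is memory-bandwidth-bound, so each phase can run on hardware tuned for it. The toy router below is a minimal sketch of that idea, not Nvidia's actual scheduler; all names (`route_phases`, the `"cpx"`/`"hbm"` pools, the 32K threshold) are hypothetical illustrations.

```python
# Illustrative sketch only: a toy router for disaggregated inference.
# Assumes two hypothetical worker pools -- "cpx" for compute-bound prefill
# over long contexts, "hbm" for bandwidth-bound decode. This is not
# Nvidia's scheduling logic, just the concept in miniature.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # length of the input to be prefilled
    max_new_tokens: int  # tokens to generate during decode

def route_phases(req: Request, long_context_threshold: int = 32_000) -> dict:
    """Assign each inference phase to a pool: prefill of very long prompts
    goes to the context accelerator, while generation always stays on the
    HBM-equipped GPUs (the threshold value is an arbitrary illustration)."""
    prefill_pool = "cpx" if req.prompt_tokens >= long_context_threshold else "hbm"
    return {"prefill": prefill_pool, "decode": "hbm"}

# A million-token prompt: prefill is offloaded, decode stays on the HBM pool.
print(route_phases(Request(prompt_tokens=1_000_000, max_new_tokens=512)))
# A short chat turn runs both phases on the HBM pool.
print(route_phases(Request(prompt_tokens=800, max_new_tokens=128)))
```

The point of the sketch is that the decision is per-phase, not per-request: the same request can touch both pools, which is why CPX trays can be added to an existing rack without replacing the decode-side GPUs.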