Particle.news

Nvidia Debuts Rubin CPX, a Long-Context GPU for Million-Token AI and Video

The chip offloads context processing to reduce latency and is slated to arrive in rack-scale systems by late 2026.

Overview

  • Rubin CPX is purpose-built for massive-context inference, integrating video decoding and encoding with long-context processing on a single chip.
  • Nvidia lists roughly 30 petaFLOPs of NVFP4 compute and up to 128 GB of cost-efficient GDDR7 memory per CPX device.
  • The Vera Rubin NVL144 CPX rack pairs 144 CPX accelerators with 144 Rubin GPUs and 36 Vera CPUs for about 8 exaFLOPs of NVFP4 compute, which Nvidia says is 7.5x that of the GB300 NVL72.
  • Nvidia will offer CPX as part of new NVL144 CPX racks and as separate trays or racks that attach to existing Rubin deployments.
  • The company pitches use cases such as understanding large software projects and hour-scale generative video, and it claims a $100 million deployment could generate $5 billion in token revenue.
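The rack-level figures above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers cited in the announcement (30 petaFLOPs per CPX, 144 CPX devices, ~8 exaFLOPs per rack, the 7.5x claim, and the $100 million / $5 billion revenue pitch); the Rubin GPU share and the GB300 NVL72 throughput it implies are inferences, not official specs.

```python
# Back-of-the-envelope check of Nvidia's rack-level numbers.
PFLOPS = 1e15
EFLOPS = 1e18

cpx_per_rack = 144
cpx_nvfp4_pflops = 30     # ~30 petaFLOPs of NVFP4 per CPX device
rack_total_eflops = 8     # ~8 exaFLOPs NVFP4 for the NVL144 CPX rack
gb300_speedup = 7.5       # Nvidia's claimed ratio vs. GB300 NVL72

# Aggregate compute from the 144 CPX accelerators alone.
cpx_total_eflops = cpx_per_rack * cpx_nvfp4_pflops * PFLOPS / EFLOPS
print(f"CPX share of rack: {cpx_total_eflops:.2f} exaFLOPs")

# The remainder is implied to come from the 144 Rubin GPUs (inferred).
rubin_share_eflops = rack_total_eflops - cpx_total_eflops
print(f"Implied Rubin GPU share: {rubin_share_eflops:.2f} exaFLOPs")

# NVFP4 throughput a GB300 NVL72 rack would have under the 7.5x claim.
gb300_eflops = rack_total_eflops / gb300_speedup
print(f"Implied GB300 NVL72: {gb300_eflops:.2f} exaFLOPs")

# Nvidia's revenue pitch as a simple multiple.
revenue_multiple = 5e9 / 100e6
print(f"Claimed token-revenue multiple: {revenue_multiple:.0f}x")
```

The CPX accelerators alone account for about 4.3 of the rack's ~8 exaFLOPs, and the 7.5x claim implies roughly 1.07 exaFLOPs of NVFP4 for a GB300 NVL72 rack.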