Particle.news

Nvidia Debuts Rubin CPX, a Long-Context GPU to Split LLM Inference Workloads

Built for inputs of more than one million tokens, the chip is slated to ship in about a year, with claimed throughput gains for video and code workloads.

Overview

  • Nvidia introduced Rubin CPX as a context-encoding processor that offloads the compute-heavy prefill phase from generation-focused GPUs.
  • CPX will ship in two forms: an integrated Vera Rubin NVL144 CPX compute tray for new builds and standalone CPX racks that attach to existing Rubin deployments.
  • Vendor slides cite up to roughly 6.5x the performance of GB300/Blackwell Ultra on very large-context tasks, a claim that awaits independent benchmarks.
  • Nvidia highlights large codebases and multi-frame video as early use cases, aiming to cut time-to-first-token and boost overall inference throughput.
  • Availability is targeted for roughly a year from now, with Reuters reporting launches by the end of next year, and Nvidia pitching improved token economics for data centers.
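The split described above rests on the two phases of LLM inference having different hardware profiles: prefill processes the whole prompt in parallel and is compute-bound, while decode emits one token at a time and is memory-bandwidth-bound. The sketch below illustrates that control flow only; all names are illustrative and none of this reflects Nvidia's actual software stack.

```python
# Hypothetical sketch of disaggregated inference: a prefill worker
# (the role a context-encoding GPU like Rubin CPX would play) builds
# a KV cache from the long prompt, then a separate decode worker
# consumes it. Purely illustrative; not Nvidia's API.

from dataclasses import dataclass

@dataclass
class KVCache:
    # Stand-in for the per-layer key/value tensors produced by prefill.
    tokens: list

def prefill(prompt_tokens):
    """Compute-bound phase: the entire prompt is processed in parallel."""
    return KVCache(tokens=list(prompt_tokens))

def decode(cache, max_new_tokens):
    """Bandwidth-bound phase: tokens are generated one at a time."""
    out = []
    for _ in range(max_new_tokens):
        # A real decoder would attend over cache + out; here we emit
        # placeholder token ids just to show the handoff.
        out.append(len(cache.tokens) + len(out))
    return out

prompt = range(1_000_000)        # a "million-token" context
cache = prefill(prompt)          # would run on the context-encoding part
completion = decode(cache, 4)    # would run on the generation-focused GPU
print(completion)                # → [1000000, 1000001, 1000002, 1000003]
```

Because the two phases never need to run on the same device, the cache handoff is the only coupling point, which is what makes attaching standalone CPX racks to existing deployments plausible.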