Overview
- Nvidia introduced Rubin CPX as a context-encoding processor that offloads the compute-heavy prefill phase from generation-focused GPUs.
- CPX will ship in two forms: an integrated Vera Rubin NVL144 CPX compute tray for new builds and standalone CPX racks that attach to existing Rubin deployments.
- Vendor slides cite up to roughly 6.5x higher performance on very large-context tasks versus GB300/Blackwell Ultra, a claim not yet confirmed by independent benchmarks.
- Nvidia highlights large codebases and multi-frame video as early use cases, aiming to cut time-to-first-token and boost overall inference throughput.
- Availability is targeted for roughly a year from now; Reuters reports launches by the end of next year. Nvidia is pitching improved token economics for data centers.