Particle.news

Nvidia Debuts Rubin CPX, a Long-Context GPU to Split LLM Inference Workloads

Built for inputs of more than one million tokens, the chip is slated to ship in about a year, with claimed throughput gains for video and code workloads.

Overview

  • Nvidia introduced Rubin CPX as a context-encoding processor that offloads the compute-heavy prefill phase from generation-focused GPUs.
  • CPX will ship in two forms: an integrated Vera Rubin NVL144 CPX compute tray for new builds and standalone CPX racks that attach to existing Rubin deployments.
  • Vendor slides cite up to roughly 6.5x the performance of GB300/Blackwell Ultra on very large-context tasks, a claim that awaits independent benchmarks.
  • Nvidia highlights large codebases and multi-frame video as early use cases, aiming to cut time-to-first-token and boost overall inference throughput.
  • Availability is targeted for roughly a year from now, with Reuters reporting launches by the end of next year, and Nvidia pitching improved token economics for data centers.
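The split described above rests on the two phases of LLM inference having different hardware profiles: prefill processes the whole prompt in parallel and is compute-bound, while decode emits one token at a time and is memory-bandwidth-bound. The sketch below illustrates that control flow only; all names are illustrative and none of this reflects Nvidia's actual software stack.

```python
# Hypothetical sketch of disaggregated inference: a prefill worker
# (the role a context-encoding GPU like Rubin CPX would play) builds
# a KV cache from the long prompt, then a separate decode worker
# consumes it. Purely illustrative; not Nvidia's API.

from dataclasses import dataclass

@dataclass
class KVCache:
    # Stand-in for the per-layer key/value tensors produced by prefill.
    tokens: list

def prefill(prompt_tokens):
    """Compute-bound phase: the entire prompt is processed in parallel."""
    return KVCache(tokens=list(prompt_tokens))

def decode(cache, max_new_tokens):
    """Bandwidth-bound phase: tokens are generated one at a time."""
    out = []
    for _ in range(max_new_tokens):
        # A real decoder would attend over cache + out; here we emit
        # placeholder token ids just to show the handoff.
        out.append(len(cache.tokens) + len(out))
    return out

prompt = range(1_000_000)        # a "million-token" context
cache = prefill(prompt)          # would run on the context-encoding part
completion = decode(cache, 4)    # would run on the generation-focused GPU
print(completion)                # → [1000000, 1000001, 1000002, 1000003]
```

Because the two phases never need to run on the same device, the cache handoff is the only coupling point, which is what makes attaching standalone CPX racks to existing deployments plausible.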