Nvidia Says Vera Rubin AI Platform Is in Production at CES 2026, Promising 10x Cheaper Inference

Partner offerings are slated for the second half of 2026.

Overview

  • Nvidia unveiled Vera Rubin as a rack-scale system built through extreme co-design of six components — the Rubin GPU, Vera CPU, NVLink 6 switch, Spectrum‑6 Ethernet switch, ConnectX‑9 SuperNIC, and BlueField‑4 DPU — assembled as the NVL72 rack.
  • Compared with Blackwell, Nvidia claims up to 5x faster inference and roughly 10x lower cost per token, with design targets that also improve training efficiency for mixture‑of‑experts and long‑context workloads.
  • The company introduced an Inference Context Memory Storage platform that creates an AI‑native KV‑cache tier, with claimed gains of up to 5x tokens per second, 5x performance per dollar of TCO, and 5x power efficiency (the KV‑cache tier concept is sketched in code below the overview).
  • Nvidia said Microsoft Azure and CoreWeave will be among the first cloud providers to offer Rubin‑based instances, with products expected from partners in the second half of 2026.
  • Nvidia expanded its open‑weight model portfolio and launched Alpamayo, a family of reasoning vision‑language‑action (VLA) models for autonomous driving, releasing 1,700 hours of training data and highlighting an upcoming first passenger deployment in the new Mercedes‑Benz CLA.
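
To make the KV‑cache tier concrete: during inference, a transformer stores the attention keys and values for already-processed tokens so they are not recomputed on every step, and a tiered design spills those blocks from fast GPU memory to cheaper storage instead of discarding them. The minimal Python sketch below illustrates that general idea only; the class, method names, and two-tier layout are illustrative assumptions, not details of Nvidia's platform.

```python
from collections import OrderedDict


class KVCacheTier:
    """Toy two-tier KV cache: a small fast tier (standing in for GPU HBM)
    backed by a larger slow tier (standing in for networked storage).
    Hypothetical illustration, not Nvidia's implementation."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast: OrderedDict[str, list] = OrderedDict()  # LRU-ordered
        self.slow: dict[str, list] = {}

    def put(self, seq_id: str, kv_block: list) -> None:
        """Store a sequence's KV block in the fast tier, spilling LRU entries."""
        self.fast[seq_id] = kv_block
        self.fast.move_to_end(seq_id)
        while len(self.fast) > self.fast_capacity:
            evicted_id, evicted_kv = self.fast.popitem(last=False)
            self.slow[evicted_id] = evicted_kv  # offload rather than recompute

    def get(self, seq_id: str) -> list | None:
        """Fetch a KV block, promoting it from the slow tier on a fast-tier miss."""
        if seq_id in self.fast:
            self.fast.move_to_end(seq_id)
            return self.fast[seq_id]
        if seq_id in self.slow:
            self.put(seq_id, self.slow.pop(seq_id))  # promote back to fast tier
            return self.fast[seq_id]
        return None  # true miss: the prefill pass must rebuild the context


if __name__ == "__main__":
    cache = KVCacheTier(fast_capacity=2)
    cache.put("chat-1", ["k/v for tokens 0..4095"])
    cache.put("chat-2", ["k/v for tokens 0..4095"])
    cache.put("chat-3", ["k/v for tokens 0..4095"])  # spills chat-1 to slow tier
    assert "chat-1" in cache.slow
    print(cache.get("chat-1"))  # fetched from the slow tier, not recomputed
```

The design choice the sketch captures is the one the announcement's claims rest on: re-reading a cached context from a slower tier is far cheaper, in both time and power, than recomputing attention over a long prompt from scratch.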