Overview
- Rubin is a tightly co‑designed six‑chip platform combining Rubin GPUs, a new Vera CPU, NVLink 6, Spectrum‑X networking, ConnectX‑9 SuperNICs and BlueField‑4 DPUs, with key parts built on TSMC’s 3 nm process.
- Nvidia claims Rubin delivers about 3.5× Blackwell's training performance and 5× its inference performance, reaches up to roughly 50 petaflops of compute, and can generate AI tokens at about one‑tenth the cost of the prior generation.
- A new AI‑native inference context memory storage tier targets KV‑cache bottlenecks, with Nvidia citing up to 5× more tokens per second, 5× better performance per dollar of total cost of ownership (TCO) and 5× better power efficiency.
- Early users include Microsoft, AWS, Google Cloud and CoreWeave, with partner products and cloud instances slated to begin rolling out in the second half of 2026 alongside the rack‑scale, liquid‑cooled Vera Rubin NVL72.
- Analysts caution that “full production” for advanced chips typically begins with low‑volume ramps and validation, so broad availability and ecosystem scale‑up remain to be demonstrated.