Particle.news

Nvidia Says Vera Rubin AI Platform Is in Full Production, Targets H2 2026 Rollout

The company frames Rubin as a tightly integrated system for agentic, long‑context AI with sharply lower inference costs.

Overview

  • Rubin is a tightly co‑designed six‑chip platform combining Rubin GPUs, a new Vera CPU, NVLink 6, Spectrum‑X networking, ConnectX‑9 SuperNICs and BlueField‑4 DPUs, with key parts built on TSMC’s 3 nm process.
  • Nvidia says Rubin delivers about 3.5× faster training and 5× faster inference than Blackwell, reaches up to roughly 50 petaflops, and can deliver AI tokens at about one‑tenth the cost of the prior generation.
  • A new AI‑native inference context memory storage tier targets KV‑cache bottlenecks, with Nvidia citing up to 5× more tokens per second, 5× better performance per dollar of total cost of ownership, and 5× better power efficiency.
  • Early users include Microsoft, AWS, Google Cloud and CoreWeave, with partner products and cloud instances slated to begin rolling out in the second half of 2026 alongside the rack‑scale, liquid‑cooled Vera Rubin NVL72.
  • Analysts caution that “full production” for advanced chips typically means a low‑volume ramp and validation phase, so broad availability and ecosystem scale‑up have yet to be demonstrated.