
Google Details Ironwood TPU Superpods Built for Inference at Cloud Scale

Google pitches a vertically integrated silicon-to-software system to cut latency and cost per query for large-scale AI serving.

Overview

  • Each Ironwood chip carries 192 GiB of HBM3E; a Superpod is reported to scale to 9,216 chips, delivering about 42.5 exaFLOPS of FP8 compute and roughly 1.77 PB of aggregate HBM (see the arithmetic check after this list).
  • The architecture treats a pod as one supercomputer using massive-scale RDMA, a 3D Torus Inter‑Chip Interconnect, and an Optical Circuit Switch that can reroute around unhealthy components.
  • The software stack is built around the XLA compiler and supports both JAX and a native PyTorch experience, while Pallas with the Mosaic backend enables Python-defined custom kernels (a minimal sketch follows this list).
  • Google cites roughly 2× performance per watt versus Trillium and frames Ironwood as optimized for inference where on‑package memory, latency, throughput, and cost per query dominate.
  • Availability is described as rolling out across workloads in the next few weeks, and reporting indicates the offering is exclusive to Google Cloud, raising vendor lock‑in considerations.
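
As a sanity check on the first bullet's figures, here is a quick back-of-the-envelope calculation in Python; only the reported numbers are used, and the rounding and the derived per-chip figure are ours:

```python
chips = 9_216                # chips per Superpod, as reported
hbm_per_chip_gib = 192       # HBM3E capacity per chip, in GiB
pod_fp8_exaflops = 42.5      # aggregate FP8 compute, as reported

# Aggregate HBM: 9,216 x 192 GiB = 1,769,472 GiB, which lines up with
# the "roughly 1.77 PB" figure (~1.77 million GiB, rounded loosely).
total_hbm_gib = chips * hbm_per_chip_gib
print(f"{total_hbm_gib:,} GiB aggregate HBM")         # 1,769,472 GiB

# Implied per-chip FP8 throughput: 42.5 EFLOPS / 9,216 chips ≈ 4.6 PFLOPS.
per_chip_pflops = pod_fp8_exaflops * 1_000 / chips
print(f"~{per_chip_pflops:.2f} PFLOPS FP8 per chip")  # ~4.61
```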
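
To make the "Python-defined custom kernels" point concrete, below is a minimal Pallas sketch adapted from the JAX documentation: an element-wise add kernel invoked through pl.pallas_call and compiled with jax.jit. Function and variable names are ours for illustration; on TPU, Pallas lowers through the Mosaic backend.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs point at on-chip buffers; [...] reads/writes the whole block.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add_vectors(x, y):
    # pallas_call wraps the kernel as an ordinary JAX function: XLA
    # handles the surrounding program while Pallas compiles the kernel.
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

print(add_vectors(jnp.arange(8, dtype=jnp.float32),
                  jnp.ones(8, dtype=jnp.float32)))
```

A real inference kernel would add a grid and BlockSpecs to tile the work across the chip, but the calling convention stays the same.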