
Google Details Ironwood TPU Superpods Built for Inference at Cloud Scale

Google pitches a vertically integrated silicon-to-software system to cut latency and cost per query for large-scale AI serving.

Overview

  • Each Ironwood chip carries 192 GiB of HBM3E; a Superpod is reported to scale to 9,216 chips, delivering about 42.5 exaFLOPS of FP8 compute and roughly 1.77 PB of aggregate HBM (see the arithmetic check after this list).
  • The architecture treats a pod as one supercomputer using massive-scale RDMA, a 3D Torus Inter‑Chip Interconnect, and an Optical Circuit Switch that can reroute around unhealthy components.
  • The software stack is built around the XLA compiler and supports both JAX and a native PyTorch experience, while Pallas with the Mosaic backend enables Python-defined custom kernels (a minimal sketch follows this list).
  • Google cites roughly 2× performance per watt versus Trillium and frames Ironwood as optimized for inference where on‑package memory, latency, throughput, and cost per query dominate.
  • Availability is described as rolling out across workloads in the next few weeks, and reporting indicates the offering is exclusive to Google Cloud, raising vendor lock‑in considerations.
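
As a sanity check on the first bullet's figures, here is a quick back-of-the-envelope calculation in Python; only the reported numbers are used, and the rounding and the derived per-chip figure are ours:

```python
chips = 9_216                # chips per Superpod, as reported
hbm_per_chip_gib = 192       # HBM3E capacity per chip, in GiB
pod_fp8_exaflops = 42.5      # aggregate FP8 compute, as reported

# Aggregate HBM: 9,216 x 192 GiB = 1,769,472 GiB, which lines up with
# the "roughly 1.77 PB" figure (~1.77 million GiB, rounded loosely).
total_hbm_gib = chips * hbm_per_chip_gib
print(f"{total_hbm_gib:,} GiB aggregate HBM")         # 1,769,472 GiB

# Implied per-chip FP8 throughput: 42.5 EFLOPS / 9,216 chips ≈ 4.6 PFLOPS.
per_chip_pflops = pod_fp8_exaflops * 1_000 / chips
print(f"~{per_chip_pflops:.2f} PFLOPS FP8 per chip")  # ~4.61
```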
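
To make the "Python-defined custom kernels" point concrete, below is a minimal Pallas sketch adapted from the JAX documentation: an element-wise add kernel invoked through pl.pallas_call and compiled with jax.jit. Function and variable names are ours for illustration; on TPU, Pallas lowers through the Mosaic backend.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs point at on-chip buffers; [...] reads/writes the whole block.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add_vectors(x, y):
    # pallas_call wraps the kernel as an ordinary JAX function: XLA
    # handles the surrounding program while Pallas compiles the kernel.
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

print(add_vectors(jnp.arange(8, dtype=jnp.float32),
                  jnp.ones(8, dtype=jnp.float32)))
```

A real inference kernel would add a grid and BlockSpecs to tile the work across the chip, but the calling convention stays the same.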