Google Introduces Ironwood, Its First Inference-Optimized AI Chip
Unveiled at Cloud Next 2025, Ironwood delivers unprecedented performance and efficiency for real-time AI applications, marking a shift in Google's AI strategy.
- Ironwood is Google's seventh-generation TPU, purpose-built for inference workloads, offering double the performance per watt compared to its predecessor, Trillium.
- The chip comes in two configurations: a 256-chip cluster and a 9,216-chip cluster, delivering up to 42.5 exaFLOPS of computing power for demanding AI tasks.
- Enhanced memory and inter-chip connectivity enable Ironwood to support larger AI models and reduce latency, with 192 GB of high-bandwidth memory per chip and 1.2 Tbps bidirectional bandwidth.
- Integrated into Google's AI Hypercomputer architecture, Ironwood is designed to power advanced models like Gemini 2.5, supporting scalable deployment for enterprise AI applications.
- Available exclusively through Google Cloud later in 2025, Ironwood positions Google to challenge Nvidia by addressing the cost and energy demands of AI inference at scale.