Google Unveils Ironwood, Its Seventh-Generation Tensor Processing Unit for LLMs, Gen AI
Nearly twice as energy efficient as its previous-generation chips, the company claims, and up to 24 times the speed of the world's fastest supercomputer.
Google has unveiled the Tensor Processing Unit (TPU) it intends to use to power Gemini and its other generative artificial intelligence (gen AI) services: Ironwood, which the company claims can be scaled to deliver more than 24 times the performance of the largest supercomputer built to date.
"Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements," claims Amin Vahdat of the processor, unveiled during the company's Google Cloud Next 25 event. "It scales up to 9,216 liquid cooled chips linked with breakthrough Inter-Chip Interconnect (ICI) networking spanning nearly 10MW. It is one of several new components of Google Cloud AI Hypercomputer architecture, which optimizes hardware and software together for the most demanding AI workloads. With Ironwood, developers can also leverage Google’s own Pathways software stack to reliably and easily harness the combined computing power of tens of thousands of Ironwood TPUs."
The company's seventh-generation in-house Tensor Processing Unit, Ironwood comes with some strong claims, including a near-doubling in performance-per-watt over its sixth-generation Trillium chips and a near-thirtyfold increase in energy efficiency over Google's first Cloud TPU, launched back in 2018. Each chip carries 192GB of High-Bandwidth Memory (HBM), six times Trillium's capacity, and delivers 7.2TB/s of memory bandwidth, four and a half times that of Trillium. Even the inter-chip communication within a pod has been boosted, to 1.2Tb/s of bidirectional throughput, 50 percent higher than Trillium's.
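Those per-chip figures multiply into striking pod-level aggregates. A quick back-of-the-envelope calculation, using nothing but the numbers quoted above rather than any measured figure, makes the scale concrete:

```python
# Back-of-the-envelope pod totals from the per-chip figures quoted above;
# these are straight multiplications, not benchmarked numbers.
CHIPS_PER_POD = 9_216
HBM_PER_CHIP_GB = 192          # six times Trillium's capacity
HBM_BW_PER_CHIP_TBPS = 7.2     # memory bandwidth, terabytes per second

pod_hbm_gb = CHIPS_PER_POD * HBM_PER_CHIP_GB
print(f"Pod HBM capacity: {pod_hbm_gb:,} GB (~{pod_hbm_gb / 1e6:.2f} PB)")
# -> 1,769,472 GB, roughly 1.77 petabytes of high-bandwidth memory

pod_hbm_bw = CHIPS_PER_POD * HBM_BW_PER_CHIP_TBPS
print(f"Aggregate HBM bandwidth: {pod_hbm_bw:,.0f} TB/s")
# -> about 66,355 TB/s across a full pod
```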
Google isn't just comparing the chip to its own parts, though: the company claims that when scaled to the maximum 9,216-chip "pod" size it can deliver 42.5 exaflops (quintillions of floating-point operations per second) of compute performance, nearly 24 times that of El Capitan, the fastest publicly disclosed supercomputer built to date, hosted at the Lawrence Livermore National Laboratory and using AMD EPYC and Instinct MI300A chips to deliver 1.742 exaflops from its 11,039,616 cores. The claim warrants a caveat, however: Google's figure is measured at low FP8 precision, while El Capitan's benchmark result was recorded at 64-bit precision, making the two numbers far from directly comparable.
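The headline multiplier itself is simple division of the two quoted figures, as the quick check below shows; the precision caveat still applies.

```python
# Sanity check of Google's "nearly 24 times" claim from the quoted figures.
ironwood_pod_exaflops = 42.5   # Google's figure, at low (FP8) precision
el_capitan_exaflops = 1.742    # Top500 Linpack result, at FP64 precision

print(f"Ratio: {ironwood_pod_exaflops / el_capitan_exaflops:.1f}x")
# -> 24.4x, though the two figures were measured at different precisions
```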
Where El Capitan was built for a range of scientific workloads, though, Ironwood targets AI exclusively, and generative AI specifically. As models and their training sets grow in size, ever-faster and more power-hungry systems are required to train the next generation of large language models (LLMs) and other generative AI models, something Google is keen to provide in-house, both for its own platforms like Gemini and for third parties through the Google Cloud platform.
"Leading thinking models like Gemini 2.5 and the Nobel Prize winning AlphaFold all run on TPUs today," Vahdat explains, playing somewhat fast and loose with the definition of the word "thinking," "and with Ironwood we can't wait to see what AI breakthroughs are sparked by our own developers and Google Cloud customers when it becomes available later this year."
Pricing for access to Ironwood had not been disclosed at the time of writing.