Perseus Looks to Save Us All From AI "Energy Bloat" — Boosting Training Efficiency Through Slowdowns
By slowing down lighter-loaded processors, researchers have found, it's possible to reduce training energy needs by up to 30 percent.
Researchers from the Universities of Michigan, Washington, and California San Diego have come up with a way to help address growing environmental concerns surrounding the use of large language models (LLMs) — by reducing "energy bloat" during the training process by up to 30 percent.
"We can't keep building bigger and bigger data centers because we won't have the power to run them," explains Mosharaf Chowdhury, University of Michigan associate professor of computer science and engineering and corresponding author of the work. "If we can reduce the energy consumed by AI, we can reduce AI's carbon footprint and cooling requirements and allow for more computation to fit within our current energy constraints."
With vast numbers of companies attempting to add artificial intelligence technologies, most commonly based around generative LLMs, to their products, environmental concerns cannot be overlooked. While each subsequent generation of computer hardware can perform the tasks of its predecessor more efficiently, that's not how it is being used; instead, it is being run at equal or greater power draw in order to deliver higher performance. The models, too, are becoming more complex, soaking up that extra power during their energy-hungry training processes.
It's in this process, rather than the point-of-use inference stage, where the team has identified "energy bloat" — wastage that can be clawed back. "AI models today are so large, they cannot fit inside a single computer processor," explains first author Jae-Won Chung. "They need to be divided across tens of thousands of processors to be trained, but dividing the models into perfectly equal sizes across all processors is practically impossible."
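To see where the waste comes from, consider a toy example (the numbers below are made up for illustration, not taken from the paper): when a model is split into pipeline stages of unequal size, every processor has to wait for the slowest stage before the next training step can proceed, so the faster processors spend part of each step idle while still drawing power.

```python
# Toy illustration (not Perseus itself): unequal pipeline stages mean
# faster processors sit idle each step, waiting on the slowest stage.

stage_times = [0.8, 1.0, 0.6, 0.9]  # seconds per step for each stage (made-up numbers)

step_time = max(stage_times)  # every stage waits for the slowest one
for i, t in enumerate(stage_times):
    print(f"stage {i}: busy {t:.1f}s, idle {step_time - t:.1f}s per step")
```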
The team's solution is Perseus, a tool designed to identify which training tasks will take the longest time to complete, then slow down the processors handling shorter tasks so that everything finishes at roughly the same time. Counter-intuitively, slowing some processors down doesn't lengthen training: those lighter-loaded processors would otherwise finish early and sit idle, so allowing them to work at a more leisurely pace reduces the overall energy consumed by a training run — by up to 30 percent, the team claims of its experiments.
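The intuition can be sketched with a toy energy model (an illustration of the general idea only, not Perseus's actual algorithm or energy model; the cubic power-versus-frequency assumption, the idle power figure, and the stage times are all assumptions made for this sketch): if a processor's dynamic power grows roughly with the cube of its clock frequency, a stage that would otherwise finish early and idle can instead run at a lower frequency, finish just in time, and use less energy overall.

```python
# Toy sketch of "slow down the non-critical stages so they finish just in time".
# Assumptions (not from the paper): dynamic power ~ frequency**3,
# runtime ~ 1/frequency, plus a constant idle power while waiting.

P_MAX = 400.0   # watts at full frequency (illustrative)
P_IDLE = 80.0   # watts while idle (illustrative)

def step_energy(stage_times, freqs):
    """Energy for one training step, given per-stage relative frequencies in (0, 1]."""
    runtimes = [t / f for t, f in zip(stage_times, freqs)]
    step_time = max(runtimes)  # the step ends when the slowest stage finishes
    energy = 0.0
    for f, r in zip(freqs, runtimes):
        energy += P_MAX * f**3 * r          # active energy at the reduced frequency
        energy += P_IDLE * (step_time - r)  # idle energy spent waiting for the step to end
    return energy

stage_times = [0.8, 1.0, 0.6, 0.9]          # made-up per-stage times at full speed
critical = max(stage_times)

baseline = step_energy(stage_times, [1.0] * len(stage_times))
# Slow each stage so it finishes exactly when the slowest (critical) stage does.
slowed = step_energy(stage_times, [t / critical for t in stage_times])

print(f"baseline energy per step: {baseline:.0f} J")
print(f"with slowdown:            {slowed:.0f} J")
```

In this toy model the slowed-down schedule takes no longer per step, since the critical stage still sets the pace, but the non-critical stages burn noticeably less energy than they would running flat-out and then idling.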
"Reducing the power cost of AI can have important implications for equitable AI access," Chowdhury says. "If a country doesn't have enough power to run a big model, they might need to use services from far away, or be stuck running smaller, less accurate models. This gap could further perpetuate disparity between different communities."
The team's work has been published at the 30th ACM Symposium on Operating Systems Principles (SOSP '24), with a preprint available on Cornell's arXiv server. Perseus has been released under the permissive Apache 2.0 license as part of the Zeus deep-learning energy measurement and optimization toolkit, under the name Pipeline Frequency Optimizer, with source code on GitHub.