The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization

Authors

Jae-Won Chung (University of Michigan)
Jiachen Liu (University of Michigan)
Jeff J. Ma (University of Michigan)
Ruofan Wu (University of Michigan)
Oh Jun Kweon (University of Michigan)
Yuxuan Xia (University of Michigan)
Zhiyu Wu (University of Michigan)
Mosharaf Chowdhury (University of Michigan)

Abstract

As Generative AI becomes increasingly integrated into real-world services, energy consumption has become a significant bottleneck, yet it remains under-measured and under-optimized in machine learning (ML) systems. This paper introduces the ML.ENERGY Benchmark and Leaderboard, an open-source suite and evaluation platform designed to measure and compare the inference energy use of AI models in realistic service environments. The authors present four core principles for effective energy benchmarking and show how the benchmark puts them into practice. Results from the benchmark detail energy metrics for 40 popular model architectures across 6 tasks, present case studies on design decisions that affect energy use, and demonstrate that automatic optimizations can cut energy consumption by over 40% without sacrificing output quality. The ML.ENERGY Benchmark is extensible, making it a practical resource for both researchers and practitioners seeking to evaluate and reduce the energy footprint of their AI applications.
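
To make the core measurement idea concrete, the sketch below shows one way to attribute GPU energy to inference requests by reading NVIDIA's NVML cumulative energy counter (available on Volta-class and newer GPUs) before and after a batch of requests. This is a minimal illustration under assumptions, not the benchmark's own harness; the run_inference callable is a hypothetical stand-in for whatever model-serving call is being measured.

    """Minimal sketch of per-request GPU energy measurement via NVML.

    Illustrative only, not the ML.ENERGY Benchmark harness:
    run_inference is a hypothetical stand-in for a model-serving call.
    """
    import time

    import pynvml


    def energy_per_request(run_inference, num_requests: int, gpu_index: int = 0):
        """Return (average energy in Joules, average latency in seconds) per request."""
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
            # NVML reports cumulative GPU energy in millijoules since driver load.
            start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
            start_s = time.monotonic()

            for _ in range(num_requests):
                run_inference()

            elapsed_s = time.monotonic() - start_s
            end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
            return (end_mj - start_mj) / 1000.0 / num_requests, elapsed_s / num_requests
        finally:
            pynvml.nvmlShutdown()

A production benchmark would additionally control for request arrival patterns, batching, and output quality, which is exactly the realistic-service setting the paper targets; the counter-difference approach above only captures whole-GPU energy over the measurement window.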
