The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
As generative AI becomes increasingly integrated into real-world services, energy consumption has become a significant bottleneck, yet it remains under-measured and under-optimized in machine learning (ML) systems. This paper introduces the ML.ENERGY Benchmark and Leaderboard, an open-source benchmark suite and evaluation platform for measuring and comparing the inference energy consumption of AI models in realistic service environments. The authors present four core principles for effective energy benchmarking and illustrate how each is applied in the tool. Benchmark results report energy measurements for 40 popular model architectures across 6 tasks, present case studies on design decisions that affect energy consumption, and show that automatic optimizations can cut energy consumption by over 40% without sacrificing output quality. The ML.ENERGY Benchmark is extensible, making it a practical resource for researchers and practitioners who want to evaluate and reduce the energy footprint of their AI applications.