Fine-Grain Energy Consumption Modeling of HPC Task-Based Programs

Bibliographic Details
Published in: Proceedings / IEEE International Conference on Cluster Computing, pp. 1 - 12
Main Authors: Risse, Jules; Guermouche, Amina; Trahay, Francois
Format: Conference Proceeding
Language: English
Published: IEEE, 02.09.2025
ISSN: 2168-9253
DOI: 10.1109/CLUSTER59342.2025.11186478

Summary: The power consumption of supercomputers is, and will remain, a major concern. Reducing the power consumption of high performance computing (HPC) applications is therefore mandatory. Monitoring the energy consumption of HPC programs is a good first step: using external or software power meters, one can measure the energy consumption of an entire compute node or of some of its hardware components. Unfortunately, the differences in scope and time scale between power meters and code-level functions prevent the identification of power-hungry code blocks. In this work, we propose leveraging the tracing mechanism of the StarPU runtime system to estimate task-level power consumption. We trace the execution of the application while regularly measuring the coarse-grain energy consumption of central processing units (CPUs) and graphics processing units (GPUs) using vendor software interfaces. After execution, we identify the tasks executed on each processing unit during every coarse-grain energy measurement interval. We then use this information to generate an overdetermined linear system linking tasks and energy measurements. Solving this system allows us to estimate the fine-grain power consumption of each task independently of its actual duration. We achieve mean absolute percentage errors (MAPE) ranging from 0.5% to 5% on various CPUs, and from 10% to 28% on GPUs. We show that a solution generated from one run can be used to predict the energy consumption of other runs with different scheduling policies.
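
The abstract's central step, linking coarse-grain energy readings to per-task power through an overdetermined linear system, can be illustrated with a small least-squares fit. The sketch below (Python with NumPy) is not the authors' code: the task names, the busy-time matrix, and the energy readings are made-up placeholders standing in for data that the paper derives from StarPU traces and vendor software interfaces (RAPL and NVML are common examples of such interfaces for CPUs and GPUs).

    # Hypothetical sketch, not the authors' implementation: estimating per-task
    # power from interval-level energy readings by linear least squares.
    import numpy as np

    # Assumed trace-derived data: busy_time[i, j] is the time (seconds) that
    # tasks of type j ran on the monitored processing unit during energy
    # measurement interval i. Task names and values are illustrative only.
    task_types = ["potrf", "trsm", "syrk", "gemm"]
    busy_time = np.array([
        [0.10, 0.00, 0.05, 0.30],
        [0.00, 0.20, 0.10, 0.25],
        [0.05, 0.15, 0.00, 0.35],
        [0.12, 0.08, 0.07, 0.20],
        [0.00, 0.25, 0.12, 0.18],
    ])  # shape: (n_intervals, n_task_types) -- more intervals than task types

    # Assumed coarse-grain energy readings (joules) for the same intervals,
    # e.g. from a vendor interface such as RAPL (CPU) or NVML (GPU).
    energy = np.array([28.1, 33.4, 36.2, 27.5, 32.3])

    # Overdetermined system: busy_time @ power ~= energy, with one unknown
    # average power (watts) per task type. Solve it in the least-squares sense.
    power, residuals, rank, _ = np.linalg.lstsq(busy_time, energy, rcond=None)

    for name, watts in zip(task_types, power):
        print(f"estimated power of task '{name}': {watts:.1f} W")

    # The fitted per-task powers can be combined with task durations from a
    # different trace to predict the energy of runs under other schedules.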