Fine-Grain Energy Consumption Modeling of HPC Task-Based Programs
| Published in | Proceedings of the IEEE International Conference on Cluster Computing, pp. 1-12 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 02.09.2025 |
| ISSN | 2168-9253 |
| DOI | 10.1109/CLUSTER59342.2025.11186478 |
| Summary: | The power consumption of supercomputers is and will remain a major concern in the future. Therefore, reducing the power consumption of high performance computing (HPC) applications is mandatory. Monitoring the energy consumption of HPC programs is a good first step: using external or software power meters, one can measure the energy consumption of an entire compute node or some of its hardware components. Unfortunately, the differences in scope and time scale between power meters and code-level functions prevent the identification of power-hungry code blocks. For this work, we propose leveraging the tracing mechanism of the StarPU runtime system in order to estimate task-level power consumption. We trace the execution of the application while regularly measuring the coarse-grain energy consumption of central processing units (CPUs) and graphics processing units (GPUs) using vendor software interfaces. After execution, we identify the tasks executed on each processing unit during every coarse-grain energy measurement interval. We then use this information to generate an overdetermined linear system linking tasks and energy measurements. Solving the system allows us to estimate the fine-grain power consumption of each task independently of its actual duration. We achieve mean absolute percentage errors (MAPE) ranging from 0.5% to 5% on various CPUs, and from 10% to 28% on GPUs. We show that a solution generated from one run can be used to predict the energy consumption of other runs with different scheduling policies. |
|---|---|
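The summary above describes estimating per-task power by solving an overdetermined linear system that links coarse-grain energy readings to the tasks active during each measurement interval. Below is a minimal least-squares sketch of that formulation; the matrix layout, the synthetic data, and every identifier are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: per-task power estimation via an overdetermined
# linear system, as outlined in the summary. All data here is
# synthetic and every name is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_intervals, n_task_types = 200, 4

# Assumed ground-truth per-task power draws (watts), used only to
# generate synthetic measurements to recover.
true_power = np.array([35.0, 60.0, 20.0, 90.0])

# A[i, j]: seconds task type j executed inside coarse-grain energy
# measurement interval i (what tracing would provide).
A = rng.uniform(0.0, 0.5, size=(n_intervals, n_task_types))

# b[i]: total energy (joules) reported for interval i, with noise
# standing in for power-meter measurement error.
b = A @ true_power + rng.normal(0.0, 0.5, size=n_intervals)

# With more intervals than task types, A @ p ~= b is overdetermined;
# least squares yields per-task power p (watts) independent of how
# long each task happened to run in any given interval.
power_estimate, *_ = np.linalg.lstsq(A, b, rcond=None)
print("estimated per-task power (W):", np.round(power_estimate, 1))

# Quality metric matching the summary's MAPE figure.
mape = np.mean(np.abs((power_estimate - true_power) / true_power)) * 100
print(f"MAPE vs. ground truth: {mape:.2f} %")

# The same power vector can then predict the energy of a different
# run (e.g. another scheduling policy) from its task occupancy alone.
A_other = rng.uniform(0.0, 0.5, size=(50, n_task_types))
predicted_energy = A_other @ power_estimate
```

Because the recovered per-task powers do not depend on any one run's schedule, the final prediction step mirrors the summary's claim that a solution from one run can estimate the energy consumption of runs under different scheduling policies.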