Supporting Real-Time Jobs on the IBM Blue Gene/Q: Simulation-Based Study

Bibliographic Details
Published in: Job Scheduling Strategies for Parallel Processing, Vol. 10773, pp. 83-102
Main Authors: Wang, Daihou; Jung, Eun-Sung; Kettimuthu, Rajkumar; Foster, Ian; Foran, David J.; Parashar, Manish
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 01.01.2018
Springer International Publishing
Series: Lecture Notes in Computer Science
ISBN: 9783319773971
3319773976
ISSN: 0302-9743
1611-3349
DOI: 10.1007/978-3-319-77398-8_5

Summary: As the volume and velocity of data generated by scientific experiments increase, the analysis of those data inevitably requires HPC resources. Successful research in a growing number of scientific fields depends on the ability to analyze data rapidly. In many situations, scientists and engineers want quasi-instant feedback, so that results from one experiment can guide selection of the next or even improve the course of a single experiment. Such real-time requirements are hard to meet on current HPC systems, which are typically batch-scheduled under policies in which an arriving job is run immediately only if enough resources are available and is otherwise queued. To meet their requirements, real-time jobs should sometimes have higher priority than batch jobs that were submitted earlier. However, accommodating more real-time jobs will negatively impact the performance of batch jobs, which may have to be preempted. The overhead involved in preempting and restarting batch jobs will, in turn, negatively impact system utilization. Here we evaluate various scheduling schemes to support real-time jobs along with traditional batch jobs. We perform simulation studies using trace logs of Mira, the IBM BG/Q system at Argonne National Laboratory, to quantify the impact of real-time jobs on batch job performance for various percentages of real-time jobs in the workload. We present new insights gained from grouping the jobs into different categories and studying the performance of each category. Our results show that real-time jobs in all categories can achieve an average slowdown of less than 1.5 and that most categories achieve an average slowdown close to 1, with at most a 20% increase in average slowdown for some categories of batch jobs when real-time jobs make up 20% or fewer of the jobs in the workload.