Job failure prediction in grid environment based on workload characteristics

Bibliographic Details
Published in: 2009 14th International CSI Computer Conference, pp. 329-334
Main Authors: Fadishei, H., Saadatfar, H., Deldari, H.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2009
ISBN: 9781424442614, 1424442613
DOI: 10.1109/CSICC.2009.5349381

More Information
Summary: Grid technology aggregates autonomous resources owned by several organizations into a single virtual system, which has made it popular for compute-intensive and data-intensive applications. Because grids are complex and dynamic, users' jobs fail fairly often. Traditional methods for job failure recovery have also proven costly, so these systems need to move toward proactive and predictive management strategies. This paper attempts to predict the outcome of jobs submitted to a production grid environment, AuverGrid. The authors analyzed grid workload traces and extracted patterns that describe common failure characteristics. Using these patterns, they predicted whether each job succeeded or failed over 6 months of AuverGrid activity, with around 96% accuracy. Integrating these results into management services such as scheduling and monitoring could improve the quality of service on grids.