Job failure prediction in grid environment based on workload characteristics
| Published in | 2009 14th International CSI Computer Conference, pp. 329-334 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.10.2009 |
| Subjects | |
| ISBN | 9781424442614 (ISBN-13); 1424442613 (ISBN-10) |
| DOI | 10.1109/CSICC.2009.5349381 |
| Summary | The power of grid technology to aggregate autonomous resources owned by several organizations into a single virtual system has made it popular for compute-intensive and data-intensive applications. The complex and dynamic nature of grids makes the failure of users' jobs fairly likely. Furthermore, traditional methods of job failure recovery have proven costly, so such systems need to shift toward proactive and predictive management strategies. In this paper, an innovative effort is made to predict the outcome of jobs submitted to a production grid environment (AuverGrid). By analyzing grid workload traces and extracting patterns that describe common failure characteristics, the success or failure status of jobs over 6 months of AuverGrid activity was predicted with around 96% accuracy. The quality of service on grids can be improved by integrating the results of this work into management services such as scheduling and monitoring. |
|---|---|
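The summary does not say which prediction technique the authors used. As a rough illustration of the general idea of learning failure patterns from workload trace records and scoring predictions by accuracy, the minimal Python sketch below makes several assumptions. It uses hypothetical records keyed only by user and queue, and it predicts each job's outcome from the majority outcome of its (user, queue) pattern in the training data. Accuracy is computed as the fraction of correct predictions. This is not the paper's method, the field names are invented, and the toy data is not AuverGrid data.

```python
from collections import Counter, defaultdict

# Hypothetical trace record: (user_id, queue_id, succeeded).
# Field names and values are illustrative assumptions, not the AuverGrid trace format.
Record = tuple[str, str, bool]


def train(records: list[Record]):
    """Count successes and failures for each (user, queue) pattern."""
    counts: dict = defaultdict(Counter)
    for user, queue, ok in records:
        counts[(user, queue)][ok] += 1
    overall = Counter(ok for _, _, ok in records)
    default = overall[True] >= overall[False]  # global majority outcome
    return counts, default


def predict(model, user: str, queue: str) -> bool:
    """Predict success (True) or failure (False) from the pattern's majority outcome."""
    counts, default = model
    c = counts.get((user, queue))  # .get does not create an entry in the defaultdict
    if not c:
        return default  # unseen pattern: fall back to the global majority
    return c[True] >= c[False]


def accuracy(model, records: list[Record]) -> float:
    """Fraction of records whose outcome is predicted correctly."""
    hits = sum(predict(model, u, q) == ok for u, q, ok in records)
    return hits / len(records)


# Toy data for illustration only.
train_set = [("u1", "q1", True), ("u1", "q1", True),
             ("u2", "q2", False), ("u2", "q2", False), ("u2", "q2", True)]
test_set = [("u1", "q1", True), ("u2", "q2", False),
            ("u3", "q1", True), ("u2", "q2", True)]

model = train(train_set)
print(accuracy(model, test_set))  # 3 of 4 correct -> 0.75
```

A real predictor would use richer trace attributes and a proper classifier. The majority-vote lookup is used here only because it is the simplest self-contained way to show training on past outcomes and evaluating prediction accuracy on held-out jobs.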