Optimizing Data Locality by Executor Allocation in Reduce Stage for Spark Framework
| Published in | Parallel and Distributed Computing, Applications and Technologies Vol. 13148; pp. 349 - 357 |
|---|---|
| Main Authors | , , , |
| Format | Book Chapter |
| Language | English |
| Published | Switzerland: Springer International Publishing AG, 2022 |
| Series | Lecture Notes in Computer Science |
| Subjects | |
| Online Access | Get full text |
| ISBN | 9783030967710 (ISBN-13); 3030967719 (ISBN-10) |
| ISSN | 0302-9743 (print); 1611-3349 (electronic) |
| DOI | 10.1007/978-3-030-96772-7_32 |
| Summary: | Data locality is a key factor influencing the performance of Spark systems. Because executors are the execution containers of tasks, the nodes on which executors are started directly affect the locality level that tasks can achieve. This paper aims to improve data locality through executor allocation in the reduce stage of the Spark framework. First, we calculate the network distance matrix of executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, we propose an approximation algorithm and prove that its approximation factor is 2. Finally, we evaluate the algorithm on a practical Spark cluster using several representative benchmarks: sort, PageRank, and LDA. Experimental results show that the proposed algorithm clearly improves data locality and application/job performance. |
|---|---|
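To make the problem formulation in the summary more concrete, here is a minimal sketch that builds a network distance matrix for a small, hypothetical two-rack cluster and scores candidate executor placements by total communication distance. None of the following comes from the paper; all of it is assumed for illustration. Node distances follow the hop-count convention of Hadoop's `NetworkTopology` (0 for the same node, 2 within a rack, 4 across racks). Every reduce-side executor is assumed to fetch every map output, weighted by its size in MB. The names `RACK_OF`, `fetch_cost`, and `allocate` are hypothetical. Under this simplified objective each node's cost is independent of the others, so sorting solves it exactly. The paper's formulation is presumably harder, since it calls for a 2-approximation algorithm, so this is not the authors' method.

```python
# Hypothetical cluster layout: node name -> rack id (illustrative only).
RACK_OF = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}


def tree_distance(a: str, b: str, rack_of: dict[str, str]) -> int:
    """Hop distance in a two-level rack/node tree, following Hadoop's
    NetworkTopology convention: 0 = same node, 2 = same rack, 4 = different racks."""
    if a == b:
        return 0
    return 2 if rack_of[a] == rack_of[b] else 4


def distance_matrix(rack_of: dict[str, str]) -> tuple[list[str], list[list[int]]]:
    """Return (node order, matrix) where matrix[i][j] is the distance between nodes i and j."""
    names = sorted(rack_of)
    return names, [[tree_distance(a, b, rack_of) for b in names] for a in names]


def fetch_cost(node: str, shuffle_mb: dict[str, int], rack_of: dict[str, str]) -> int:
    """Assumed cost of an executor on `node`: it reads every map output,
    each weighted by its size (MB) times the network distance to its source."""
    return sum(tree_distance(node, src, rack_of) * mb for src, mb in shuffle_mb.items())


def allocate(k: int, shuffle_mb: dict[str, int], rack_of: dict[str, str]) -> list[str]:
    """Pick the k nodes with the lowest fetch cost (ties broken by node name).
    Exact only because this toy objective is separable per node."""
    return sorted(rack_of, key=lambda n: (fetch_cost(n, shuffle_mb, rack_of), n))[:k]


if __name__ == "__main__":
    names, dm = distance_matrix(RACK_OF)
    print(names, dm)
    shuffle = {"n1": 100, "n3": 50}  # hypothetical map-output sizes in MB
    chosen = allocate(2, shuffle, RACK_OF)
    print(chosen, sum(fetch_cost(n, shuffle, RACK_OF) for n in chosen))
```

With map output concentrated on `n1`, the sketch favors `n1` and its rack neighbor `n2`. Once placements interact (for example, when executors share limited slots or tasks are assigned to their nearest executor), this simple ranking no longer suffices, and that is where an approximation algorithm such as the paper's becomes relevant.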