Optimizing Data Locality by Executor Allocation in Reduce Stage for Spark Framework

Bibliographic Details
Published in: Parallel and Distributed Computing, Applications and Technologies, Vol. 13148, pp. 349-357
Main Authors: Fu, Zhongming; He, Mengsi; Tang, Zhuo; Zhang, Yang
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2022; Springer International Publishing
Series: Lecture Notes in Computer Science
ISBN: 9783030967710; 3030967719
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-96772-7_32

Summary: Data locality is a key factor in the performance of Spark systems. Because executors are the containers in which tasks run, the nodes on which executors are started directly affect the locality level that tasks can achieve. This paper improves data locality through executor allocation in the reduce stage of the Spark framework. First, we calculate the network distance matrix of executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, we propose an approximation algorithm and prove that its approximation factor is 2. Finally, we evaluate the algorithm on a practical Spark cluster using several representative benchmarks: Sort, PageRank, and LDA. Experimental results show that the proposed algorithm clearly improves data locality and application/job performance.
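
Note: the summary does not give the formulation itself, so the following is only a minimal sketch of what a distance matrix and a "minimize total communication distance" objective might look like. The topology `RACK_OF`, the distance values (0 for the same node, 2 for the same rack, 4 across racks), and all function names are assumptions made for illustration. The exhaustive search stands in for an exact solver on a tiny instance; it is not the chapter's 2-approximation algorithm.

```python
from itertools import combinations

# Hypothetical topology (node -> rack); not taken from the chapter.
RACK_OF = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}


def node_distance(a, b):
    """Assumed network distance: 0 same node, 2 same rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if RACK_OF[a] == RACK_OF[b] else 4


def distance_matrix(nodes):
    """Pairwise distances between the nodes that host each executor."""
    return [[node_distance(a, b) for b in nodes] for a in nodes]


def total_distance(chosen, data_nodes):
    """Sum of distances from every chosen executor node to every data node."""
    return sum(node_distance(e, d) for e in chosen for d in data_nodes)


def best_allocation(candidates, data_nodes, k):
    """Exhaustively pick k executor nodes with minimal total distance."""
    return min(
        combinations(candidates, k),
        key=lambda chosen: total_distance(chosen, data_nodes),
    )
```

For example, with map output assumed to sit on `n1` and `n2`, `best_allocation(["n1", "n2", "n3", "n4"], ["n1", "n2"], 2)` selects the two nodes on rack `r1`. Exhaustive search grows combinatorially with cluster size, which is the usual motivation for an approximation algorithm.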