Optimizing Data Locality by Executor Allocation in Reduce Stage for Spark Framework




Bibliographic Details
Published in	Parallel and Distributed Computing, Applications and Technologies, Vol. 13148, pp. 349-357
Main Authors	Fu, Zhongming; He, Mengsi; Tang, Zhuo; Zhang, Yang
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2022
Series	Lecture Notes in Computer Science
Subjects	Communication distance; Data locality; Executor allocation; Spark
ISBN	9783030967710 (ISBN-13); 3030967719 (ISBN-10)
ISSN	0302-9743 (print); 1611-3349 (electronic)
DOI	10.1007/978-3-030-96772-7_32


More Information
Summary:	Data locality is a key factor in the performance of Spark systems. Because executors are the containers in which tasks run, the nodes on which executors are started directly affect the locality level that tasks can achieve. This paper aims to improve data locality through executor allocation in the reduce stage of the Spark framework. First, we calculate the network distance matrix of the executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, we propose an approximation algorithm and prove that its approximation factor is 2. Finally, we evaluate the algorithm on a practical Spark cluster using several representative benchmarks: sort, PageRank, and LDA. Experimental results show that the proposed algorithm clearly improves both data locality and application/job performance.
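The summary names two steps, building a network distance matrix and approximately minimizing total communication distance, but this record does not give the paper's exact objective or algorithm. The sketch below is therefore illustrative only and rests on stated assumptions. It assumes (1) a two-level rack topology with distances 0 (same node), 2 (same rack) and 4 (different racks), a common rack-aware convention, and (2) an objective equal to the total pairwise distance among the k nodes chosen to host reduce-stage executors. It then applies a standard star-based heuristic that is a 2-approximation for that objective in any metric. This is a swapped-in textbook technique, not the authors' algorithm, and the names `distance_matrix`, `allocate_executors` and `rack_of` are hypothetical.

```python
from itertools import combinations


def distance_matrix(nodes, rack_of):
    """Build a pairwise network distance matrix for candidate nodes.

    ASSUMPTION: two-level rack topology with distances
    0 = same node, 2 = same rack, 4 = different racks.
    """
    n = len(nodes)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                dist[i][j] = 0
            elif rack_of[nodes[i]] == rack_of[nodes[j]]:
                dist[i][j] = 2
            else:
                dist[i][j] = 4
    return dist


def total_distance(dist, chosen):
    """Sum of distances over every unordered pair of chosen node indices."""
    return sum(dist[a][b] for a, b in combinations(chosen, 2))


def allocate_executors(dist, k):
    """Hypothetical executor placement: pick k nodes with small total pairwise distance.

    For every candidate centre c, take c plus its k-1 nearest nodes and keep
    the cheapest such set. When dist is a metric, the returned cost is within
    a factor of 2 of the optimum for this (assumed) objective.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    best, best_cost = None, float("inf")
    for c in range(n):
        # Nearest neighbours of the centre, ties broken by index (stable sort).
        others = sorted((j for j in range(n) if j != c), key=lambda j: dist[c][j])
        chosen = [c] + others[: k - 1]
        cost = total_distance(dist, chosen)
        if cost < best_cost:
            best, best_cost = chosen, cost
    return sorted(best), best_cost


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]
    rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r2"}
    dist = distance_matrix(nodes, rack_of)
    print(allocate_executors(dist, 3))  # → ([2, 3, 4], 6)
```

With three executors to place, the heuristic keeps all of them on rack r2 (nodes n3, n4, n5), which avoids cross-rack traffic. The factor of 2 holds because, for a centre c whose star cost (the sum of distances from c to the other chosen nodes) is s, the triangle inequality bounds the set's total pairwise cost by (k-1)s. The best centre inside an optimal set has s at most 2·OPT/k, so the returned cost stays below 2·OPT.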