Optimizing Data Locality by Executor Allocation in Reduce Stage for Spark Framework

Bibliographic Details
Published in Parallel and Distributed Computing, Applications and Technologies Vol. 13148; pp. 349 - 357
Main Authors Fu, Zhongming, He, Mengsi, Tang, Zhuo, Zhang, Yang
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2022
Springer International Publishing
Series Lecture Notes in Computer Science
ISBN 9783030967710
3030967719
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-030-96772-7_32

Abstract Data locality is a key factor influencing the performance of Spark systems. Because executors are the containers in which tasks run, the choice of nodes on which executors are started directly affects the locality level that tasks can achieve. This paper aims to improve data locality in the reduce stage of the Spark framework through executor allocation. First, we compute the network distance matrix of the executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, we propose an approximation algorithm and prove that its approximation factor is 2. Finally, we evaluate the algorithm on a real Spark cluster using several representative benchmarks: Sort, PageRank, and LDA. Experimental results show that the proposed algorithm clearly improves data locality and application/job performance.
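
The abstract does not spell out the distance matrix or the algorithm, so the following is only an illustrative sketch of the kind of problem it describes, not the authors' method. It assumes a hypothetical five-node, three-rack topology, a hop-style metric (0 on the same node, 2 within a rack, 4 across racks), one executor per node, and a cost equal to the sum of pairwise distances among the chosen nodes. The `best_star` function is a generic heuristic swapped in for illustration: for a metric distance, the group it returns costs at most twice the optimal pairwise sum. All names (`RACK`, `distance`, `total_distance`, `best_star`, `brute_force`) are invented for this example.

```python
from itertools import combinations

# Hypothetical topology (node -> rack); names are illustrative only.
RACK = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r3"}


def distance(a, b):
    """Assumed metric: 0 on the same node, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if RACK[a] == RACK[b] else 4


def total_distance(nodes):
    """Sum of pairwise distances among the chosen executor nodes."""
    return sum(distance(a, b) for a, b in combinations(nodes, 2))


def best_star(candidates, k):
    """Generic best-star heuristic (an assumption, not the paper's algorithm).

    For each candidate center, group it with its k-1 nearest other nodes,
    then keep the group whose distances to the center sum to the least.
    """
    best = None
    for center in candidates:
        others = sorted(
            (n for n in candidates if n != center),
            key=lambda n: (distance(center, n), n),
        )
        group = [center] + others[: k - 1]
        star_cost = sum(distance(center, n) for n in group)
        if best is None or star_cost < best[0]:
            best = (star_cost, group)
    return sorted(best[1])


def brute_force(candidates, k):
    """Exact optimum by enumeration; only feasible for tiny instances."""
    return min(combinations(sorted(candidates), k), key=total_distance)


if __name__ == "__main__":
    nodes = list(RACK)
    chosen = best_star(nodes, 3)
    optimum = brute_force(nodes, 3)
    print(chosen, total_distance(chosen), total_distance(optimum))
```

Enumerating all subsets is exponential in the number of executors, which is why a bounded-factor heuristic is attractive on larger clusters; the brute-force function is included only to check the heuristic on a toy instance.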
Author_xml – sequence: 1
  givenname: Zhongming
  orcidid: 0000-0003-3041-6990
  surname: Fu
  fullname: Fu, Zhongming
  email: fuzhongming@hnu.edu.cn
– sequence: 2
  givenname: Mengsi
  orcidid: 0000-0002-4985-2832
  surname: He
  fullname: He, Mengsi
– sequence: 3
  givenname: Zhuo
  orcidid: 0000-0001-9081-8153
  surname: Tang
  fullname: Tang, Zhuo
– sequence: 4
  givenname: Yang
  orcidid: 0000-0002-3111-1534
  surname: Zhang
  fullname: Zhang, Yang
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2022
Copyright_xml – notice: Springer Nature Switzerland AG 2022
DEWEY 004.35
Discipline Computer Science
EISBN 9783030967727
3030967727
EISSN 1611-3349
Editor Sang, Yingpeng
Fox, Geoffrey
Malek, Manu
Shen, Hong
Arabnia, Hamid R
Xiao, Nong
Gupta, Ajay
Zhang, Yong
Editor_xml – sequence: 1
  fullname: Sang, Yingpeng
– sequence: 2
  fullname: Arabnia, Hamid R
– sequence: 3
  fullname: Fox, Geoffrey
– sequence: 4
  fullname: Malek, Manu
– sequence: 5
  fullname: Shen, Hong
– sequence: 6
  fullname: Xiao, Nong
– sequence: 7
  fullname: Gupta, Ajay
– sequence: 8
  fullname: Zhang, Yong
EndPage 357
IsPeerReviewed true
IsScholarly true
LCCallNum QA75.5-76.95
Language English
OCLC 1305014844
PageCount 9
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Theoretical Computer Science and General Issues
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 22nd International Conference, PDCAT 2021, Guangzhou, China, December 17-19, 2021, Proceedings
PublicationTitle Parallel and Distributed Computing, Applications and Technologies
PublicationYear 2022
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Bertino, Elisa
Woeginger, Gerhard
Goos, Gerhard
Steffen, Bernhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Gerhard
  orcidid: 0000-0001-8816-2693
  surname: Woeginger
  fullname: Woeginger, Gerhard
– sequence: 7
  givenname: Moti
  orcidid: 0000-0003-0848-0873
  surname: Yung
  fullname: Yung, Moti
StartPage 349
SubjectTerms Communication distance
Data locality
Executor allocation
Spark
Title Optimizing Data Locality by Executor Allocation in Reduce Stage for Spark Framework
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6926803&ppg=363
http://link.springer.com/10.1007/978-3-030-96772-7_32
Volume 13148