Optimizing Data Locality by Executor Allocation in Reduce Stage for Spark Framework
| Published in | Parallel and Distributed Computing, Applications and Technologies (Lecture Notes in Computer Science, Vol. 13148), pp. 349-357 |
|---|---|
| Main Authors | Fu, Zhongming; He, Mengsi; Tang, Zhuo; Zhang, Yang |
| Format | Book Chapter |
| Language | English |
| Published | Switzerland: Springer International Publishing AG, 2022 |
| Series | Lecture Notes in Computer Science |
| Subjects | Communication distance; Data locality; Executor allocation; Spark |
| ISBN | 9783030967710 (ISBN-13); 3030967719 (ISBN-10) |
| ISSN | 0302-9743 (print); 1611-3349 (electronic) |
| DOI | 10.1007/978-3-030-96772-7_32 |
| Abstract | Data locality is a key factor influencing the performance of Spark systems. Because executors are the execution containers of tasks, the nodes on which executors are launched directly affect the locality level that tasks can achieve. This paper improves data locality through executor allocation in the reduce stage of the Spark framework. First, we calculate the network distance matrix of executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, we propose an approximation algorithm and prove that its approximation factor is 2. Finally, we evaluate the algorithm on a real Spark cluster using several representative benchmarks: Sort, PageRank and LDA. Experimental results show that the proposed algorithm noticeably improves data locality and application/job performance. |
|---|---|
| Author | Fu, Zhongming (ORCID 0000-0003-3041-6990; fuzhongming@hnu.edu.cn); He, Mengsi (ORCID 0000-0002-4985-2832); Tang, Zhuo (ORCID 0000-0001-9081-8153); Zhang, Yang (ORCID 0000-0002-3111-1534) |
| Copyright | Springer Nature Switzerland AG 2022 |
| DEWEY | 004.35 |
| Discipline | Computer Science |
| EISBN | 9783030967727 (ISBN-13); 3030967727 (ISBN-10) |
| EISSN | 1611-3349 |
| Editor | Sang, Yingpeng; Arabnia, Hamid R; Fox, Geoffrey; Malek, Manu; Shen, Hong; Xiao, Nong; Gupta, Ajay; Zhang, Yong |
| EndPage | 357 |
| IsPeerReviewed | true |
| IsScholarly | true |
| LCCallNum | QA75.5-76.95 |
| OCLC | 1305014844 |
| ORCID | 0000-0001-9081-8153 0000-0003-3041-6990 0000-0002-3111-1534 0000-0002-4985-2832 |
| PageCount | 9 |
| PublicationDate | 2022 |
| PublicationDateYYYYMMDD | 2022-01-01 |
| PublicationPlace | Cham, Switzerland |
| PublicationSeriesSubtitle | Theoretical Computer Science and General Issues |
| PublicationSeriesTitle | Lecture Notes in Computer Science |
| PublicationSeriesTitleAlternate | Lect. Notes Comput. Sci. |
| PublicationSubtitle | 22nd International Conference, PDCAT 2021, Guangzhou, China, December 17-19, 2021, Proceedings |
| PublicationTitle | Parallel and Distributed Computing, Applications and Technologies |
| PublicationYear | 2022 |
| Publisher | Springer International Publishing AG; Springer International Publishing |
| RelatedPersons | Goos, Gerhard; Hartmanis, Juris; Bertino, Elisa; Gao, Wen; Steffen, Bernhard (ORCID 0000-0001-9619-1558); Woeginger, Gerhard (ORCID 0000-0001-8816-2693); Yung, Moti (ORCID 0000-0003-0848-0873) |
| StartPage | 349 |
| SubjectTerms | Communication distance; Data locality; Executor allocation; Spark |
| URI | http://link.springer.com/10.1007/978-3-030-96772-7_32 |
| Volume | 13148 |
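The abstract above only summarizes the method, so the following Python sketch illustrates one plausible reading of it under explicit assumptions; it is not the authors' algorithm. It assumes, hypothetically, that network distance comes from rack topology (0 for the same node, 1 for the same rack, 2 across racks) and that "total communication distance" means the sum of pairwise distances among the k nodes chosen to host reduce-stage executors. For that objective, a classical heuristic that tries every node as a center and adds its k-1 nearest nodes is a 2-approximation whenever distances satisfy the triangle inequality: if the optimal set has total pairwise distance W*, some node in it has distances to the other members summing to at most 2W*/k, and the star built around that node costs at most (k-1)·2W*/k < 2W*. The names `network_distance_matrix`, `star_allocation` and `exact_allocation` are illustrative only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def network_distance_matrix(racks: Sequence[str]) -> Matrix:
    """Hypothetical topology distance: 0 same node, 1 same rack, 2 different racks.

    racks[i] is the rack id of node i. The weights are assumptions, not from the paper.
    """
    n = len(racks)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist[i][j] = 1 if racks[i] == racks[j] else 2
    return dist


def total_distance(nodes: Sequence[int], dist: Matrix) -> int:
    """Assumed objective: sum of distances over all unordered pairs of chosen nodes."""
    return sum(dist[u][v] for u, v in combinations(nodes, 2))


def star_allocation(dist: Matrix, k: int) -> Tuple[List[int], int]:
    """Try each node as a center plus its k-1 nearest nodes; keep the cheapest set.

    With a symmetric distance satisfying the triangle inequality, the result costs
    less than twice the optimum of total_distance over all k-node subsets.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    best_set: List[int] = []
    best_cost = -1
    for c in range(n):
        # Nearest neighbours of c; ties broken by node index for determinism.
        others = sorted((v for v in range(n) if v != c), key=lambda v: (dist[c][v], v))
        chosen = [c] + others[: k - 1]
        cost = total_distance(chosen, dist)
        if best_cost < 0 or cost < best_cost:
            best_set, best_cost = sorted(chosen), cost
    return best_set, best_cost


def exact_allocation(dist: Matrix, k: int) -> Tuple[List[int], int]:
    """Brute-force optimum over all k-subsets; only practical for small clusters."""
    best = min(combinations(range(len(dist)), k), key=lambda s: total_distance(s, dist))
    return list(best), total_distance(best, dist)


# Usage: six nodes spread over three racks, three executors to place.
racks = ["r1", "r1", "r1", "r2", "r2", "r3"]
dist = network_distance_matrix(racks)
print(star_allocation(dist, 3))   # ([0, 1, 2], 3)
print(exact_allocation(dist, 3))  # ([0, 1, 2], 3)
```

`exact_allocation` enumerates every k-subset and exists only to check the heuristic on small inputs, while `star_allocation` runs in O(n² log n + n·k²) time for n candidate nodes. A real deployment would presumably also weight distances by shuffle data volume, which this sketch ignores.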