Handling data skew in join algorithms using MapReduce
| Published in | Expert systems with applications Vol. 51; pp. 286 - 299 |
|---|---|
| Main Authors | Myung, Jaeseok; Shim, Junho; Yeon, Jongheum; Lee, Sang-goo |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.06.2016 |
| ISSN | 0957-4174 1873-6793 |
| DOI | 10.1016/j.eswa.2015.12.024 |
| Abstract | • We introduce a skew handling algorithm, called multi-dimensional range partitioning. • The proposed algorithm is more efficient than traditional MapReduce-based join algorithms. • The proposed algorithm is scalable regardless of the size of input data.
One of the major obstacles hindering effective join processing on MapReduce is data skew. Since MapReduce’s basic hash-based partitioning method cannot solve the problem properly, two alternatives have been proposed: range-based and randomized methods. However, both still have drawbacks: the range-based method does not handle join product skew, and the randomized method performs worse than the basic hash-based partitioning when input relations are not skewed. In this paper, we present a new skew handling method, called multi-dimensional range partitioning (MDRP). The proposed method overcomes the limitations of traditional algorithms in two ways: 1) the number of output records expected at each machine is considered, which leads to better handling of join product skew, and 2) a small number of input records are sampled before the actual join begins so that an efficient execution plan considering the degree of data skew can be created. As a result, in a scalar skew experiment, the proposed join algorithm is about 6.76 times faster than the range-based algorithm when join product skew exists and about 5.14 times faster than the randomized algorithm when input relations are not skewed. Moreover, through a worst-case analysis, we show that the input and the output imbalances are less than or equal to 2. The proposed algorithm does not require any modification to the original MapReduce environment and is applicable to complex join operations such as theta-joins and multi-way joins. |
|---|---|
| Author | Myung, Jaeseok; Shim, Junho; Lee, Sang-goo; Yeon, Jongheum |
| Author_xml | 1. Myung, Jaeseok (jsmyung@europa.snu.ac.kr), Corporate Design Center, Samsung Electronics Co., Ltd., South Korea; 2. Shim, Junho (jshim@sookmyung.ac.kr; ORCID 0000-0003-4315-4117), Division of Computer Science, Sookmyung Women’s University, South Korea; 3. Yeon, Jongheum (jonghm@europa.snu.ac.kr), School of Computer Science and Engineering, Seoul National University, South Korea; 4. Lee, Sang-goo (sglee@europa.snu.ac.kr), School of Computer Science and Engineering, Seoul National University, South Korea |
| ContentType | Journal Article |
| Copyright | 2016 Elsevier Ltd |
| Discipline | Computer Science |
| EISSN | 1873-6793 |
| EndPage | 299 |
| ISSN | 0957-4174 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Multi-dimensional range partitioning; Join algorithm; Skew handling; MapReduce |
| Language | English |
| ORCID | 0000-0003-4315-4117 |
| PageCount | 14 |
| PublicationDate | 2016-06-01 |
| PublicationTitle | Expert systems with applications |
| PublicationYear | 2016 |
| Publisher | Elsevier Ltd |
| StartPage | 286 |
| SubjectTerms | Algorithms; Expert systems; Handles; Handling; Join algorithm; MapReduce; Multi-dimensional range partitioning; Obstacles; Partitioning; Scalars; Skew handling |
| Title | Handling data skew in join algorithms using MapReduce |
| URI | https://dx.doi.org/10.1016/j.eswa.2015.12.024 https://www.proquest.com/docview/1825460868 |
| Volume | 51 |
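
The two kinds of skew named in the abstract, input skew and join product skew, can be illustrated with a small sketch. The Python below is not the authors' MDRP algorithm; it only models the baseline hash-based partitioning of an equi-join. The function names, the integer keys routed by `key % num_reducers`, and the imbalance measure (maximum load divided by average load) are assumptions chosen for illustration and may differ from the definitions used in the paper.

```python
from collections import Counter


def hash_partition_loads(r_keys, s_keys, num_reducers):
    """Per-reducer input and output record counts for an equi-join of R and S.

    Assumption: join keys are non-negative integers and each key is routed
    to reducer `key % num_reducers`, a stand-in for a hash partitioner.
    """
    r_counts = Counter(r_keys)
    s_counts = Counter(s_keys)
    inputs = [0] * num_reducers
    outputs = [0] * num_reducers
    for key in set(r_counts) | set(s_counts):
        reducer = key % num_reducers
        # Input load: every record carrying this key is shipped to one reducer.
        inputs[reducer] += r_counts[key] + s_counts[key]
        # Output load: an equi-join emits |R_key| * |S_key| records for the key.
        outputs[reducer] += r_counts[key] * s_counts[key]
    return inputs, outputs


def imbalance(loads):
    """Illustrative imbalance measure: maximum load over average load."""
    average = sum(loads) / len(loads)
    return max(loads) / average if average else 0.0


# Toy relations in which key 0 is "hot" on both sides of the join.
r_keys = [0] * 6 + [1, 2, 3]
s_keys = [0] * 6 + [1, 2, 3]

inputs, outputs = hash_partition_loads(r_keys, s_keys, num_reducers=3)
print("input records per reducer: ", inputs)
print("output records per reducer:", outputs)
print("input imbalance: ", round(imbalance(inputs), 2))
print("output imbalance:", round(imbalance(outputs), 2))
```

With this toy input, reducer 0 receives 14 of the 18 input records and produces 37 of the 39 output records, so the output imbalance is larger than the input imbalance. This is the situation the abstract calls join product skew: a partitioner that only balances input counts can still leave one machine with most of the join output.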