Handling data skew in join algorithms using MapReduce

Bibliographic Details
Published in Expert Systems with Applications, Vol. 51, pp. 286-299
Main Authors Myung, Jaeseok; Shim, Junho; Yeon, Jongheum; Lee, Sang-goo
Format Journal Article
Language English
Published Elsevier Ltd, 2016-06-01
ISSN 0957-4174
EISSN 1873-6793
DOI 10.1016/j.eswa.2015.12.024

Highlights
•We introduce a skew handling algorithm called multi-dimensional range partitioning.
•The proposed algorithm is more efficient than traditional MapReduce-based join algorithms.
•The proposed algorithm is scalable regardless of the size of the input data.

Abstract One of the major obstacles hindering effective join processing on MapReduce is data skew. Since MapReduce’s basic hash-based partitioning method cannot solve the problem properly, two alternatives have been proposed: range-based and randomized methods. However, both still have drawbacks: the range-based method does not handle join product skew, and the randomized method performs worse than basic hash-based partitioning when the input relations are not skewed. In this paper, we present a new skew handling method called multi-dimensional range partitioning (MDRP). The proposed method overcomes the limitations of traditional algorithms in two ways: 1) it considers the number of output records expected at each machine, which leads to better handling of join product skew, and 2) it samples a small number of input records before the actual join begins, so that an efficient execution plan reflecting the degree of data skew can be created. As a result, in a scalar skew experiment, the proposed join algorithm is about 6.76 times faster than the range-based algorithm when join product skew exists and about 5.14 times faster than the randomized algorithm when the input relations are not skewed. Moreover, a worst-case analysis shows that the input and output imbalances are less than or equal to 2. The proposed algorithm does not require any modification to the original MapReduce environment and is applicable to complex join operations such as theta-joins and multi-way joins.
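The distinction the abstract draws between input skew and join product skew can be illustrated with a small Python sketch. This is not the MDRP algorithm from the paper; it only shows why a frequent join key overloads one reducer under plain hash partitioning. The function names, the toy key lists, and the max-to-mean imbalance ratio are assumptions made for illustration, and the paper's own definition of imbalance may differ.

```python
from collections import Counter


def reducer_loads(r_keys, s_keys, num_reducers):
    """Return (input_load, output_load) per reducer for a hash-partitioned equi-join."""
    r_freq, s_freq = Counter(r_keys), Counter(s_keys)
    input_load = [0] * num_reducers
    output_load = [0] * num_reducers
    for key in set(r_freq) | set(s_freq):
        # Integer keys with modulo stand in for a real hash partitioner.
        reducer = key % num_reducers
        # Input records received: every R and S record carrying this key.
        input_load[reducer] += r_freq[key] + s_freq[key]
        # Output records produced: each R record pairs with each S record.
        output_load[reducer] += r_freq[key] * s_freq[key]
    return input_load, output_load


def imbalance(loads):
    """Ratio of the heaviest load to the mean load (assumed metric); 1.0 is balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


# Hypothetical toy relations: key 0 is frequent in both R and S.
r_keys = [0] * 6 + [1, 2, 3]
s_keys = [0] * 6 + [1, 2, 3]
inputs, outputs = reducer_loads(r_keys, s_keys, num_reducers=4)
print("input loads: ", inputs, "imbalance:", round(imbalance(inputs), 2))
print("output loads:", outputs, "imbalance:", round(imbalance(outputs), 2))
```

In this toy case the output load is more unbalanced than the input load, because the number of join results for a key grows with the product of its frequencies in the two relations. A partitioner that balances only input records would therefore miss this effect.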
Authors
1. Myung, Jaeseok (jsmyung@europa.snu.ac.kr), Corporate Design Center, Samsung Electronics Co., Ltd., South Korea
2. Shim, Junho (jshim@sookmyung.ac.kr), Division of Computer Science, Sookmyung Women’s University, South Korea; ORCID 0000-0003-4315-4117
3. Yeon, Jongheum (jonghm@europa.snu.ac.kr), School of Computer Science and Engineering, Seoul National University, South Korea
4. Lee, Sang-goo (sglee@europa.snu.ac.kr), School of Computer Science and Engineering, Seoul National University, South Korea
Copyright 2016 Elsevier Ltd
Discipline Computer Science
Peer Reviewed Yes
Scholarly Yes
Keywords Multi-dimensional range partitioning; Join algorithm; Skew handling; MapReduce
Subject Terms Algorithms; Expert systems; Handles; Handling; Join algorithm; MapReduce; Multi-dimensional range partitioning; Obstacles; Partitioning; Scalars; Skew handling
Skew handling
Title Handling data skew in join algorithms using MapReduce
URI https://dx.doi.org/10.1016/j.eswa.2015.12.024
https://www.proquest.com/docview/1825460868
Volume 51
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: Baden-Württemberg Complete Freedom Collection (Elsevier)
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: GBLVA
  dateStart: 20110101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVESC
  databaseName: Elsevier ScienceDirect [Accès UNIL ; CHUV ; HEP Vaud ; Sites BCUL]
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: ACRLP
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVESC
  databaseName: Elsevier SD Freedom Collection Journals [SCFCJ] - access via UTK
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: AIKHN
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVESC
  databaseName: ScienceDirect (Elsevier)
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: .~1
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVLSH
  databaseName: Elsevier Journals
  customDbUrl:
  mediaType: online
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: AKRWK
  dateStart: 19900101
  isFulltext: true
  providerName: Library Specific Holdings
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV1LS8NAEF5EL158i_XFCt4kNuluus2xiKUq9eADelt2N7O1tabFVnrztzuTR0FBD14CCbshTLLzzWS_-Yax80YaGumdCiAGTFBM6IOWj10ATgkPyiRWUe1w777ZfZa3_bi_wq6qWhiiVZa-v_Dpubcur9RLa9anw2H9EYMDhENCRMKxZp8q2KWiLgaXn0uaB8nPqUJvTwU0uiycKTheMFuQ9lAU578EG_I3cPrhpnPs6WyxjTJo5O3iubbZCmQ7bLNqyMDL9bnL4i5pJiAYcSJ-8tkrLPgw46MJHsx4MHkfzl_eZpy47gPeM9MH0m2FPfbcuX666gZlX4TACRXOg0TIxGDqgcmrT1pgwpR2NyOXps4JjDCEV8o2E1yaInXSWg8Nr6R1mLoIQMgS-2w1m2RwwDhYakgllUnBSpxpQQgTWh9SMyFvkxqLKoNoV4qGU--Ksa7YYSNNRtRkRB01NBqxxi6Wc6aFZMafo-PKzvrbi9fo0_-cd1a9FI0rgrY5TAaTj5mOco1_TNVah_-89xFbx7NmwQg7Zqvz9w84wdhjbk_zj-uUrbVv7rr3X_eB2DE
linkProvider Elsevier
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV25TsQwELU4Cmi4ETdGokNhk9hZb0qEQMuxFBzSdpbtjGE5sqs9RMe3M5MDCSQoaFIkdhSN43kz9vMbxg7jLDTSOxVAApigmNAHLZ-4AJwSHpRJraKzw52bZvtBXnaT7hQ7rc_CEK2y8v2lTy-8dXWnUVmzMej1GncYHCAcEiISjjW702xWJrGiDOz444vnQfpzqhTcUwE1r07OlCQvGL2T-FCUFGuCsfwNnX746QJ8zpfYQhU18pPyw5bZFOQrbLGuyMCrCbrKkjaJJiAacWJ-8tELvPNezp_7eDGvj_1hb_z0NuJEdn_kHTO4JeFWWGMP52f3p-2gKowQOKHCcZAKmRrMPTB79WkLTJjR9mbkssw5gSGG8ErZZopzU2ROWush9kpah7mLAMQssc5m8n4OG4yDpYpUUpkMrMSeFoQwofUhVRPyNt1kUW0Q7SrVcCpe8apretizJiNqMqKOYo1G3GRHX30GpWbGn62T2s7628hrdOp_9juoB0XjlKB9DpNDfzLSUSHyj7laa-uf795nc-37zrW-vri52mbz-KRZ0sN22Mx4OIFdDETGdq_40T4BHfPZxg
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Handling+data+skew+in+join+algorithms+using+MapReduce&rft.jtitle=Expert+systems+with+applications&rft.au=Myung%2C+Jaeseok&rft.au=Shim%2C+Junho&rft.au=Yeon%2C+Jongheum&rft.au=Lee%2C+Sang-goo&rft.date=2016-06-01&rft.issn=0957-4174&rft.volume=51&rft.spage=286&rft.epage=299&rft_id=info:doi/10.1016%2Fj.eswa.2015.12.024&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_eswa_2015_12_024
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0957-4174&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0957-4174&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0957-4174&client=summon