Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce

Bibliographic Details
Published in Information Fusion, Vol. 42, pp. 51-61
Main Authors Ramírez-Gallego, Sergio; Fernández, Alberto; García, Salvador; Chen, Min; Herrera, Francisco
Format Journal Article
Language English
Published Elsevier B.V 01.07.2018
ISSN 1566-2535 (print), 1872-6305 (electronic)
DOI 10.1016/j.inffus.2017.10.001

Abstract
• An overview of the technologies for Big Data analytics is presented.
• A taxonomy for the design of information and process fusion in Big Data is introduced.
• The scalability of the alternative approaches of such fusion of information/models is analyzed.
• Several guidelines are given for future study on the topic.

We live in a world where data are generated from a myriad of sources, and it is really cheap to collect and store such data. However, the real benefit lies not in the data itself, but in the algorithms that are capable of processing such data within a tolerable elapsed time and of extracting valuable knowledge from it. Therefore, the use of Big Data Analytics tools provides very significant advantages to both industry and academia. The MapReduce programming framework can be highlighted as the main paradigm related to such tools. It is mainly characterized by distributed execution, which provides a high degree of scalability together with a fault-tolerant scheme. In every MapReduce algorithm, local models are first learned from a subset of the original data within the so-called Map tasks. Then, the Reduce task fuses the partial outputs generated by each Map. The way such fusion of information/models is designed may have a strong impact on the quality of the final system. In this work, we enumerate and analyze two alternative methodologies that may be found both in the specialized literature and in standard Machine Learning libraries for Big Data. Our main objective is to provide an introduction to the characteristics of these methodologies, as well as some guidelines for the design of novel algorithms in this field of research. Finally, a short experimental study allows us to contrast the scalability issues for each type of process fusion in MapReduce for Big Data Analytics.
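The Map/Reduce fusion scheme described in the abstract can be illustrated with a minimal single-machine sketch. It is not taken from the article: the nearest-centroid model, the toy partitions, and the names `map_task`, `reduce_task`, `centroids`, and `predict` are all assumptions chosen for illustration. Each Map call learns a partial model (per-class feature sums and counts) from one partition, and the Reduce step fuses two partial models by adding those statistics.

```python
from functools import reduce


def map_task(partition):
    """Map: learn a local model from one partition (per-class sums and counts)."""
    model = {}
    for features, label in partition:
        sums, count = model.get(label, ([0.0] * len(features), 0))
        model[label] = ([s + x for s, x in zip(sums, features)], count + 1)
    return model


def reduce_task(model_a, model_b):
    """Reduce: fuse two partial models by adding their per-class statistics."""
    fused = dict(model_a)
    for label, (sums_b, count_b) in model_b.items():
        if label in fused:
            sums_a, count_a = fused[label]
            fused[label] = ([a + b for a, b in zip(sums_a, sums_b)],
                            count_a + count_b)
        else:
            fused[label] = (sums_b, count_b)
    return fused


def centroids(model):
    """Turn the fused statistics into one centroid per class."""
    return {label: [s / count for s in sums]
            for label, (sums, count) in model.items()}


def predict(cents, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(cents, key=lambda label: sqdist(cents[label]))


# Hypothetical toy data, already split into two partitions (one per Map task).
partitions = [
    [((0.0, 0.0), "a"), ((4.0, 4.0), "b")],
    [((2.0, 0.0), "a"), ((6.0, 4.0), "b")],
]

local_models = [map_task(p) for p in partitions]   # Map phase
fused_model = reduce(reduce_task, local_models)    # Reduce phase (fusion)
cents = centroids(fused_model)
```

Because the partial models here are additive statistics, the fused result matches what a single pass over all the data would give. Other fusion designs, such as voting over independently trained local classifiers, would only approximate the model learned from the full dataset.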
Author affiliations
1. Sergio Ramírez-Gallego (sramirez@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
2. Alberto Fernández (alberto@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
3. Salvador García (salvagl@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
4. Min Chen (minchen2012@hust.edu.cn), School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
5. Francisco Herrera (herrera@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
ContentType Journal Article
Copyright 2017 Elsevier B.V.
Discipline Mathematics
IsPeerReviewed true
IsScholarly true
Keywords Big Data Analytics; Spark; Information fusion; Machine learning; MapReduce
PageCount 11
  article-title: A review of natural language processing techniques for opinion mining systems
  publication-title: Inf. Fus.
– volume: 2
  start-page: 1
  year: 2017
  ident: bib0061
  article-title: A comparison on scalability for batch big data processing on apache spark and apache flink
  publication-title: Big Data Anal.
– start-page: 1401
  year: 1999
  end-page: 1406
  ident: bib0075
  article-title: A brief introduction to boosting
  publication-title: IJCAI
– volume: 2
  start-page: 81
  year: 2013
  end-page: 85
  ident: bib0041
  article-title: Use of DAG in distributed parallel computing
  publication-title: Int. J. Appl. Innov. Eng. Manage.
– year: 2012
  ident: bib0030
  article-title: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
  publication-title: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12
– volume: 5
  start-page: 8869
  year: 2017
  end-page: 8879
  ident: bib0033
  article-title: Disease prediction by machine learning over big data from healthcare communities.
  publication-title: IEEE Access
– volume: 35
  start-page: 137
  year: 2015
  end-page: 144
  ident: bib0015
  article-title: Beyond the hype: big data concepts, methods, and analytics
  publication-title: Int. J. Inf. Manage.
– reference: M. Sung, SIMD parallel processing michael sung 6.911: architectures anonymous, 2000.
– reference: S. Packages, 3rd party spark packages, 2017, (
– volume: 26
  start-page: 97
  year: 2014
  end-page: 107
  ident: bib0011
  article-title: Data mining with big data
  publication-title: Knowl. Data Eng. IEEE Trans.
– year: 2016
  ident: bib0025
  article-title: Apache Mahout: Beyond MapReduce
– volume: 5
  start-page: 1268
  year: 2012
  end-page: 1279
  ident: bib0056
  article-title: Spinning fast iterative data flows
  publication-title: PVLDB
– reference: M. Lichman, UCI machine learning repository, 2013.
– reference: Apache Flink Project, Peeking into Apache flink’s engine room, 2017, (
– volume: 3
  start-page: 105
  year: 2017
  end-page: 120
  ident: bib0086
  article-title: An insight into imbalanced big data classification: outcomes and challenges
  publication-title: Complex Intell. Syst.
StartPage 51
SubjectTerms Big Data Analytics
Information fusion
Machine learning
MapReduce
Spark
Title Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce
URI https://dx.doi.org/10.1016/j.inffus.2017.10.001
Volume 42