Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce


Bibliographic Details
Published in	Information fusion Vol. 42; pp. 51 - 61
Main Authors	Ramírez-Gallego, Sergio, Fernández, Alberto, García, Salvador, Chen, Min, Herrera, Francisco
Format	Journal Article
Language	English
Published	Elsevier B.V., 1 July 2018
Subjects	Big Data Analytics; Information fusion; Machine learning; MapReduce; Spark
ISSN	1566-2535 1872-6305
DOI	10.1016/j.inffus.2017.10.001


Abstract	Highlights:
• An overview of the technologies for Big Data analytics is presented.
• A taxonomy for the design of information and process fusion in Big Data is introduced.
• The scalability of the alternative approaches to such fusion of information/models is analyzed.
• Several guidelines are given for future study on the topic.
We live in a world where data are generated from a myriad of sources, and it is very cheap to collect and store such data. However, the real benefit lies not in the data itself but in the algorithms that are capable of processing such data in a tolerable elapsed time and of extracting valuable knowledge from it. Therefore, the use of Big Data Analytics tools provides very significant advantages to both industry and academia. The MapReduce programming framework can be highlighted as the main paradigm related to such tools. It is mainly characterized by a distributed execution that provides a high degree of scalability, together with a fault-tolerant scheme. In every MapReduce algorithm, local models are first learned from a subset of the original data within the so-called Map tasks. Then, the Reduce task is devoted to fusing the partial outputs generated by each Map. The way such fusion of information/models is designed may have a strong impact on the quality of the final system. In this work, we enumerate and analyze two alternative methodologies that may be found both in the specialized literature and in standard Machine Learning libraries for Big Data. Our main objective is to provide an introduction to the characteristics of these methodologies, as well as to give some guidelines for the design of novel algorithms in this field of research. Finally, a short experimental study allows us to contrast the scalability issues for each type of process fusion in MapReduce for Big Data Analytics.
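The Map/Reduce flow described in the abstract (local models learned on data subsets, then fused) can be illustrated with a minimal single-machine sketch. Everything here is an assumption made for illustration and is not taken from the article: the nearest-centroid local learner, the majority-vote fusion, and the hypothetical helper names `split_into_partitions`, `map_task`, `reduce_task`, `train` and `predict`. It uses only the Python standard library rather than Hadoop or Spark.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(rows, n_parts):
    """Round-robin split that stands in for distributed data blocks."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_task(partition):
    """Map: learn a local nearest-centroid model from one data subset."""
    sums, counts = {}, Counter()
    for features, label in partition:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }
    # Wrap the model in a list so that Reduce can concatenate partial outputs.
    return [centroids]


def reduce_task(models_a, models_b):
    """Reduce: fuse partial outputs by collecting them into one ensemble."""
    return models_a + models_b


def predict_one(centroids, features):
    """Predict with a single local model: label of the closest centroid."""
    def sq_dist(center):
        return sum((x - y) ** 2 for x, y in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def predict(ensemble, features):
    """Fused prediction: majority vote over all local models (assumed rule)."""
    votes = Counter(predict_one(model, features) for model in ensemble)
    return votes.most_common(1)[0][0]


def train(rows, n_parts=3):
    """Run the Map phase on every partition, then fold results with Reduce."""
    partitions = split_into_partitions(rows, n_parts)
    return reduce(reduce_task, map(map_task, partitions))
```

Calling `train(rows, n_parts=3)` on a list of `(features, label)` pairs returns one local model per partition, and `predict(ensemble, features)` applies the voting rule. In this sketch each local model only ever sees its own partition, so the quality of the fused system depends on how the partial outputs are combined, which is the design question the abstract raises.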
Author details	1. Sergio Ramírez-Gallego (sramirez@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
2. Alberto Fernández (alberto@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
3. Salvador García (salvagl@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
4. Min Chen (minchen2012@hust.edu.cn), School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
5. Francisco Herrera (herrera@decsai.ugr.es), Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
ContentType	Journal Article
Copyright	2017 Elsevier B.V.
Discipline	Mathematics
IsPeerReviewed	true
IsScholarly	true
PageCount	11
Tutorials doi: 10.1109/COMST.2015.2444095 – year: 2013 ident: 10.1016/j.inffus.2017.10.001_bib0002 – year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0027 – volume: 42 start-page: 86 issue: 1 year: 2012 ident: 10.1016/j.inffus.2017.10.001_bib0077 article-title: A taxonomy and experimental study on prototype generation for nearest neighbor classification. publication-title: IEEE Trans. Syst. Man Cybern. Part C doi: 10.1109/TSMCC.2010.2103939 – volume: 5 start-page: 8869 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0033 article-title: Disease prediction by machine learning over big data from healthcare communities. publication-title: IEEE Access doi: 10.1109/ACCESS.2017.2694446 – year: 2010 ident: 10.1016/j.inffus.2017.10.001_bib0021 article-title: The hadoop distributed file system – volume: 32 start-page: 134 issue: 2 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0070 article-title: Fast-mrmr: fast minimum redundancy maximum relevance algorithm for high-dimensional big data. publication-title: Int. J. Intell. Syst. doi: 10.1002/int.21833 – volume: 4 start-page: 380 issue: 5 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0003 article-title: Big data with cloud computing: an insight on the computing environment, mapreduce and programming framework publication-title: WIREs Data Min. Knowl. Discovery doi: 10.1002/widm.1134 – volume: 40 start-page: 11 issue: 4 year: 2011 ident: 10.1016/j.inffus.2017.10.001_bib0020 article-title: Parallel data processing with mapreduce: a survey publication-title: SIGMOD Record doi: 10.1145/2094114.2094118 – ident: 10.1016/j.inffus.2017.10.001_bib0085 – volume: 37 start-page: 297 issue: 3 year: 1999 ident: 10.1016/j.inffus.2017.10.001_bib0076 article-title: Improved boosting algorithms using confidence-rated predictions publication-title: Mach. Learn. 
doi: 10.1023/A:1007614523901 – volume: 2 start-page: 652 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0016 article-title: Toward scalable systems for big data analytics: a technology tutorial publication-title: IEEE Access doi: 10.1109/ACCESS.2014.2332453 – ident: 10.1016/j.inffus.2017.10.001_bib0047 – ident: 10.1016/j.inffus.2017.10.001_bib0081 – ident: 10.1016/j.inffus.2017.10.001_bib0066 – volume: 6 start-page: 21 issue: 3 year: 2006 ident: 10.1016/j.inffus.2017.10.001_bib0071 article-title: Ensemble based systems in decision making publication-title: IEEE Circuits Syst. Mag. doi: 10.1109/MCAS.2006.1688199 – volume: abs/1409.1458 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0059 article-title: Communication-efficient distributed dual coordinate ascent publication-title: CoRR – volume: 14 issue: Suppl 16 year: 2013 ident: 10.1016/j.inffus.2017.10.001_bib0073 article-title: Random forests on hadoop for genome-wide association studies of multivariate neuroimaging phenotypes publication-title: BMC Bioinfor. – volume: 285 start-page: 112 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0038 article-title: On the use of mapreduce for imbalanced big data using random forest publication-title: Inf. Sci. (Ny) doi: 10.1016/j.ins.2014.03.043 – volume: 5 start-page: 1256 year: 2012 ident: 10.1016/j.inffus.2017.10.001_bib0058 article-title: Opening the black boxes in data flow optimization publication-title: PVLDB – volume: 51 start-page: 107 issue: 1 year: 2008 ident: 10.1016/j.inffus.2017.10.001_bib0018 article-title: MapReduce: simplified data processing on large clusters publication-title: Commun. ACM doi: 10.1145/1327452.1327492 – volume: 8 start-page: 422 issue: 3 year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0062 article-title: A mapreduce approach to address big data classification problems based on the fusion of linguistic fuzzy rules. publication-title: Int. J. Comput. Intell. Syst. 
doi: 10.1080/18756891.2015.1017377 – year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0024 – ident: 10.1016/j.inffus.2017.10.001_bib0052 – year: 2011 ident: 10.1016/j.inffus.2017.10.001_bib0026 – volume: 36 start-page: 10 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0035 article-title: A review of natural language processing techniques for opinion mining systems publication-title: Inf. Fus. doi: 10.1016/j.inffus.2016.10.004 – volume: 8 start-page: 422 issue: 3 year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0039 article-title: A mapreduce approach to address big data classification problems based on the fusion of linguistic fuzzy rules publication-title: Int. J. Comput. Intell. Syst. doi: 10.1080/18756891.2015.1017377 – year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0025 – year: 2011 ident: 10.1016/j.inffus.2017.10.001_bib0080 article-title: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition – year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0017 article-title: Big Data - Related Technologies, Challenges and Future Prospects – volume: 23 start-page: 939 issue: 6 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0057 article-title: The stratosphere platform for big data analytics publication-title: Int. J. Very Large Databases doi: 10.1007/s00778-014-0357-y – volume: 36 start-page: 700 issue: 5 year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0009 article-title: A review and future direction of agile, business intelligence, analytics and data science publication-title: Int. J. Inf. Manage. – volume: 258 start-page: 5 year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0063 article-title: Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data publication-title: Fuzzy Sets Syst. 
doi: 10.1016/j.fss.2014.01.015 – volume: 2015 start-page: 748681:1 year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0032 article-title: An effective big data supervised imbalanced classification approach for ortholog detection in related yeast species publication-title: Biomed. Res. Int. doi: 10.1155/2015/748681 – ident: 10.1016/j.inffus.2017.10.001_bib0046 – start-page: 1 year: 2010 ident: 10.1016/j.inffus.2017.10.001_bib0029 article-title: Spark: cluster computing with working sets – volume: 47 start-page: 81 issue: 1 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0010 article-title: Recent development in big data analytics for business operations and risk management publication-title: IEEE Trans. Cybern. doi: 10.1109/TCYB.2015.2507599 – ident: 10.1016/j.inffus.2017.10.001_bib0067 – volume: 27 start-page: 95 year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0034 article-title: Opinion mining and information fusion: a survey publication-title: Inf. Fus. doi: 10.1016/j.inffus.2015.06.002 – start-page: 1426 year: 2009 ident: 10.1016/j.inffus.2017.10.001_bib0053 article-title: PLANET: massively parallel learning of tree ensembles with mapreduce publication-title: PVLDB – volume: 275 start-page: 314 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0004 article-title: Data-intensive applications, challenges, techniques and technologies: a survey on big data publication-title: Inf. Sci. doi: 10.1016/j.ins.2014.01.015 – volume: 37 start-page: 132 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0022 article-title: Ensemble learning for data stream analysis: a survey publication-title: Inf. Fus. 
doi: 10.1016/j.inffus.2017.02.004 – start-page: 1539 year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0037 article-title: Parallel information fusion method for microarray data analysis – volume: 5 start-page: 4308 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0084 article-title: Searching for exotic particles in high-Energy physics with deep learning publication-title: Nat. Commun. doi: 10.1038/ncomms5308 – volume: 28 start-page: 45 year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0008 article-title: Social big data: recent achievements and new challenges publication-title: Inf. Fus. doi: 10.1016/j.inffus.2015.08.005 – volume: 33 start-page: 103 issue: 8 year: 1990 ident: 10.1016/j.inffus.2017.10.001_bib0042 article-title: A bridging model for parallel computation publication-title: Commun. ACM doi: 10.1145/79173.79181 – start-page: 137 year: 2004 ident: 10.1016/j.inffus.2017.10.001_bib0043 article-title: Mapreduce: Simplified data processing on large clusters – ident: 10.1016/j.inffus.2017.10.001_bib0054 – volume: 53 start-page: 72 issue: 1 year: 2010 ident: 10.1016/j.inffus.2017.10.001_bib0019 article-title: MapReduce: a flexible data processing tool publication-title: Commun. ACM doi: 10.1145/1629175.1629198 – year: 2012 ident: 10.1016/j.inffus.2017.10.001_sbref0030 article-title: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing – volume: 16 start-page: 77 issue: 1 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0031 article-title: Data mining for internet of things: a survey publication-title: IEEE Commun. Surv. Tut. doi: 10.1109/SURV.2013.103013.00206 – volume: in press year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0083 article-title: An information theoretic feature selection framework for big data under apache spark publication-title: IEEE Trans. Syst. Man Cybern. 
– year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0005 – start-page: 425 year: 2013 ident: 10.1016/j.inffus.2017.10.001_bib0072 article-title: Distributed stochastic aware random forests - efficient data mining for big data – volume: 35 start-page: 137 issue: 2 year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0015 article-title: Beyond the hype: big data concepts, methods, and analytics publication-title: Int. J. Inf. Manage. – volume: 5 start-page: 1268 issue: 11 year: 2012 ident: 10.1016/j.inffus.2017.10.001_bib0056 article-title: Spinning fast iterative data flows publication-title: PVLDB – volume: 9 start-page: 69 issue: 1 year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0078 article-title: A view on fuzzy systems for big data: progress and opportunities publication-title: Int. J. Comput. Intell. Systems doi: 10.1080/18756891.2016.1180820 – ident: 10.1016/j.inffus.2017.10.001_bib0045 – volume: 17 start-page: 1 issue: 34 year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0028 article-title: Mllib: machine learning in apache spark publication-title: J. Mach. Learn. Res. – volume: 17 start-page: 1 issue: 2 year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0013 article-title: Big data research in information systems: toward an inclusive research agenda publication-title: J. Assoc. Inf. Syst. – ident: 10.1016/j.inffus.2017.10.001_bib0049 – ident: 10.1016/j.inffus.2017.10.001_bib0044 – start-page: 439 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0036 article-title: Parallelization of ontology construction and fusion based on mapreduce – ident: 10.1016/j.inffus.2017.10.001_bib0040 – volume: 36 start-page: 1165 issue: 4 year: 2012 ident: 10.1016/j.inffus.2017.10.001_bib0001 article-title: Business intelligence and analytics: from big data to big impact publication-title: MIS Q. doi: 10.2307/41703503 – volume: 26 start-page: 97 issue: 1 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0011 article-title: Data mining with big data publication-title: Knowl. 
Data Eng. IEEE Trans. doi: 10.1109/TKDE.2013.109 – start-page: 1401 year: 1999 ident: 10.1016/j.inffus.2017.10.001_bib0075 article-title: A brief introduction to boosting – volume: 3 start-page: 105 issue: 2 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0086 article-title: An insight into imbalanced big data classification: outcomes and challenges publication-title: Complex Intell. Syst. doi: 10.1007/s40747-017-0037-9 – year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0051 – year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0006 – volume: 150 start-page: 331 year: 2015 ident: 10.1016/j.inffus.2017.10.001_bib0065 article-title: Mrpr: a mapreduce solution for prototype reduction in big data classification. publication-title: Neurocomputing doi: 10.1016/j.neucom.2014.04.078 – ident: 10.1016/j.inffus.2017.10.001_bib0050 – year: 2011 ident: 10.1016/j.inffus.2017.10.001_bib0023 – volume: 1 issue: 1 year: 2016 ident: 10.1016/j.inffus.2017.10.001_sbref0014 article-title: Welcome to the new interdisciplinary journal combining big data and cognitive computing publication-title: Big Data Cognit. Comput. doi: 10.3390/bdcc1010001 – volume: 2 start-page: 81 issue: 11 year: 2013 ident: 10.1016/j.inffus.2017.10.001_bib0041 article-title: Use of DAG in distributed parallel computing publication-title: Int. J. Appl. Innov. Eng. Manage. – ident: 10.1016/j.inffus.2017.10.001_bib0055 – start-page: 108 year: 2013 ident: 10.1016/j.inffus.2017.10.001_bib0060 article-title: API design for machine learning software: experiences from the scikit-learn project – volume: 5931 start-page: 674 year: 2009 ident: 10.1016/j.inffus.2017.10.001_bib0068 article-title: Parallel k-means clustering based on mapreduce – volume: 45 start-page: 503 issue: 3 year: 1989 ident: 10.1016/j.inffus.2017.10.001_bib0082 article-title: On the limited memory BFGS method for large scale optimization publication-title: Math. Program. 
doi: 10.1007/BF01589116 – volume: 24 start-page: 1904 issue: 10 year: 2012 ident: 10.1016/j.inffus.2017.10.001_bib0064 article-title: Scalable and parallel boosting with mapreduce publication-title: IEEE Trans. Knowl. Data Eng. doi: 10.1109/TKDE.2011.208 – volume: 34 start-page: 1 issue: 1 year: 2014 ident: 10.1016/j.inffus.2017.10.001_bib0012 article-title: The current state of business intelligence in academia: the arrival of big data publication-title: Commun. Assoc. Inf. Syst. – ident: 10.1016/j.inffus.2017.10.001_bib0048 – volume: 117 start-page: 3 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0069 article-title: Knn-is: an iterative spark-based design of the k-nearest neighbors classifier for big data. publication-title: Knowl. Based Syst. doi: 10.1016/j.knosys.2016.06.012 – volume: 27 start-page: 111 year: 2016 ident: 10.1016/j.inffus.2017.10.001_bib0074 article-title: Decision forest: twenty years of research publication-title: Inf. Fus. doi: 10.1016/j.inffus.2015.06.005 – volume: 2 start-page: 1 issue: 1 year: 2017 ident: 10.1016/j.inffus.2017.10.001_bib0061 article-title: A comparison on scalability for batch big data processing on apache spark and apache flink publication-title: Big Data Anal. doi: 10.1186/s41044-016-0020-2
StartPage	51
SubjectTerms	Big Data Analytics Information fusion Machine learning MapReduce Spark
Title	Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce
URI	https://dx.doi.org/10.1016/j.inffus.2017.10.001
Volume	42