A new approach for instance selection: Algorithms, evaluation, and comparisons

• We design two new algorithms using global density, relevant, irrelevant functions.
• We develop a toolkit and its GUI, management and validation capabilities.
• We evaluate and test the performance of our algorithms in terms of four metrics.
• The experimental results prove our algorithms outperform dens...

Bibliographic Details
Published in	Expert systems with applications Vol. 149; p. 113297
Main Authors	Malhat, Mohamed; Menshawy, Mohamed El; Mousa, Hamdy; Sisi, Ashraf El
Format	Journal Article
Language	English
Published	New York Elsevier Ltd 01.07.2020 Elsevier BV
Subjects	Accuracy; Algorithms; Big data; Classification; Complexity; Computing time; Data mining; Datasets; Global density function; Instance selection; Polynomials; Reduction; Standard data; Time complexity; Toolkits
ISSN	0957-4174 1873-6793
DOI	10.1016/j.eswa.2020.113297

Abstract	Highlights:
• We design two new algorithms using global density, relevant, and irrelevant functions.
• We develop a toolkit with a GUI and management and validation capabilities.
• We evaluate and test the performance of our algorithms in terms of four metrics.
• The experimental results show that our algorithms outperform density-based approaches.
• We test the scalability and compute the polynomial-time complexity of the algorithms.

Several approaches to instance selection have been put forward as a primary step to increase the efficiency and accuracy of algorithms applied to mine big data. Instance selection scales big data down by removing irrelevant, redundant, and unreliable data, which in turn reduces the computational resources needed to complete the mining task. Local density-based approaches have recently been acknowledged as feasible in terms of reduction rate, effectiveness, and computation time. However, these approaches yield lower classification accuracy than other approaches. In this manuscript, we propose a new layered and operational approach that addresses these limitations and advances the state of the art by balancing classification accuracy, reduction rate, and time complexity. We begin by designing a new algorithm (called GDIS) that selects the most relevant instances using a global density function and a relevance function. This enables us to take a global view over the whole data set and obtain better classification accuracy than current density-based approaches. We design another novel algorithm (called EGDIS), which maintains the effectiveness of the GDIS algorithm while improving the reduction rate. Moreover, we compare our algorithms against three state-of-the-art algorithms to validate their performance. We develop a Java toolkit called ISTK on top of the GDIS and EGDIS algorithms, the density-based approaches, and the state-of-the-art algorithms. We also develop a user interface with management and validation capabilities for ease of use and for visualizing results and data sets. We evaluate and test the performance of our algorithms in terms of four metrics (reduction rate, classification accuracy, effectiveness, and computation time) using twenty-four standard data sets and an intensive set of experiments. The experimental results show that the GDIS algorithm outperforms the density-based approaches in terms of classification accuracy and effectiveness, that the EGDIS algorithm outperforms the density-based approaches in terms of reduction rate and effectiveness, and that the GDIS and EGDIS algorithms outperform the state-of-the-art algorithms by achieving good results in both effectiveness and computation time. Finally, we test the scalability of our algorithms and experimentally compute their polynomial-time complexity.
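The abstract names reduction rate, classification accuracy, and effectiveness as evaluation metrics but does not define them. The following Python sketch shows one plausible way such metrics could be computed. It is not taken from the ISTK toolkit: the function names and the `SelectionReport` type are hypothetical, and effectiveness is assumed here to be the product of accuracy and reduction rate, a convention used in some density-based instance selection work that this record does not confirm.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SelectionReport:
    """Hypothetical container for the metrics of one instance selection run."""
    reduction_rate: float
    accuracy: float
    effectiveness: float


def reduction_rate(n_original: int, n_selected: int) -> float:
    """Fraction of the original instances removed by the selection step."""
    if n_original <= 0:
        raise ValueError("n_original must be positive")
    if not 0 <= n_selected <= n_original:
        raise ValueError("n_selected must lie between 0 and n_original")
    return (n_original - n_selected) / n_original


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of test instances whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def summarize(n_original: int, n_selected: int,
              y_true: Sequence, y_pred: Sequence) -> SelectionReport:
    """Combine the metrics; effectiveness = accuracy * reduction (an assumption)."""
    red = reduction_rate(n_original, n_selected)
    acc = accuracy(y_true, y_pred)
    return SelectionReport(reduction_rate=red, accuracy=acc, effectiveness=acc * red)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    print(summarize(200, 50, [1, 0, 1, 1], [1, 0, 0, 1]))
```

Under this assumed definition, a method that keeps every instance scores zero effectiveness regardless of its accuracy, which is one way to express the balance between accuracy and reduction rate that the abstract describes.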
ArticleNumber	113297
Author	Malhat, Mohamed Sisi, Ashraf El Menshawy, Mohamed El Mousa, Hamdy
Author_xml	1. Malhat, Mohamed (m.gmalhat@yahoo.com); 2. Menshawy, Mohamed El (mohamed.elmenshawy@ci.menofia.edu.eg); 3. Mousa, Hamdy (hamdimmm@hotmail.com); 4. Sisi, Ashraf El (ashrafelsisi@hotmail.com)
CitedBy_id	crossref_primary_10_1016_j_ijar_2023_108964 crossref_primary_10_3390_app14031224 crossref_primary_10_1109_TBDATA_2023_3342643 crossref_primary_10_3233_JIFS_235290 crossref_primary_10_1016_j_engappai_2023_107839 crossref_primary_10_1007_s10462_024_10971_4 crossref_primary_10_1016_j_cola_2024_101301 crossref_primary_10_1016_j_asoc_2021_107938 crossref_primary_10_1109_TFUZZ_2022_3216990 crossref_primary_10_1016_j_ins_2021_07_015 crossref_primary_10_1016_j_neunet_2023_07_018 crossref_primary_10_1145_3705000 crossref_primary_10_1186_s40537_022_00640_0 crossref_primary_10_1007_s11227_023_05771_6 crossref_primary_10_1016_j_ins_2022_04_036 crossref_primary_10_1016_j_eswa_2023_119536 crossref_primary_10_1142_S021812662450124X crossref_primary_10_1007_s43674_022_00033_z crossref_primary_10_1016_j_ijar_2021_08_006 crossref_primary_10_1016_j_engappai_2024_108080 crossref_primary_10_1108_EL_07_2020_0209 crossref_primary_10_1145_3582000 crossref_primary_10_1038_s41598_022_23036_9
Cites_doi	10.1023/A:1016304305535 10.1109/TIT.1967.1053964 10.1016/j.patcog.2008.02.006 10.1007/s10618-008-0121-2 10.1016/j.datak.2015.11.002 10.1023/A:1014043630878 10.1016/j.is.2014.07.006 10.1023/A:1014047731786 10.1007/s11036-013-0489-0 10.1016/j.knosys.2016.10.031 10.1016/j.is.2006.09.002 10.1109/TIT.1968.1054155 10.1080/713827180 10.1016/j.artint.2010.01.001 10.1007/s10462-010-9165-y 10.1007/s10009-014-0315-4 10.1023/A:1021564703268 10.1016/j.patcog.2014.10.001 10.1007/s41019-016-0022-0 10.1007/s10009-015-0399-5 10.1023/A:1007626913721 10.1016/j.eswa.2012.01.131 10.1109/TSMC.1972.4309137 10.1007/s10009-015-0398-6 10.1007/s13748-017-0117-5
ContentType	Journal Article
Copyright	2020 Elsevier Ltd Copyright Elsevier BV Jul 1, 2020
Copyright_xml	– notice: 2020 Elsevier Ltd – notice: Copyright Elsevier BV Jul 1, 2020
DBID	AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D
DOI	10.1016/j.eswa.2020.113297
DatabaseName	CrossRef Computer and Information Systems Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional
DatabaseTitle	CrossRef Computer and Information Systems Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic Advanced Technologies Database with Aerospace ProQuest Computer Science Collection Computer and Information Systems Abstracts Professional
DatabaseTitleList	Computer and Information Systems Abstracts
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISSN	1873-6793
ExternalDocumentID	10_1016_j_eswa_2020_113297 S0957417420301226
ISSN	0957-4174
IngestDate	Fri Jul 25 06:45:05 EDT 2025 Thu Apr 24 22:52:27 EDT 2025 Sat Oct 25 05:06:46 EDT 2025 Fri Feb 23 02:49:58 EST 2024
IsPeerReviewed	true
IsScholarly	true
Keywords	Instance selection Big data Global density function Data mining Time complexity
Language	English
PublicationCentury	2000
PublicationDate	2020-07-01 2020-07-00 20200701
PublicationDateYYYYMMDD	2020-07-01
PublicationDate_xml	– month: 07 year: 2020 text: 2020-07-01 day: 01
PublicationDecade	2020
PublicationPlace	New York
PublicationPlace_xml	– name: New York
PublicationTitle	Expert systems with applications
PublicationYear	2020
Publisher	Elsevier Ltd Elsevier BV
Publisher_xml	– name: Elsevier Ltd – name: Elsevier BV
StartPage	113297
SubjectTerms	Accuracy Algorithms Big Data Classification Complexity Computing time Data mining Datasets Global density function Instance selection Polynomials Reduction Standard data Time complexity Toolkits
Title	A new approach for instance selection: Algorithms, evaluation, and comparisons
URI	https://dx.doi.org/10.1016/j.eswa.2020.113297 https://www.proquest.com/docview/2441311352
Volume	149