A new approach for instance selection: Algorithms, evaluation, and comparisons

Bibliographic Details
Published in Expert Systems with Applications Vol. 149; p. 113297
Main Authors Malhat, Mohamed, Menshawy, Mohamed El, Mousa, Hamdy, Sisi, Ashraf El
Format Journal Article
Language English
Published New York Elsevier Ltd 01.07.2020
Elsevier BV
ISSN 0957-4174
EISSN 1873-6793
DOI 10.1016/j.eswa.2020.113297

Abstract
• We design two new algorithms using global density, relevant, and irrelevant functions.
• We develop a toolkit with a GUI and management and validation capabilities.
• We evaluate the performance of our algorithms in terms of four metrics.
• The experimental results show that our algorithms outperform density-based approaches.
• We test the scalability and compute the polynomial-time complexity of the algorithms.

Several approaches to instance selection have been put forward as a preliminary step for increasing the efficiency and accuracy of algorithms applied to mining big data. Instance selection scales big data down by removing irrelevant, redundant, and unreliable data, which in turn reduces the computational resources needed to complete the mining task. Local density-based approaches have recently been acknowledged as feasible in terms of reduction rate, effectiveness, and computation time. However, these approaches suffer from lower classification accuracy than other approaches. In this manuscript, we propose a new layered and operational approach that addresses these limitations and advances the state of the art by balancing classification accuracy, reduction rate, and time complexity. We begin by designing a new algorithm, called GDIS, that selects the most relevant instances using global density and relevance functions. This lets us take a global view over the whole data set and obtain better classification accuracy than current density-based approaches. We then design a second novel algorithm, called EGDIS, which maintains the effectiveness of GDIS while improving its reduction rate. Moreover, we compare our algorithms against three state-of-the-art algorithms to validate their performance. We develop a Java toolkit, called ISTK, on top of the GDIS and EGDIS algorithms, the density-based approaches, and the state-of-the-art algorithms.
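The abstract does not define the global density or relevance functions, so what follows is only a minimal Python sketch of global density-based selection under stated assumptions, not the authors' GDIS. It assumes (1) similarity = 1 / (1 + Euclidean distance); (2) the global density of an instance is its mean similarity to all other instances of its class, rather than to a local k-neighbourhood; and (3) a hypothetical relevance rule that keeps an instance when no one of its k nearest same-class neighbours is denser. The names `similarity`, `global_density`, and `select_instances` are illustrative only.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def similarity(a: Point, b: Point) -> float:
    # ASSUMPTION: similarity decays with Euclidean distance; not taken from the paper.
    return 1.0 / (1.0 + math.dist(a, b))


def global_density(i: int, X: List[Point], y: List[int]) -> float:
    # Hypothetical "global" density: mean similarity of X[i] to every other
    # instance of its class (the whole class, not a k-neighbourhood).
    same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
    if not same:
        return 0.0
    return sum(similarity(X[i], X[j]) for j in same) / len(same)


def select_instances(X: List[Point], y: List[int], k: int = 3) -> List[int]:
    # Hypothetical relevance rule: keep X[i] if none of its k nearest
    # same-class neighbours has a higher global density.
    # An instance that is alone in its class has no neighbours and is kept.
    dens = [global_density(i, X, y) for i in range(len(X))]
    kept = []
    for i in range(len(X)):
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        neighbours = sorted(same, key=lambda j: math.dist(X[i], X[j]))[:k]
        if all(dens[i] >= dens[j] for j in neighbours):
            kept.append(i)
    return kept
```

On two tight clusters of three collinear points each, with k = 2, only the central point of each cluster survives, because it is the densest member of its class. This naive version takes O(n² log n) time (n density sums of O(n) terms each, plus one sort per instance), so it illustrates the idea rather than an efficient implementation.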
We also develop a user interface with management and validation capabilities to ease use and to visualize results and data sets. We evaluate the performance of our algorithms in terms of four metrics (reduction rate, classification accuracy, effectiveness, and computation time) through an extensive set of experiments on twenty-four standard data sets. The experimental results show that GDIS outperforms the density-based approaches in classification accuracy and effectiveness, that EGDIS outperforms them in reduction rate and effectiveness, and that both GDIS and EGDIS outperform the state-of-the-art algorithms by achieving good results in both effectiveness and computation time. Finally, we test the scalability of our algorithms and experimentally determine their polynomial time complexity.
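The four metrics are named but not defined in the abstract. The following is a minimal Python sketch under these assumptions: reduction rate = 1 - |S|/|T| (S the selected subset, T the training set); classification accuracy measured with a 1-nearest-neighbour classifier trained on S; effectiveness taken as the product of accuracy and reduction rate (an assumption; the paper may define it differently); and computation time as the wall-clock time of the selection step. `evaluate` and the other names are hypothetical, and any selector that returns the indices of kept instances can be plugged in.

```python
import math
import time
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Selector = Callable[[List[Point], List[int]], List[int]]


def reduction_rate(n_train: int, n_selected: int) -> float:
    # Fraction of training instances removed: 1 - |S| / |T|.
    return 1.0 - n_selected / n_train


def one_nn_accuracy(train_X: List[Point], train_y: List[int],
                    test_X: List[Point], test_y: List[int]) -> float:
    # Accuracy of a 1-nearest-neighbour classifier built on the selected subset.
    correct = 0
    for x, label in zip(test_X, test_y):
        nearest = min(range(len(train_X)), key=lambda j: math.dist(x, train_X[j]))
        if train_y[nearest] == label:
            correct += 1
    return correct / len(test_X)


def evaluate(selector: Selector, X: List[Point], y: List[int],
             test_X: List[Point], test_y: List[int]) -> Tuple[float, float, float, float]:
    # Returns (reduction rate, accuracy, effectiveness, selection time in seconds).
    start = time.perf_counter()
    kept = selector(X, y)
    elapsed = time.perf_counter() - start
    if not kept:
        raise ValueError("selector removed every instance")
    rr = reduction_rate(len(X), len(kept))
    acc = one_nn_accuracy([X[i] for i in kept], [y[i] for i in kept], test_X, test_y)
    # ASSUMPTION: effectiveness = accuracy * reduction rate.
    return rr, acc, acc * rr, elapsed
```

Recording `elapsed` on data sets of increasing size n and fitting the slope of log(time) against log(n) is one simple way to estimate an empirical polynomial degree, in the spirit of the scalability test described above; it is not necessarily the procedure used in the paper.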

Author emails
Malhat, Mohamed: m.gmalhat@yahoo.com
Menshawy, Mohamed El: mohamed.elmenshawy@ci.menofia.edu.eg
Mousa, Hamdy: hamdimmm@hotmail.com
Sisi, Ashraf El: ashrafelsisi@hotmail.com
Copyright 2020 Elsevier Ltd
Copyright Elsevier BV Jul 1, 2020
Discipline Computer Science
Keywords Instance selection; Big data; Global density function; Data mining; Time complexity
SubjectTerms Accuracy
Algorithms
Big Data
Classification
Complexity
Computing time
Data mining
Datasets
Global density function
Instance selection
Polynomials
Reduction
Standard data
Time complexity
Toolkits
Title A new approach for instance selection: Algorithms, evaluation, and comparisons
URI https://dx.doi.org/10.1016/j.eswa.2020.113297
https://www.proquest.com/docview/2441311352
Volume 149