A data clustering algorithm for stratified data partitioning in artificial neural network

Bibliographic Details
Published in Expert Systems with Applications, Vol. 39, no. 8, pp. 7004-7014
Main Authors Sahoo, Ajit K., Zuo, Ming J., Tiwari, M.K.
Format Journal Article
Language English
Published Elsevier Ltd 15.06.2012
ISSN 0957-4174
EISSN 1873-6793
DOI 10.1016/j.eswa.2012.01.047

Abstract The statistical properties of training, validation and test data play an important role in assuring optimal performance in artificial neural networks (ANNs). Researchers have proposed optimized data partitioning (ODP) and stratified data partitioning (SDP) methods to partition input data into training, validation and test datasets. ODP methods based on genetic algorithms (GA) are computationally expensive, as the random search space can be in the power of twenty or more for an average-sized dataset. For SDP methods, clustering algorithms such as the self-organizing map (SOM) and fuzzy clustering (FC) are used to form strata, on the assumption that the data points in any individual stratum are in close statistical agreement. The reported clustering algorithms are designed to form natural clusters. In large multivariate datasets, some of these natural clusters can be large enough that the furthest data vectors are statistically far from the cluster mean. These algorithms are also computationally expensive. We propose a custom design clustering algorithm (CDCA) to overcome these shortcomings. Comparisons are made using three benchmark case studies, one each from the classification, function approximation and prediction domains. The proposed CDCA data partitioning method is evaluated against SOM-, FC- and GA-based data partitioning methods. The CDCA data partitioning method not only performs well but also reduces the average CPU time.
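The general idea of stratified data partitioning described in the abstract (cluster the data into strata, then draw training, validation and test samples from every stratum) can be sketched as follows. This is a minimal illustration only, not the authors' CDCA: plain k-means is used as a stand-in clustering step, and the function names `kmeans` and `stratified_split`, the 60/20/20 split fractions and the synthetic two-blob dataset are all assumptions made for the example.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, used here only as a stand-in for the clustering step."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/validation/test, stratum by stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, label in enumerate(labels):
        strata[label].append(idx)
    train, val, test = [], [], []
    for label in sorted(strata):
        members = strata[label]
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_val = round(fractions[1] * len(members))
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder of the stratum
    return train, val, test


# Hypothetical data: two synthetic 2-D blobs, 50 points each.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]

labels = kmeans(data, k=2)
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Because every stratum contributes to each subset in roughly the same proportion, the three subsets tend to share similar statistics. The abstract's concern is that when a natural cluster is very large, points within one stratum may no longer be statistically close, which is the case a purpose-built clustering step is meant to address.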
Author Zuo, Ming J.
Sahoo, Ajit K.
Tiwari, M.K.
Author_xml – sequence: 1
  givenname: Ajit K.
  surname: Sahoo
  fullname: Sahoo, Ajit K.
  email: sahoo@ualberta.ca
  organization: Department of Mechanical Engineering, University of Alberta, Edmonton, Canada
– sequence: 2
  givenname: Ming J.
  surname: Zuo
  fullname: Zuo, Ming J.
  email: ming.zuo@ualberta.ca
  organization: Department of Mechanical Engineering, University of Alberta, Edmonton, Canada
– sequence: 3
  givenname: M.K.
  surname: Tiwari
  fullname: Tiwari, M.K.
  email: mkt09@hotmail.com
  organization: Department of Industrial Engineering and Management, Indian Institute of Technology, Kharagpur, India
CitedBy_id crossref_primary_10_1016_j_cageo_2018_02_003
crossref_primary_10_4028_www_scientific_net_AEF_6_7_924
crossref_primary_10_1007_s10845_017_1337_z
crossref_primary_10_1016_j_cie_2023_109502
crossref_primary_10_1016_j_eswa_2016_02_009
crossref_primary_10_17097_ataunizfd_365231
crossref_primary_10_1007_s00500_014_1288_7
crossref_primary_10_4028_www_scientific_net_AMR_798_799_680
crossref_primary_10_1016_j_estger_2014_02_005
crossref_primary_10_1016_j_energy_2019_116589
crossref_primary_10_1016_j_jhydrol_2020_125605
crossref_primary_10_1016_j_envpol_2022_120720
crossref_primary_10_1002_2012WR012713
crossref_primary_10_12989_cac_2015_15_1_089
crossref_primary_10_1177_09544070211064472
crossref_primary_10_1016_j_ssci_2019_04_026
crossref_primary_10_1007_s00521_016_2534_y
crossref_primary_10_1007_s13201_017_0541_5
crossref_primary_10_1016_j_isatra_2020_02_018
crossref_primary_10_1007_s00500_024_09765_1
crossref_primary_10_1177_0954406213511032
crossref_primary_10_3390_math10142538
Cites_doi 10.1016/j.eswa.2007.08.009
10.1016/S1007-0214(05)70060-2
10.1016/j.patcog.2009.09.013
10.1109/CCECE.2008.4564844
10.1023/B:NARR.0000046920.95725.1b
10.1109/41.847906
10.1029/2001WR000266
10.1016/j.neunet.2009.11.009
10.1016/j.eswa.2009.08.013
10.1016/S0043-1354(00)00067-1
10.1109/TCOM.1980.1094577
10.1016/0377-0427(87)90125-7
10.1016/j.infsof.2009.08.005
10.1016/0378-7206(93)90064-Z
10.1109/ICPR.1992.201716
10.1016/j.enconman.2007.08.007
10.1016/S1364-8152(99)00007-9
10.1080/02626669609491511
10.1016/j.patcog.2009.09.003
10.1109/COGINF.2004.1327476
10.1016/0169-7439(93)E0065-C
10.1541/ieejeiss.129.302
10.1061/(ASCE)0887-3801(2004)18:2(105)
10.1109/5326.704579
ContentType Journal Article
Copyright 2012 Elsevier Ltd
Copyright_xml – notice: 2012 Elsevier Ltd
DBID AAYXX
CITATION
7SC
8FD
JQ2
L7M
L~C
L~D
DOI 10.1016/j.eswa.2012.01.047
DatabaseName CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Computer and Information Systems Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Advanced Technologies Database with Aerospace
ProQuest Computer Science Collection
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Computer and Information Systems Abstracts
Computer and Information Systems Abstracts
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1873-6793
EndPage 7014
ExternalDocumentID 10_1016_j_eswa_2012_01_047
S0957417412000607
ID FETCH-LOGICAL-c366t-77613545a795f1fa40761cac9febedd41f30ec9381b6be368c436e3a9459a3e73
IEDL.DBID .~1
ISSN 0957-4174
IngestDate Wed Oct 01 08:25:43 EDT 2025
Sat Sep 27 21:06:38 EDT 2025
Thu Apr 24 23:02:20 EDT 2025
Wed Oct 01 03:51:35 EDT 2025
Fri Feb 23 02:26:31 EST 2024
IsPeerReviewed true
IsScholarly true
Issue 8
Keywords Data partitioning
Fuzzy clustering
Self organizing map
Genetic algorithm
Data clustering
Artificial neural network
Custom design clustering algorithm
Language English
License https://www.elsevier.com/tdm/userlicense/1.0
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c366t-77613545a795f1fa40761cac9febedd41f30ec9381b6be368c436e3a9459a3e73
Notes ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 23
ObjectType-Article-1
ObjectType-Feature-2
PQID 1022894545
PQPubID 23500
PageCount 11
ParticipantIDs proquest_miscellaneous_1701105486
proquest_miscellaneous_1022894545
crossref_citationtrail_10_1016_j_eswa_2012_01_047
crossref_primary_10_1016_j_eswa_2012_01_047
elsevier_sciencedirect_doi_10_1016_j_eswa_2012_01_047
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2012-06-15
PublicationDateYYYYMMDD 2012-06-15
PublicationDate_xml – month: 06
  year: 2012
  text: 2012-06-15
  day: 15
PublicationDecade 2010
PublicationTitle Expert systems with applications
PublicationYear 2012
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
SSID ssj0017007
Score 2.1759527
SourceID proquest
crossref
elsevier
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 7004
SubjectTerms Algorithms
Artificial neural network
Clustering
Clusters
Custom design clustering algorithm
Data clustering
Data partitioning
Fuzzy clustering
Genetic algorithm
Genetic algorithms
Mathematical analysis
ODP
Partitioning
Self organizing map
Training
Title A data clustering algorithm for stratified data partitioning in artificial neural network
URI https://dx.doi.org/10.1016/j.eswa.2012.01.047
https://www.proquest.com/docview/1022894545
https://www.proquest.com/docview/1701105486
Volume 39
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: Baden-Württemberg Complete Freedom Collection (Elsevier)
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: GBLVA
  dateStart: 20110101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVESC
  databaseName: Elsevier SD Complete Freedom Collection [SCCMFC]
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: ACRLP
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVESC
  databaseName: Elsevier SD Freedom Collection Journals [SCFCJ]
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: AIKHN
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVESC
  databaseName: ScienceDirect (Elsevier)
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: .~1
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
– providerCode: PRVLSH
  databaseName: Elsevier Journals
  customDbUrl:
  mediaType: online
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: AKRWK
  dateStart: 19900101
  isFulltext: true
  providerName: Library Specific Holdings