A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy

Bibliographic Details
Published in Information sciences Vol. 577; pp. 697-721
Main Authors Li, Xiangjun; Wu, Zijie; Zhao, Zhibin; Ding, Feng; He, Daojing
Format Journal Article
Language English
Published Elsevier Inc 01.10.2021
ISSN 0020-0255
EISSN 1872-6291
DOI 10.1016/j.ins.2021.07.039

Abstract
• A novel mixed data clustering optimization approach is applied for cluster analysis.
• Clustering performance is optimized by a noise-filtered distribution centroid and similarity measurement.
• An iterative weight adjustment strategy is applied to quantify the influence of various attributes on clustering.
• The MCFCAW is an effective method to cluster mixed data.

Clustering is an important technology for data analysis, and cluster analysis for mixed data remains challenging. This paper proposes a mixed data clustering algorithm with a noise-filtered distribution centroid and an iterative weight adjustment strategy. The proposed algorithm defines a noise-filtered distribution centroid for categorical attributes and combines it with the mean to represent the center of a cluster with mixed attributes. The noise-filtered distribution centroid records the frequency of occurrence of each possible value of the categorical attributes in a cluster more accurately. Furthermore, because the "noise values" are filtered, the measure used to calculate the dissimilarity between data objects and cluster centers can be improved. In addition, the algorithm introduces an iterative weight adjustment strategy that combines intra-cluster and inter-cluster information. A unified weight measurement method is used for both numeric and categorical attributes. Attributes with higher intra-cluster homogeneity and inter-cluster heterogeneity are treated as higher-priority attributes and tend to be assigned relatively heavier weights during clustering. Experimental results on different datasets from the UCI repository show that the MCFCIW algorithm outperforms existing partition-based clustering algorithms and data-conversion-based clustering algorithms for mixed data on both cluster validity indices and convergence speed.
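The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general idea of a frequency-based distribution centroid for one categorical attribute, with rare values filtered out before measuring dissimilarity. The threshold `min_freq`, the renormalisation step, the "1 minus frequency" dissimilarity, and all function names are assumptions made for illustration, not the authors' definitions; attribute weights are taken as plain inputs rather than computed by the iterative strategy the abstract describes.

```python
from collections import Counter


def distribution_centroid(values, min_freq=0.1):
    """Relative frequency of each category in one cluster.

    Categories whose relative frequency is below `min_freq` (a hypothetical
    threshold, not taken from the paper) are treated as noise and dropped.
    """
    counts = Counter(values)
    n = len(values)
    freqs = {v: c / n for v, c in counts.items()}
    kept = {v: f for v, f in freqs.items() if f >= min_freq}
    if not kept:
        # Every value was filtered (or the cluster is empty): keep them all.
        kept = freqs
    total = sum(kept.values())
    # Renormalise so the remaining frequencies sum to 1.
    return {v: f / total for v, f in kept.items()}


def categorical_dissimilarity(value, centroid):
    """Assumed measure: 1 minus the centroid frequency of `value`."""
    return 1.0 - centroid.get(value, 0.0)


def mixed_dissimilarity(x_num, x_cat, mean, centroids, w_num, w_cat):
    """Weighted squared numeric distance plus weighted categorical dissimilarity."""
    d_num = sum(w * (a - m) ** 2 for w, a, m in zip(w_num, x_num, mean))
    d_cat = sum(
        w * categorical_dissimilarity(v, c)
        for w, v, c in zip(w_cat, x_cat, centroids)
    )
    return d_num + d_cat


# One categorical attribute observed in a cluster of ten objects.
cluster_values = ["a"] * 6 + ["b"] * 3 + ["c"]
centroid = distribution_centroid(cluster_values, min_freq=0.2)
distance = mixed_dissimilarity([1.0], ["a"], [0.0], [centroid], [1.0], [1.0])
```

In this toy cluster the value "c" falls below the assumed threshold and is dropped, so an object carrying "c" is maximally dissimilar to the center on that attribute, while the frequencies of "a" and "b" are rescaled to sum to 1.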
Author Li, Xiangjun
He, Daojing
Zhao, Zhibin
Ding, Feng
Wu, Zijie
Author_xml – sequence: 1
  givenname: Xiangjun
  surname: Li
  fullname: Li, Xiangjun
  email: lxjun_alex@163.com
  organization: School of Software, Nanchang University, Nanchang 330046, China
– sequence: 2
  givenname: Zijie
  surname: Wu
  fullname: Wu, Zijie
  email: jiekyw@163.com
  organization: School of Software, Nanchang University, Nanchang 330046, China
– sequence: 3
  givenname: Zhibin
  surname: Zhao
  fullname: Zhao, Zhibin
  email: zhaozhibin@ncu.edu.cn
  organization: School of Software, Nanchang University, Nanchang 330046, China
– sequence: 4
  givenname: Feng
  surname: Ding
  fullname: Ding, Feng
  organization: School of Software, Nanchang University, Nanchang 330046, China
– sequence: 5
  givenname: Daojing
  surname: He
  fullname: He, Daojing
  organization: School of Software, Nanchang University, Nanchang 330046, China
ContentType Journal Article
Copyright 2021 Elsevier Inc.
Discipline Engineering
Library & Information Science
EISSN 1872-6291
EndPage 721
ISSN 0020-0255
IsPeerReviewed true
IsScholarly true
Keywords Noise-filtered distribution centroid
Mixed data clustering
Intra-cluster homogeneity
Iterative weight adjustment strategy
Inter-cluster heterogeneity
Language English
PageCount 25
PublicationCentury 2000
PublicationDate October 2021
2021-10-00
PublicationDateYYYYMMDD 2021-10-01
PublicationDate_xml – month: 10
  year: 2021
  text: October 2021
PublicationDecade 2020
PublicationTitle Information sciences
PublicationYear 2021
Publisher Elsevier Inc
Publisher_xml – name: Elsevier Inc
  year: 1982
  ident: 10.1016/j.ins.2021.07.039_b0170
  article-title: Least squares quantization in pcm
  publication-title: IEEE Transactions on Information Theory
  doi: 10.1109/TIT.1982.1056489
– volume: 27
  start-page: 379
  issue: 3
  year: 1948
  ident: 10.1016/j.ins.2021.07.039_b0205
  article-title: A mathematical theory of communication
  publication-title: The Bell System Technical Journal
  doi: 10.1002/j.1538-7305.1948.tb01338.x
– start-page: 123
  year: 1992
  ident: 10.1016/j.ins.2021.07.039_b0145
  article-title: Chimerge: Discretization of numeric attributes
– volume: 2
  start-page: 283
  issue: 3
  year: 1998
  ident: 10.1016/j.ins.2021.07.039_b0105
  article-title: Extensions to the k-means algorithm for clustering large data sets with categorical values
  publication-title: Data Mining and Knowledge Discovery
  doi: 10.1023/A:1009769707641
– volume: 48
  start-page: 39
  year: 2016
  ident: 10.1016/j.ins.2021.07.039_b0010
  article-title: K-harmonic means type clustering algorithm for mixed datasets
  publication-title: Applied Soft Computing
  doi: 10.1016/j.asoc.2016.06.019
– volume: 1
  start-page: 69
  issue: 1–2
  year: 1999
  ident: 10.1016/j.ins.2021.07.039_b0235
  article-title: An evaluation of statistical approaches to text categorization
  publication-title: Information Retrieval
  doi: 10.1023/A:1009982220290
StartPage 697
SubjectTerms Inter-cluster heterogeneity
Intra-cluster homogeneity
Iterative weight adjustment strategy
Mixed data clustering
Noise-filtered distribution centroid
Title A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy
URI https://dx.doi.org/10.1016/j.ins.2021.07.039
Volume 577