A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy
| Published in | Information Sciences, Vol. 577, pp. 697–721 |
|---|---|
| Main Authors | Li, Xiangjun; Wu, Zijie; Zhao, Zhibin; Ding, Feng; He, Daojing |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Inc, 01.10.2021 |
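| Subjects | Inter-cluster heterogeneity; Intra-cluster homogeneity; Iterative weight adjustment strategy; Mixed data clustering; Noise-filtered distribution centroid |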
| ISSN | 0020-0255 (print); 1872-6291 (electronic) |
| DOI | 10.1016/j.ins.2021.07.039 |
Cover
| Abstract | • A novel mixed data clustering optimization approach is applied for cluster analysis. • Clustering performance is optimized by a noise-filtered distribution centroid and similarity measurement. • An iterative weight adjustment is applied to quantify the influence of various attributes on clustering. • MCFCIW is an effective method for clustering mixed data.
Clustering is an important technique for data analysis, but cluster analysis of mixed data remains challenging. This paper proposes a mixed data clustering algorithm with a noise-filtered distribution centroid and an iterative weight adjustment strategy. The algorithm defines a noise-filtered distribution centroid for categorical attributes and combines it with the mean of the numeric attributes to represent the center of a cluster with mixed attributes. The noise-filtered distribution centroid records the frequency of occurrence of each possible value of the categorical attributes in a cluster more accurately. Furthermore, because the “noise values” are filtered out, the measure of dissimilarity between data objects and cluster centers can be improved. In addition, the algorithm introduces an iterative weight adjustment strategy that combines intra-cluster and inter-cluster information. A unified weight measurement method is used for both numeric and categorical attributes. Attributes with higher intra-cluster homogeneity and inter-cluster heterogeneity are treated as higher-priority attributes and tend to be assigned relatively heavier weights during clustering. Experimental results on different datasets from the UCI repository show that the proposed MCFCIW algorithm outperforms existing partition-based clustering algorithms and data-conversion-based clustering algorithms for mixed data on both cluster validity indices and convergence speed. |
|---|---|
| Author | Li, Xiangjun; Wu, Zijie; Zhao, Zhibin; Ding, Feng; He, Daojing |
| Author_xml | 1. Li, Xiangjun (lxjun_alex@163.com); 2. Wu, Zijie (jiekyw@163.com); 3. Zhao, Zhibin (zhaozhibin@ncu.edu.cn); 4. Ding, Feng; 5. He, Daojing. All authors: School of Software, Nanchang University, Nanchang 330046, China |
| CitedBy_id | crossref_primary_10_1016_j_patcog_2023_109353 crossref_primary_10_1155_2022_4003245 crossref_primary_10_1016_j_eswa_2022_117018 crossref_primary_10_1016_j_eswa_2023_122307 crossref_primary_10_1016_j_patcog_2024_111062 crossref_primary_10_1109_ACCESS_2024_3496929 crossref_primary_10_3390_app12062826 crossref_primary_10_1016_j_bdr_2023_100413 crossref_primary_10_1007_s11036_023_02249_w |
| Copyright | 2021 Elsevier Inc. |
| Discipline | Engineering; Library & Information Science |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Noise-filtered distribution centroid; Mixed data clustering; Intra-cluster homogeneity; Iterative weight adjustment strategy; Inter-cluster heterogeneity |
| PageCount | 25 |
| ParticipantIDs | crossref_primary_10_1016_j_ins_2021_07_039 crossref_citationtrail_10_1016_j_ins_2021_07_039 elsevier_sciencedirect_doi_10_1016_j_ins_2021_07_039 |
| ProviderPackageCode | CITATION AAYXX |
| PublicationCentury | 2000 |
| PublicationDate | October 2021 2021-10-00 |
| PublicationDateYYYYMMDD | 2021-10-01 |
| PublicationDate_xml | – month: 10 year: 2021 text: October 2021 |
| PublicationDecade | 2020 |
| PublicationTitle | Information sciences |
| PublicationYear | 2021 |
| Publisher | Elsevier Inc |
| Publisher_xml | – name: Elsevier Inc |
| References | Ng, Li, Huang, He (b0185) 2007; 29 Selosse, Jacques, Biernacki (b0200) 2020; 144 Zhou, Liu, Zhu (b0245) 2019; 78 Joshua Zhexue Huang, Michael K. Ng, Hongqiang Rong, Zichen Li, Automated variable weighting in k-means type clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (5) (2005) 657–668 Holte (b0080) 1993; 11 Wang, Li (b0220) 2021; 564 Ahmad, Khan (b0015) 2019; 7 Huang (b0105) 1998; 2 Huang, Ng (b0110) 1999; 7 Chen, He (b0030) 2016; 345 Dinh, Huynh, Sriboonchitta (b0045) 2021; 571 D’urso, Massari (b0055) 2019; 505 Foss, Markatou, Ray (b0065) 2019; 87 Lloyd (b0170) 1982; 28 Ahmad, Dey (b0005) 2007; 63 Szepannek (b0210) 2018; 10 Ahmad, Hashmi (b0010) 2016; 48 Francisco De, Carvalho, Lechevallier, De Melo (b0040) 2012; 45 Thierry Van de Merckt, Decision trees in numerical attribute spaces, in: International Joint Conference on Artificial Intelligence,OpenReview, 1993, pp. 1016–1016 Yuan, Chen, Li, Zeng, Sang, Luo (b0240) 2021; 572 Ditzler, Polikar (b0050) 2011 Ji, Bai, Zhou, Ma, Wang (b0120) 2013; 120 Arthur Asuncion, David Newman, Uci machine learning repository, [EB/OL], 22 December 2020. URL: https://archive.ics.uci.edu/ml Kriegel, Kröger, Zimek (b0160) 2012; 2 Ji, Li, Pang, He, Feng, Zhao (b0125) 2021; 9 Jin, Zhao, Zhang, Gao, Dou, Mengkang (b0135) 2020; 38 Popoola, Tapamo, Assounga (b0190) 2021; 9 Wikipedia and Free Encyclopedia, Coefficient of variation. [EB/OL], 22 December 2020. URL: https://en.wikipedia.org/wiki/Coefficient_of_variation Yang (b0235) 1999; 1 Guangxia, Zhang, Ma, Liu (b0230) 2020; 515 Jia, Cheung (b0130) 2017; 29 Modha, Scott Spangler (b0180) 2003; 52 Zhexue Huang, Clustering large data sets with mixed numeric and categorical values, in: Proceedings of the 1st Pacific-asia Conference on Knowledge Discovery and Data Mining,(PAKDD), Citeseer, 1997, pp. 
21–34 Caruso, Gattone, Fortuna, Di Battista (b0025) 2021; 73 Fayyad, Irani (b0060) 1993; 1993 Sangam, Om (b0195) 2018; 43 Kim, Lee, Lee (b0150) 2004; 25 Hubert, Arabie (b0115) 1985; 2 Kim (b0155) 2017; 32 Gower (b0075) 1971; 27 Kaufman, Rousseeuw (b0140) 2009 Ren-Jieh Kuo, Y.R. Zheng, Thi Phuong Quyen Nguyen, Metaheuristic-based possibilistic fuzzy k-modes algorithms for categorical data clustering, Information Sciences 557 (2021) 1–15 Kerber (b0145) 1992 McCane, Albert (b0175) 2008; 29 Fowlkes, Mallows (b0070) 1983; 78 Hsu, Chen (b0090) 2007; 32 Shannon (b0205) 1948; 27 Cheung, Jia (b0035) 2013; 46 Hsu, Chen, Yu-Wei (b0085) 2007; 177 Gower (10.1016/j.ins.2021.07.039_b0075) 1971; 27 Ng (10.1016/j.ins.2021.07.039_b0185) 2007; 29 Cheung (10.1016/j.ins.2021.07.039_b0035) 2013; 46 Selosse (10.1016/j.ins.2021.07.039_b0200) 2020; 144 Huang (10.1016/j.ins.2021.07.039_b0110) 1999; 7 Ditzler (10.1016/j.ins.2021.07.039_b0050) 2011 Szepannek (10.1016/j.ins.2021.07.039_b0210) 2018; 10 10.1016/j.ins.2021.07.039_b0215 Fowlkes (10.1016/j.ins.2021.07.039_b0070) 1983; 78 Kaufman (10.1016/j.ins.2021.07.039_b0140) 2009 Modha (10.1016/j.ins.2021.07.039_b0180) 2003; 52 Jia (10.1016/j.ins.2021.07.039_b0130) 2017; 29 Francisco De (10.1016/j.ins.2021.07.039_b0040) 2012; 45 Lloyd (10.1016/j.ins.2021.07.039_b0170) 1982; 28 10.1016/j.ins.2021.07.039_b0095 Chen (10.1016/j.ins.2021.07.039_b0030) 2016; 345 Foss (10.1016/j.ins.2021.07.039_b0065) 2019; 87 Caruso (10.1016/j.ins.2021.07.039_b0025) 2021; 73 McCane (10.1016/j.ins.2021.07.039_b0175) 2008; 29 Dinh (10.1016/j.ins.2021.07.039_b0045) 2021; 571 Huang (10.1016/j.ins.2021.07.039_b0105) 1998; 2 10.1016/j.ins.2021.07.039_b0165 Yuan (10.1016/j.ins.2021.07.039_b0240) 2021; 572 Hsu (10.1016/j.ins.2021.07.039_b0090) 2007; 32 Hubert (10.1016/j.ins.2021.07.039_b0115) 1985; 2 Ahmad (10.1016/j.ins.2021.07.039_b0010) 2016; 48 Ahmad (10.1016/j.ins.2021.07.039_b0015) 2019; 7 Ji (10.1016/j.ins.2021.07.039_b0120) 2013; 120 Zhou 
(10.1016/j.ins.2021.07.039_b0245) 2019; 78 Kim (10.1016/j.ins.2021.07.039_b0150) 2004; 25 Kriegel (10.1016/j.ins.2021.07.039_b0160) 2012; 2 Wang (10.1016/j.ins.2021.07.039_b0220) 2021; 564 Guangxia (10.1016/j.ins.2021.07.039_b0230) 2020; 515 Kim (10.1016/j.ins.2021.07.039_b0155) 2017; 32 Ahmad (10.1016/j.ins.2021.07.039_b0005) 2007; 63 Popoola (10.1016/j.ins.2021.07.039_b0190) 2021; 9 Holte (10.1016/j.ins.2021.07.039_b0080) 1993; 11 D’urso (10.1016/j.ins.2021.07.039_b0055) 2019; 505 10.1016/j.ins.2021.07.039_b0225 Yang (10.1016/j.ins.2021.07.039_b0235) 1999; 1 Shannon (10.1016/j.ins.2021.07.039_b0205) 1948; 27 Ji (10.1016/j.ins.2021.07.039_b0125) 2021; 9 Sangam (10.1016/j.ins.2021.07.039_b0195) 2018; 43 10.1016/j.ins.2021.07.039_b0020 Hsu (10.1016/j.ins.2021.07.039_b0085) 2007; 177 Kerber (10.1016/j.ins.2021.07.039_b0145) 1992 Fayyad (10.1016/j.ins.2021.07.039_b0060) 1993; 1993 10.1016/j.ins.2021.07.039_b0100 Jin (10.1016/j.ins.2021.07.039_b0135) 2020; 38 |
| References_xml | – volume: 27 start-page: 379 year: 1948 end-page: 423 ident: b0205 article-title: A mathematical theory of communication publication-title: The Bell System Technical Journal – volume: 505 start-page: 513 year: 2019 end-page: 534 ident: b0055 article-title: Fuzzy clustering of mixed data publication-title: Information Sciences – volume: 177 start-page: 4474 year: 2007 end-page: 4492 ident: b0085 article-title: Hierarchical clustering of mixed data based on distance hierarchy publication-title: Information Sciences – volume: 32 start-page: 979 year: 2017 end-page: 990 ident: b0155 article-title: A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures publication-title: Journal of Intelligent & Fuzzy Systems – volume: 9 start-page: 52125 year: 2021 end-page: 52143 ident: b0190 article-title: Cluster analysis of mixed and missing chronic kidney disease data in kwazulu-natal province, south africa publication-title: IEEE Access – volume: 515 start-page: 280 year: 2020 end-page: 293 ident: b0230 article-title: A mixed attributes oriented dynamic som fuzzy cluster algorithm for mobile user classification publication-title: Information Sciences – volume: 2 start-page: 351 year: 2012 end-page: 364 ident: b0160 article-title: Subspace clustering publication-title: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery – volume: 11 start-page: 63 year: 1993 end-page: 90 ident: b0080 article-title: Very simple classification rules perform well on most commonly used datasets publication-title: Machine Learning – volume: 32 start-page: 12 year: 2007 end-page: 23 ident: b0090 article-title: Mining of mixed data with application to catalog marketing publication-title: Expert Systems with Applications – volume: 1 start-page: 69 year: 1999 end-page: 90 ident: b0235 article-title: An evaluation of statistical approaches to text categorization publication-title: Information Retrieval – 
volume: 38 start-page: 3319 year: 2020 end-page: 3330 ident: b0135 article-title: Adaptive soft subspace clustering combining within-cluster and between-cluster information publication-title: Journal of Intelligent & Fuzzy Systems – volume: 48 start-page: 39 year: 2016 end-page: 49 ident: b0010 article-title: K-harmonic means type clustering algorithm for mixed datasets publication-title: Applied Soft Computing – year: 2009 ident: b0140 article-title: Finding Groups in Data: An Introduction to Cluster Analysis – reference: Ren-Jieh Kuo, Y.R. Zheng, Thi Phuong Quyen Nguyen, Metaheuristic-based possibilistic fuzzy k-modes algorithms for categorical data clustering, Information Sciences 557 (2021) 1–15 – reference: Joshua Zhexue Huang, Michael K. Ng, Hongqiang Rong, Zichen Li, Automated variable weighting in k-means type clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (5) (2005) 657–668 – volume: 29 start-page: 3308 year: 2017 end-page: 3325 ident: b0130 article-title: Subspace clustering of categorical and numerical data with an unknown number of clusters publication-title: IEEE Transactions on Neural Networks and Learning Systems – volume: 73 year: 2021 ident: b0025 article-title: Cluster analysis for mixed data: An application to credit risk evaluation publication-title: Socio-Economic Planning Sciences – volume: 29 start-page: 986 year: 2008 end-page: 993 ident: b0175 article-title: Distance functions for categorical and mixed variables publication-title: Pattern Recognition Letters – volume: 120 start-page: 590 year: 2013 end-page: 596 ident: b0120 article-title: An improved k-prototypes clustering algorithm for mixed numeric and categorical data publication-title: Neurocomputing – volume: 45 start-page: 447 year: 2012 end-page: 464 ident: b0040 article-title: Partitioning hard clustering algorithms based on multiple dissimilarity matrices publication-title: Pattern Recognition – volume: 28 start-page: 129 year: 1982 end-page: 137 
ident: b0170 article-title: Least squares quantization in pcm publication-title: IEEE Transactions on Information Theory – volume: 7 start-page: 446 year: 1999 end-page: 452 ident: b0110 article-title: A fuzzy k-modes algorithm for clustering categorical data publication-title: IEEE Transactions on Fuzzy Systems – volume: 52 start-page: 217 year: 2003 end-page: 237 ident: b0180 article-title: Feature weighting in k-means clustering publication-title: Machine Learning – volume: 10 start-page: 200 year: 2018 ident: b0210 article-title: clustmixtype: User-friendly clustering of mixed-type data in r publication-title: R Journal – volume: 43 start-page: 37 year: 2018 ident: b0195 article-title: An equi-biased k-prototypes algorithm for clustering mixed-type data publication-title: Sādhanā – volume: 63 start-page: 503 year: 2007 end-page: 527 ident: b0005 article-title: A k-mean clustering algorithm for mixed numeric and categorical data publication-title: Data & Knowledge Engineering – volume: 564 start-page: 396 year: 2021 end-page: 415 ident: b0220 article-title: Outlier detection based on weighted neighbourhood information network for mixed-valued datasets publication-title: Information Sciences – volume: 7 start-page: 31883 year: 2019 end-page: 31902 ident: b0015 article-title: Survey of state-of-the-art mixed data clustering algorithms publication-title: IEEE Access – volume: 144 year: 2020 ident: b0200 article-title: Model-based co-clustering for mixed type data publication-title: Computational Statistics & Data Analysis – volume: 572 start-page: 67 year: 2021 end-page: 87 ident: b0240 article-title: Unsupervised attribute reduction for mixed data based on fuzzy rough sets publication-title: Information Sciences – volume: 29 start-page: 503 year: 2007 end-page: 507 ident: b0185 article-title: On the impact of dissimilarity measure in k-modes clustering algorithm publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence – volume: 78 
start-page: 553 year: 1983 end-page: 569 ident: b0070 article-title: A method for comparing two hierarchical clusterings publication-title: Journal of the American statistical association – volume: 2 start-page: 193 year: 1985 end-page: 218 ident: b0115 article-title: Comparing partitions publication-title: Journal of Classification – volume: 1993 start-page: 1022 year: 1993 end-page: 1027 ident: b0060 article-title: Multi-interval discretization of continuous-valued attributes for classification learning publication-title: Machine Learning – reference: Zhexue Huang, Clustering large data sets with mixed numeric and categorical values, in: Proceedings of the 1st Pacific-asia Conference on Knowledge Discovery and Data Mining,(PAKDD), Citeseer, 1997, pp. 21–34 – volume: 46 start-page: 2228 year: 2013 end-page: 2238 ident: b0035 article-title: Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number publication-title: Pattern Recognition – volume: 2 start-page: 283 year: 1998 end-page: 304 ident: b0105 article-title: Extensions to the k-means algorithm for clustering large data sets with categorical values publication-title: Data Mining and Knowledge Discovery – start-page: 123 year: 1992 end-page: 128 ident: b0145 article-title: Chimerge: Discretization of numeric attributes publication-title: Proceedings of the Tenth National Conference on Artificial Intelligence – volume: 78 start-page: 33415 year: 2019 end-page: 33434 ident: b0245 article-title: Weighted adjacent matrix for k-means clustering publication-title: Multimedia Tools and Applications – volume: 9 start-page: 24913 year: 2021 end-page: 24924 ident: b0125 article-title: A multi-view clustering algorithm for mixed numeric and categorical data publication-title: IEEE Access – reference: Thierry Van de Merckt, Decision trees in numerical attribute spaces, in: International Joint Conference on Artificial Intelligence,OpenReview, 1993, pp. 
1016–1016 – volume: 25 start-page: 1263 year: 2004 end-page: 1271 ident: b0150 article-title: Fuzzy clustering of categorical data using fuzzy centroids publication-title: Pattern Recognition Letters – reference: Arthur Asuncion, David Newman, Uci machine learning repository, [EB/OL], 22 December 2020. URL: https://archive.ics.uci.edu/ml – volume: 345 start-page: 271 year: 2016 end-page: 293 ident: b0030 article-title: A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data publication-title: Information Sciences – volume: 571 start-page: 418 year: 2021 end-page: 442 ident: b0045 article-title: Clustering mixed numerical and categorical data with missing values publication-title: Information Sciences – start-page: 41 year: 2011 end-page: 48 ident: b0050 article-title: Hellinger distance based drift detection for nonstationary environments publication-title: 2011 IEEE Symposium on Computational Intelligence in Dynamic and Uncertain Environments (CIDUE) – reference: Wikipedia and Free Encyclopedia, Coefficient of variation. [EB/OL], 22 December 2020. 
URL: https://en.wikipedia.org/wiki/Coefficient_of_variation – volume: 27 start-page: 857 year: 1971 end-page: 871 ident: b0075 article-title: A general coefficient of similarity and some of its properties publication-title: Biometrics – volume: 87 start-page: 80 year: 2019 end-page: 109 ident: b0065 article-title: Distance metrics and clustering methods for mixed-type data publication-title: International Statistical Review – volume: 345 start-page: 271 year: 2016 ident: 10.1016/j.ins.2021.07.039_b0030 article-title: A fast density-based data stream clustering algorithm with cluster centers self-determined for mixed data publication-title: Information Sciences doi: 10.1016/j.ins.2016.01.071 – volume: 29 start-page: 986 issue: 7 year: 2008 ident: 10.1016/j.ins.2021.07.039_b0175 article-title: Distance functions for categorical and mixed variables publication-title: Pattern Recognition Letters doi: 10.1016/j.patrec.2008.01.021 – volume: 73 year: 2021 ident: 10.1016/j.ins.2021.07.039_b0025 article-title: Cluster analysis for mixed data: An application to credit risk evaluation publication-title: Socio-Economic Planning Sciences doi: 10.1016/j.seps.2020.100850 – volume: 2 start-page: 351 issue: 4 year: 2012 ident: 10.1016/j.ins.2021.07.039_b0160 article-title: Subspace clustering publication-title: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery – volume: 78 start-page: 33415 issue: 23 year: 2019 ident: 10.1016/j.ins.2021.07.039_b0245 article-title: Weighted adjacent matrix for k-means clustering publication-title: Multimedia Tools and Applications doi: 10.1007/s11042-019-08009-x – volume: 571 start-page: 418 year: 2021 ident: 10.1016/j.ins.2021.07.039_b0045 article-title: Clustering mixed numerical and categorical data with missing values publication-title: Information Sciences doi: 10.1016/j.ins.2021.04.076 – volume: 10 start-page: 200 issue: 2 year: 2018 ident: 10.1016/j.ins.2021.07.039_b0210 article-title: clustmixtype: User-friendly clustering 
of mixed-type data in r publication-title: R Journal – ident: 10.1016/j.ins.2021.07.039_b0020 – volume: 564 start-page: 396 year: 2021 ident: 10.1016/j.ins.2021.07.039_b0220 article-title: Outlier detection based on weighted neighbourhood information network for mixed-valued datasets publication-title: Information Sciences doi: 10.1016/j.ins.2021.02.045 – volume: 7 start-page: 31883 year: 2019 ident: 10.1016/j.ins.2021.07.039_b0015 article-title: Survey of state-of-the-art mixed data clustering algorithms publication-title: IEEE Access doi: 10.1109/ACCESS.2019.2903568 – volume: 78 start-page: 553 issue: 383 year: 1983 ident: 10.1016/j.ins.2021.07.039_b0070 article-title: A method for comparing two hierarchical clusterings publication-title: Journal of the American statistical association doi: 10.1080/01621459.1983.10478008 – volume: 120 start-page: 590 year: 2013 ident: 10.1016/j.ins.2021.07.039_b0120 article-title: An improved k-prototypes clustering algorithm for mixed numeric and categorical data publication-title: Neurocomputing doi: 10.1016/j.neucom.2013.04.011 – volume: 52 start-page: 217 issue: 3 year: 2003 ident: 10.1016/j.ins.2021.07.039_b0180 article-title: Feature weighting in k-means clustering publication-title: Machine Learning doi: 10.1023/A:1024016609528 – volume: 515 start-page: 280 year: 2020 ident: 10.1016/j.ins.2021.07.039_b0230 article-title: A mixed attributes oriented dynamic som fuzzy cluster algorithm for mobile user classification publication-title: Information Sciences doi: 10.1016/j.ins.2019.12.019 – volume: 32 start-page: 979 issue: 1 year: 2017 ident: 10.1016/j.ins.2021.07.039_b0155 article-title: A weighted k-modes clustering using new weighting method based on within-cluster and between-cluster impurity measures publication-title: Journal of Intelligent & Fuzzy Systems – ident: 10.1016/j.ins.2021.07.039_b0095 doi: 10.1109/TPAMI.2005.95 – ident: 10.1016/j.ins.2021.07.039_b0100 – volume: 45 start-page: 447 issue: 1 year: 2012 ident: 
10.1016/j.ins.2021.07.039_b0040 article-title: Partitioning hard clustering algorithms based on multiple dissimilarity matrices publication-title: Pattern Recognition doi: 10.1016/j.patcog.2011.05.016 – volume: 505 start-page: 513 year: 2019 ident: 10.1016/j.ins.2021.07.039_b0055 article-title: Fuzzy clustering of mixed data publication-title: Information Sciences doi: 10.1016/j.ins.2019.07.100 – volume: 177 start-page: 4474 issue: 20 year: 2007 ident: 10.1016/j.ins.2021.07.039_b0085 article-title: Hierarchical clustering of mixed data based on distance hierarchy publication-title: Information Sciences doi: 10.1016/j.ins.2007.05.003 – volume: 46 start-page: 2228 issue: 8 year: 2013 ident: 10.1016/j.ins.2021.07.039_b0035 article-title: Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number publication-title: Pattern Recognition doi: 10.1016/j.patcog.2013.01.027 – ident: 10.1016/j.ins.2021.07.039_b0165 doi: 10.1016/j.ins.2020.12.051 – ident: 10.1016/j.ins.2021.07.039_b0215 – volume: 572 start-page: 67 year: 2021 ident: 10.1016/j.ins.2021.07.039_b0240 article-title: Unsupervised attribute reduction for mixed data based on fuzzy rough sets publication-title: Information Sciences doi: 10.1016/j.ins.2021.04.083 – volume: 11 start-page: 63 issue: 1 year: 1993 ident: 10.1016/j.ins.2021.07.039_b0080 article-title: Very simple classification rules perform well on most commonly used datasets publication-title: Machine Learning doi: 10.1023/A:1022631118932 – year: 2009 ident: 10.1016/j.ins.2021.07.039_b0140 – start-page: 41 year: 2011 ident: 10.1016/j.ins.2021.07.039_b0050 article-title: Hellinger distance based drift detection for nonstationary environments – volume: 1993 start-page: 1022 year: 1993 ident: 10.1016/j.ins.2021.07.039_b0060 article-title: Multi-interval discretization of continuous-valued attributes for classification learning publication-title: Machine Learning – volume: 29 start-page: 3308 issue: 
8 year: 2017 ident: 10.1016/j.ins.2021.07.039_b0130 article-title: Subspace clustering of categorical and numerical data with an unknown number of clusters publication-title: IEEE Transactions on Neural Networks and Learning Systems doi: 10.1109/TNNLS.2017.2728138 – ident: 10.1016/j.ins.2021.07.039_b0225 – volume: 25 start-page: 1263 issue: 11 year: 2004 ident: 10.1016/j.ins.2021.07.039_b0150 article-title: Fuzzy clustering of categorical data using fuzzy centroids publication-title: Pattern Recognition Letters doi: 10.1016/j.patrec.2004.04.004 – volume: 63 start-page: 503 issue: 2 year: 2007 ident: 10.1016/j.ins.2021.07.039_b0005 article-title: A k-mean clustering algorithm for mixed numeric and categorical data publication-title: Data & Knowledge Engineering doi: 10.1016/j.datak.2007.03.016 – volume: 9 start-page: 52125 year: 2021 ident: 10.1016/j.ins.2021.07.039_b0190 article-title: Cluster analysis of mixed and missing chronic kidney disease data in kwazulu-natal province, south africa publication-title: IEEE Access doi: 10.1109/ACCESS.2021.3069684 – volume: 32 start-page: 12 issue: 1 year: 2007 ident: 10.1016/j.ins.2021.07.039_b0090 article-title: Mining of mixed data with application to catalog marketing publication-title: Expert Systems with Applications doi: 10.1016/j.eswa.2005.11.017 – volume: 9 start-page: 24913 year: 2021 ident: 10.1016/j.ins.2021.07.039_b0125 article-title: A multi-view clustering algorithm for mixed numeric and categorical data publication-title: IEEE Access doi: 10.1109/ACCESS.2021.3057113 – volume: 27 start-page: 857 year: 1971 ident: 10.1016/j.ins.2021.07.039_b0075 article-title: A general coefficient of similarity and some of its properties publication-title: Biometrics doi: 10.2307/2528823 – volume: 2 start-page: 193 issue: 1 year: 1985 ident: 10.1016/j.ins.2021.07.039_b0115 article-title: Comparing partitions publication-title: Journal of Classification doi: 10.1007/BF01908075 – volume: 29 start-page: 503 issue: 3 year: 2007 
ident: 10.1016/j.ins.2021.07.039_b0185 article-title: On the impact of dissimilarity measure in k-modes clustering algorithm publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence doi: 10.1109/TPAMI.2007.53
– volume: 43 start-page: 37 issue: 3 year: 2018 ident: 10.1016/j.ins.2021.07.039_b0195 article-title: An equi-biased k-prototypes algorithm for clustering mixed-type data publication-title: Sādhanā doi: 10.1007/s12046-018-0823-0
– volume: 144 year: 2020 ident: 10.1016/j.ins.2021.07.039_b0200 article-title: Model-based co-clustering for mixed type data publication-title: Computational Statistics & Data Analysis doi: 10.1016/j.csda.2019.106866
– volume: 7 start-page: 446 issue: 4 year: 1999 ident: 10.1016/j.ins.2021.07.039_b0110 article-title: A fuzzy k-modes algorithm for clustering categorical data publication-title: IEEE Transactions on Fuzzy Systems doi: 10.1109/91.784206
– volume: 38 start-page: 3319 issue: 3 year: 2020 ident: 10.1016/j.ins.2021.07.039_b0135 article-title: Adaptive soft subspace clustering combining within-cluster and between-cluster information publication-title: Journal of Intelligent & Fuzzy Systems
– volume: 87 start-page: 80 issue: 1 year: 2019 ident: 10.1016/j.ins.2021.07.039_b0065 article-title: Distance metrics and clustering methods for mixed-type data publication-title: International Statistical Review doi: 10.1111/insr.12274
– volume: 28 start-page: 129 issue: 2 year: 1982 ident: 10.1016/j.ins.2021.07.039_b0170 article-title: Least squares quantization in PCM publication-title: IEEE Transactions on Information Theory doi: 10.1109/TIT.1982.1056489
– volume: 27 start-page: 379 issue: 3 year: 1948 ident: 10.1016/j.ins.2021.07.039_b0205 article-title: A mathematical theory of communication publication-title: The Bell System Technical Journal doi: 10.1002/j.1538-7305.1948.tb01338.x
– start-page: 123 year: 1992 ident: 10.1016/j.ins.2021.07.039_b0145 article-title: ChiMerge: Discretization of numeric attributes
– volume: 2 start-page: 283 issue: 3 year: 1998 ident: 10.1016/j.ins.2021.07.039_b0105 article-title: Extensions to the k-means algorithm for clustering large data sets with categorical values publication-title: Data Mining and Knowledge Discovery doi: 10.1023/A:1009769707641
– volume: 48 start-page: 39 year: 2016 ident: 10.1016/j.ins.2021.07.039_b0010 article-title: K-harmonic means type clustering algorithm for mixed datasets publication-title: Applied Soft Computing doi: 10.1016/j.asoc.2016.06.019
– volume: 1 start-page: 69 issue: 1–2 year: 1999 ident: 10.1016/j.ins.2021.07.039_b0235 article-title: An evaluation of statistical approaches to text categorization publication-title: Information Retrieval doi: 10.1023/A:1009982220290 |
| StartPage | 697 |
| SubjectTerms | Inter-cluster heterogeneity; Intra-cluster homogeneity; Iterative weight adjustment strategy; Mixed data clustering; Noise-filtered distribution centroid |
| Title | A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy |
| URI | https://dx.doi.org/10.1016/j.ins.2021.07.039 |
| Volume | 577 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+mixed+data+clustering+algorithm+with+noise-filtered+distribution+centroid+and+iterative+weight+adjustment+strategy&rft.jtitle=Information+sciences&rft.au=Li%2C+Xiangjun&rft.au=Wu%2C+Zijie&rft.au=Zhao%2C+Zhibin&rft.au=Ding%2C+Feng&rft.date=2021-10-01&rft.pub=Elsevier+Inc&rft.issn=0020-0255&rft.eissn=1872-6291&rft.volume=577&rft.spage=697&rft.epage=721&rft_id=info:doi/10.1016%2Fj.ins.2021.07.039&rft.externalDocID=S0020025521007295 |