Chameleon 2: An Improved Graph-Based Clustering Algorithm
Traditional clustering algorithms fail to produce human-like results when confronted with data of variable density, complex distributions, or in the presence of noise. We propose an improved graph-based clustering algorithm called Chameleon 2, which overcomes several drawbacks of state-of-the-art cl...
| Published in | ACM transactions on knowledge discovery from data Vol. 13; no. 1; pp. 1 - 27 |
|---|---|
| Main Authors | Barton, Tomas; Bruna, Tomas; Kordik, Pavel |
| Format | Journal Article |
| Language | English |
| Published | 01.01.2019 |
| ISSN | 1556-4681 1556-472X |
| DOI | 10.1145/3299876 |
| Abstract | Traditional clustering algorithms fail to produce human-like results when confronted with data of variable density, complex distributions, or in the presence of noise. We propose an improved graph-based clustering algorithm called Chameleon 2, which overcomes several drawbacks of state-of-the-art clustering approaches. We modified the internal cluster quality measure and added an extra step to ensure algorithm robustness. Our results reveal a significant positive impact on the clustering quality measured by Normalized Mutual Information on 32 artificial datasets used in the clustering literature. This significant improvement is also confirmed on real-world datasets.
The performance of clustering algorithms such as DBSCAN is extremely parameter sensitive, and exhaustive manual parameter tuning is necessary to obtain a meaningful result. All hierarchical clustering methods are very sensitive to cutoff selection, and a human expert is often required to find the true cutoff for each clustering result. We present an automated cutoff selection method that enables the Chameleon 2 algorithm to generate high-quality clustering in autonomous mode. |
|---|---|
| Author | Barton, Tomas Kordik, Pavel Bruna, Tomas |
| Author_xml | – sequence: 1 givenname: Tomas orcidid: 0000-0002-8163-9687 surname: Barton fullname: Barton, Tomas organization: Czech Technical University in Prague, Institute of Molecular Genetics ASCR – sequence: 2 givenname: Tomas surname: Bruna fullname: Bruna, Tomas organization: Czech Technical University in Prague, Prague, Czech Republic – sequence: 3 givenname: Pavel surname: Kordik fullname: Kordik, Pavel organization: Czech Technical University in Prague, Prague, Czech Republic |
| ContentType | Journal Article |
| DOI | 10.1145/3299876 |
| Discipline | Computer Science |
| EISSN | 1556-472X |
| EndPage | 27 |
| ISSN | 1556-4681 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| ORCID | 0000-0002-8163-9687 |
| PageCount | 27 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-01-01 |
| PublicationDateYYYYMMDD | 2019-01-01 |
| PublicationDate_xml | – month: 01 year: 2019 text: 2019-01-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | ACM transactions on knowledge discovery from data |
| PublicationYear | 2019 |
| Subtitle | An Improved Graph-Based Clustering Algorithm |
| Title | Chameleon 2 |
| Volume | 13 |