Chameleon 2: An Improved Graph-Based Clustering Algorithm

Bibliographic Details
Published in: ACM Transactions on Knowledge Discovery from Data, Vol. 13, no. 1, pp. 1-27
Main Authors: Barton, Tomas; Bruna, Tomas; Kordik, Pavel
Format: Journal Article
Language: English
Published: 01.01.2019
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3299876

Abstract: Traditional clustering algorithms fail to produce human-like results when confronted with data of variable density, complex distributions, or in the presence of noise. We propose an improved graph-based clustering algorithm called Chameleon 2, which overcomes several drawbacks of state-of-the-art clustering approaches. We modified the internal cluster quality measure and added an extra step to ensure algorithm robustness. Our results reveal a significant positive impact on the clustering quality measured by Normalized Mutual Information on 32 artificial datasets used in the clustering literature. This significant improvement is also confirmed on real-world datasets. The performance of clustering algorithms such as DBSCAN is extremely parameter sensitive, and exhaustive manual parameter tuning is necessary to obtain a meaningful result. All hierarchical clustering methods are very sensitive to cutoff selection, and a human expert is often required to find the true cutoff for each clustering result. We present an automated cutoff selection method that enables the Chameleon 2 algorithm to generate high-quality clustering in autonomous mode.
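
Illustrative note: the abstract reports clustering quality as Normalized Mutual Information (NMI) between a produced clustering and reference labels. The sketch below shows one common way to compute NMI in plain Python. It is not taken from the paper: the function name, the square-root normalization sqrt(H(U) * H(V)), and the handling of single-cluster labelings are assumptions made for illustration, and the abstract does not state which normalization variant the authors used.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """Return NMI of two labelings, normalized by sqrt(H(U) * H(V)).

    Assumption: square-root normalization; other variants divide by the
    arithmetic mean, the minimum, or the maximum of the two entropies.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    # Cluster sizes in each labeling and sizes of their pairwise overlaps.
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the cluster-size distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # Mutual information: sum over non-empty overlaps of
    # p(u, v) * log(p(u, v) / (p(u) * p(v))).
    mutual_info = 0.0
    for (u, v), c in count_joint.items():
        mutual_info += (c / n) * math.log(c * n / (count_true[u] * count_pred[v]))

    if h_true == 0.0 or h_pred == 0.0:
        # Assumed convention for single-cluster labelings: perfect agreement
        # only when both labelings are trivial, otherwise zero.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)
```

Under this definition the score depends only on how points are grouped, not on the label names, so a clustering that matches the reference up to renaming of clusters scores 1, while a clustering that is statistically independent of the reference scores 0.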
Author Affiliations
1. Barton, Tomas (ORCID 0000-0002-8163-9687): Czech Technical University in Prague, Institute of Molecular Genetics ASCR
2. Bruna, Tomas: Czech Technical University in Prague, Prague, Czech Republic
3. Kordik, Pavel: Czech Technical University in Prague, Prague, Czech Republic