Circular Silhouette and a Fast Algorithm
Circular data clustering has recently been solved exactly in sub-quadratic time. However, the solution requires a given number of clusters; methods for choosing this number on linear data are inapplicable to circular data. To fill this gap, we introduce the circular silhouette to measure cluster qua...
| Published in | IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, no. 11, pp. 14038-14044 |
|---|---|
| Main Authors | Chen, Yinong; Debnath, Tathagata; Cai, Andrew; Song, Mingzhou |
| Format | Journal Article |
| Language | English |
| Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2023 |
| Subjects | |
| ISSN | 0162-8828, 1939-3539, 2160-9292 |
| DOI | 10.1109/TPAMI.2023.3310495 |
| Abstract | Circular data clustering has recently been solved exactly in sub-quadratic time. However, the solution requires the number of clusters to be given, and methods for choosing this number on linear data are inapplicable to circular data. To fill this gap, we introduce the circular silhouette to measure cluster quality and a fast algorithm to calculate the average silhouette width. On sorted data, the algorithm runs in time linear in the number of points, rather than the quadratic time required by the silhouette definition. Empirically, it is over 3000 times faster than computation by the silhouette definition on 1,000,000 circular data points in five clusters. On simulated datasets, the algorithm returned the correct numbers of clusters. We identified clusters on the circular genomes of human mitochondria and bacteria. On sunspot activity data, we found changed solar-cycle patterns over the past two centuries. Using the circular silhouette not only eliminates the subjective selection of the number of clusters, but also scales to the big circular and periodic data abundant in science, engineering, and medicine. |
| Author | Chen, Yinong; Song, Mingzhou; Debnath, Tathagata; Cai, Andrew |
| Author_xml | 1. Chen, Yinong (ORCID 0000-0003-1641-1712; ychen557@jh.edu), Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA. 2. Debnath, Tathagata (ORCID 0000-0001-6445-275X; tirtha.debnath@gmail.com), Department of Computer Science, New Mexico State University, Las Cruces, NM, USA. 3. Cai, Andrew (ORCID 0009-0000-9444-6027; cai.andrew1226@gmail.com), School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA. 4. Song, Mingzhou (ORCID 0000-0002-6883-6547; joemsong@nmsu.edu), Department of Computer Science, New Mexico State University, Las Cruces, NM, USA |
| CODEN | ITPIDJ |
| CitedBy_id | crossref_primary_10_2478_ias_2024_0002 |
| Cites_doi | 10.1007/s00500-012-0802-z 10.3389/fspas.2018.00038 10.1093/bioinformatics/btaa613 10.1007/s00382-016-3053-3 10.1038/13779 10.18637/jss.v050.i10 10.1109/IWACI.2010.5585203 10.1109/ICDCSW.2011.20 10.12968/bjhc.2019.0067 10.1016/j.imavis.2019.04.009 10.1128/genomeA.00552-15 10.1109/TCBB.2021.3077573 10.1007/s10044-014-0418-2 10.1198/016214503000000666 10.1016/0377-0427(87)90125-7 10.1109/RADAR.2010.5494540 10.1007/978-3-319-00032-9_20 10.1080/02664763.2011.626850 10.1177/0049124104268644 10.1016/j.patcog.2016.12.016 10.1002/sim.5589 10.1128/genomeA.01466-17 10.1109/TFUZZ.2020.2966182 10.1016/j.mehy.2007.05.053 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional MEDLINE - Academic |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic Technology Research Database |
| Discipline | Engineering Computer Science |
| EISSN | 2160-9292 1939-3539 |
| EndPage | 14044 |
| ExternalDocumentID | 10_1109_TPAMI_2023_3310495 10236583 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Science Foundation grantid: 1661331 funderid: 10.13039/501100008982 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0003-1641-1712 0009-0000-9444-6027 0000-0002-6883-6547 0000-0001-6445-275X |
| PMID | 37651497 |
| PageCount | 7 |
| PublicationDate | 2023-11-01 |
| PublicationPlace | New York |
| PublicationTitle | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| PublicationTitleAbbrev | TPAMI |
| PublicationYear | 2023 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 14038 |
| SubjectTerms | Algorithms; Arrays; Bacterial genome; circular clustering; circular genome; Clustering; Clustering algorithms; Data points; Elbow; Genomics; Hidden Markov models; Mathematical models; Microorganisms; mitochondria; periodic data; silhouette; solar cycle |
| URI | https://ieeexplore.ieee.org/document/10236583 https://www.proquest.com/docview/2872443970 https://www.proquest.com/docview/2860405910 |
| Volume | 45 |
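
The abstract contrasts the fast algorithm with computing the average silhouette width "by the silhouette definition" in quadratic time. The sketch below illustrates only that quadratic baseline, not the authors' linear-time algorithm, which this record does not describe. It assumes the circular silhouette is the standard silhouette (b - a) / max(a, b) with distance measured as the shorter arc between two positions on a circle. The function names `circular_distance` and `average_silhouette_width`, the `circumference` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def circular_distance(x, y, circumference):
    """Shorter arc length between two positions on a circle (assumed metric)."""
    d = abs(x - y) % circumference
    return min(d, circumference - d)


def average_silhouette_width(points, labels, circumference):
    """Average silhouette width computed directly from the definition, O(n^2)."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, (x, label) in enumerate(zip(points, labels)):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton contributes 0 by the usual convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(
            circular_distance(x, points[j], circumference) for j in own if j != i
        ) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(circular_distance(x, points[j], circumference) for j in members)
            / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data in degrees: one cluster straddles the 0/360 boundary.
points = [350, 355, 5, 10, 170, 180, 190]
wrapped = [0, 0, 0, 0, 1, 1, 1]   # respects the wrap-around
linear = [0, 0, 1, 1, 1, 1, 1]    # splits the cluster at the boundary
print(average_silhouette_width(points, wrapped, 360))
print(average_silhouette_width(points, linear, 360))
```

Under these assumptions, the labeling that respects the wrap-around scores higher than the one that cuts the cluster at the 0/360 boundary. Choosing the number of clusters would then amount to evaluating this score for each candidate clustering and keeping the best one.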