Circular Silhouette and a Fast Algorithm

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 11, pp. 14038–14044
Main authors: Chen, Yinong; Debnath, Tathagata; Cai, Andrew; Song, Mingzhou
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 November 2023
ISSN: 0162-8828; 1939-3539; 2160-9292
DOI: 10.1109/TPAMI.2023.3310495

Abstract: Circular data clustering has recently been solved exactly in sub-quadratic time. However, the solution requires the number of clusters to be given, and methods for choosing this number on linear data are inapplicable to circular data. To fill this gap, we introduce the circular silhouette to measure cluster quality and a fast algorithm to calculate the average silhouette width. On sorted data, the algorithm runs in time linear in the number of points, instead of the quadratic time required when computing the silhouette from its definition. Empirically, it is over 3000 times faster than the definition-based computation on 1,000,000 circular data points in five clusters. On simulated datasets, the algorithm returned the correct numbers of clusters. We identified clusters in the circular genomes of human mitochondria and bacteria. On sunspot activity data, we found changed solar-cycle patterns over the past two centuries. Using the circular silhouette not only eliminates the subjective selection of the number of clusters but also scales to the big circular and periodic data abundant in science, engineering, and medicine.
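The abstract contrasts the quadratic cost of computing the silhouette from its definition with a much faster computation on sorted data. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' linear-time algorithm. It assumes points lie on a circle of circumference `period`, measures distance as the shortest arc, and follows the usual silhouette conventions: s(i) = (b − a)/max(a, b), with s(i) = 0 for a point alone in its cluster. All function names are hypothetical. `silhouette_by_definition` performs the O(n²) computation, while `silhouette_prefix_sums` sorts each cluster once and then obtains every point-to-cluster distance sum from prefix sums and binary search. That gives O(nk log n) for k clusters, which is a simpler and slower bound than the linear time the paper reports.

```python
import bisect
from collections import defaultdict
from itertools import accumulate


def circ_dist(x, y, period):
    """Shortest arc length between x and y on a circle of circumference `period`."""
    d = abs(x - y) % period
    return min(d, period - d)


def _silhouette_value(a, b):
    # s(i) = (b - a) / max(a, b); taken as 0 when both mean distances are 0.
    m = max(a, b)
    return (b - a) / m if m > 0 else 0.0


def silhouette_by_definition(points, labels, period):
    """Average silhouette width straight from the definition: O(n^2) distances."""
    members = defaultdict(list)
    for i, c in enumerate(labels):
        members[c].append(i)
    if len(members) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, own in enumerate(labels):
        if len(members[own]) == 1:
            continue  # s(i) = 0 for a point alone in its cluster
        a = sum(circ_dist(points[i], points[j], period)
                for j in members[own] if j != i) / (len(members[own]) - 1)
        b = min(sum(circ_dist(points[i], points[j], period) for j in idx) / len(idx)
                for c, idx in members.items() if c != own)
        total += _silhouette_value(a, b)
    return total / len(points)


def _arc_sum(x, ys, prefix, period):
    """Sum of circular distances from x to every value in sorted ys (all in [0, period))."""
    half = period / 2
    m = len(ys)
    lo = bisect.bisect_left(ys, x - half)   # ys[:lo]: more than half a turn behind x
    mid = bisect.bisect_right(ys, x)        # ys[lo:mid]: within half a turn behind x
    hi = bisect.bisect_right(ys, x + half)  # ys[mid:hi]: within half a turn ahead; ys[hi:]: beyond

    def seg(i, j):
        return prefix[j] - prefix[i]        # sum of ys[i:j]

    return (lo * (period - x) + seg(0, lo)           # wrap forward: period - x + y
            + (mid - lo) * x - seg(lo, mid)          # direct: x - y
            + seg(mid, hi) - (hi - mid) * x          # direct: y - x
            + (m - hi) * (period + x) - seg(hi, m))  # wrap backward: period + x - y


def silhouette_prefix_sums(points, labels, period):
    """Same quantity as silhouette_by_definition, via per-cluster sorting and prefix sums."""
    pts = [p % period for p in points]
    groups = defaultdict(list)
    for p, c in zip(pts, labels):
        groups[c].append(p)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    tables = {}
    for c, ys in groups.items():
        ys.sort()
        tables[c] = (ys, [0.0, *accumulate(ys)])
    total = 0.0
    for x, own in zip(pts, labels):
        ys_own, pre_own = tables[own]
        if len(ys_own) == 1:
            continue  # s(i) = 0 for a point alone in its cluster
        # x itself is in ys_own and contributes distance 0, so divide by m - 1.
        a = _arc_sum(x, ys_own, pre_own, period) / (len(ys_own) - 1)
        b = min(_arc_sum(x, ys, pre, period) / len(ys)
                for c, (ys, pre) in tables.items() if c != own)
        total += _silhouette_value(a, b)
    return total / len(pts)


if __name__ == "__main__":
    hours = [23.0, 23.5, 0.5, 1.0, 11.0, 12.0, 12.5]  # toy clock times; cluster 0 wraps past midnight
    labels = [0, 0, 0, 0, 1, 1, 1]
    print(silhouette_by_definition(hours, labels, 24.0))
    print(silhouette_prefix_sums(hours, labels, 24.0))
```

Because the distance wraps around, the cluster straddling midnight in the toy example is scored as compact rather than split. In a typical workflow one would compute the average silhouette width for each candidate number of clusters and keep the number that maximizes it.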

Author Details
– Chen, Yinong (ORCID 0000-0003-1641-1712), Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; ychen557@jh.edu
– Debnath, Tathagata (ORCID 0000-0001-6445-275X), Department of Computer Science, New Mexico State University, Las Cruces, NM, USA; tirtha.debnath@gmail.com
– Cai, Andrew (ORCID 0009-0000-9444-6027), School of Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA; cai.andrew1226@gmail.com
– Song, Mingzhou (ORCID 0000-0002-6883-6547), Department of Computer Science, New Mexico State University, Las Cruces, NM, USA; joemsong@nmsu.edu
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
Funding: National Science Foundation, grant 1661331
PMID: 37651497
Subjects: Algorithms; Arrays; Bacterial genome; Circular clustering; Circular genome; Clustering; Clustering algorithms; Data points; Elbow; Genomics; Hidden Markov models; Mathematical models; Microorganisms; Mitochondria; Periodic data; Silhouette; Solar cycle
Full text: https://ieeexplore.ieee.org/document/10236583