K-sets and k-swaps algorithms for clustering sets
| Published in | Pattern Recognition, Vol. 139, Article 109454 |
|---|---|
| Main Authors | Rezaei, Mohammad; Fränti, Pasi |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.07.2023 |
| Online Access | https://www.sciencedirect.com/science/article/pii/S0031320323001541 (open access) |
| ISSN | 0031-3203 (print); 1873-5142 (online) |
| DOI | 10.1016/j.patcog.2023.109454 |
| Abstract | Highlights: • Novel k-sets algorithm that generalizes k-means to set data. • Novel k-swaps algorithm to avoid local minima. • Benchmark for evaluating the clustering of set data. • Case study clustering patients by their ICD-10 diagnoses. We present two new clustering algorithms, called k-sets and k-swaps, for data where each object is a set. First, we define the mean of the sets in a cluster and the distance between a set and that mean. We then derive the k-sets algorithm from the principles of classical k-means, so that it repeats the assignment and update steps until convergence. To the best of our knowledge, the proposed algorithm is the first k-means-based algorithm for this kind of data. We also adopt the idea in the random swap algorithm, a wrapper around k-means that avoids local minima; this variant is called k-swaps. Experiments show that k-swaps provides more accurate clustering results than k-medoids and other competing methods. |
|---|---|
| ArticleNumber | 109454 |
| Author | Rezaei, Mohammad; Fränti, Pasi |
| Author_xml | 1. Mohammad Rezaei (rezaei@cs.uef.fi); 2. Pasi Fränti (ORCID 0000-0002-9554-2827, franti@cs.uef.fi) |
| CitedBy_id | crossref_primary_10_1007_s10489_023_04843_7 crossref_primary_10_3390_a16070349 crossref_primary_10_3390_a16120572 crossref_primary_10_1016_j_patcog_2023_109763 crossref_primary_10_3934_aci_2024016 crossref_primary_10_1016_j_ecolind_2024_112629 crossref_primary_10_3934_aci_2023008 crossref_primary_10_1109_JSTARS_2023_3296876 |
| ContentType | Journal Article |
| Copyright | 2023 The Author(s) |
| Discipline | Computer Science |
| EISSN | 1873-5142 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | K-means; Customer segmentation; K-medoids; K-swaps; Clustering sets; Similarity of sets; Clustering healthcare records; Random swap |
| License | This is an open access article under the CC BY license. |
| ORCID | 0000-0002-9554-2827 |
| OpenAccessLink | https://www.sciencedirect.com/science/article/pii/S0031320323001541 |
| PublicationDate | July 2023 (2023-07-01) |
| PublicationTitle | Pattern Recognition |
| PublicationYear | 2023 |
| Publisher | Elsevier Ltd |
| StartPage | 109454 |
| SubjectTerms | Clustering healthcare records; Clustering sets; Customer segmentation; K-means; K-medoids; K-swaps; Random swap; Similarity of sets |
| Title | K-sets and k-swaps algorithms for clustering sets |
| URI | https://doi.org/10.1016/j.patcog.2023.109454 |
| Volume | 139 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=K-sets+and+k-swaps+algorithms+for+clustering+sets&rft.jtitle=Pattern+recognition&rft.au=Rezaei%2C+Mohammad&rft.au=Fr%C3%A4nti%2C+Pasi&rft.date=2023-07-01&rft.pub=Elsevier+Ltd&rft.issn=0031-3203&rft.eissn=1873-5142&rft.volume=139&rft_id=info:doi/10.1016%2Fj.patcog.2023.109454&rft.externalDocID=S0031320323001541 |