An efficient k-means clustering algorithm: Analysis and implementation
| Published in | IEEE transactions on pattern analysis and machine intelligence Vol. 24; no. 7; pp. 881 - 892 |
|---|---|
| Main Authors | Kanungo, Tapas; Mount, David M; Netanyahu, Nathan S; Piatko, Christine D; Silverman, Ruth; Wu, Angela Y |
| Format | Journal Article |
| Language | English |
| Published | 01.07.2002 |
| ISSN | 0162-8828 |
| DOI | 10.1109/TPAMI.2002.1017616 |
| Abstract | In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation. |
| Author | Piatko, Christine D; Wu, Angela Y; Mount, David M; Netanyahu, Nathan S; Silverman, Ruth; Kanungo, Tapas |
| Author_xml | – sequence: 1 givenname: Tapas surname: Kanungo fullname: Kanungo, Tapas – sequence: 2 givenname: David surname: Mount middlename: M fullname: Mount, David M – sequence: 3 givenname: Nathan surname: Netanyahu middlename: S fullname: Netanyahu, Nathan S – sequence: 4 givenname: Christine surname: Piatko middlename: D fullname: Piatko, Christine D – sequence: 5 givenname: Ruth surname: Silverman fullname: Silverman, Ruth – sequence: 6 givenname: Angela surname: Wu middlename: Y fullname: Wu, Angela Y |
| ContentType | Journal Article |
| DatabaseName | Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitle | Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Computer and Information Systems Abstracts |
| Discipline | Engineering; Computer Science |
| EndPage | 892 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| PageCount | 12 |
| PublicationDate | 2002-07-01 |
| PublicationTitle | IEEE transactions on pattern analysis and machine intelligence |
| PublicationYear | 2002 |
| StartPage | 881 |
| Title | An efficient k-means clustering algorithm: Analysis and implementation |
| URI | https://www.proquest.com/docview/27579064 |
| Volume | 24 |
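
The abstract defines the k-means objective (mean squared distance from each data point to its nearest center) and names Lloyd's algorithm as the heuristic being implemented. As a rough illustration only, the sketch below shows a plain, brute-force version of Lloyd's iteration in Python. It is not the filtering algorithm from the paper, which uses a kd-tree to avoid comparing every point against every center. All function names and the sample data are assumptions made for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to `point` (ties go to the lowest index)."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mean_squared_distance(points, centers):
    """The k-means objective: average squared distance to the nearest center."""
    total = sum(squared_distance(p, centers[nearest_center(p, centers)])
                for p in points)
    return total / len(points)


def lloyd(points, initial_centers, max_iters=100):
    """Brute-force Lloyd iteration starting from caller-supplied centers."""
    centers = [tuple(float(x) for x in c) for c in initial_centers]
    for _ in range(max_iters):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the centroid of its group.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[i] for m in members) / len(members)
                    for i in range(dim)))
            else:
                new_centers.append(centers[j])  # keep a center with no points
        if new_centers == centers:  # no center moved, so the iteration has converged
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    # Two well-separated groups of four points each (made-up sample data).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    found = lloyd(data, initial_centers=[(0, 0), (1, 1)])
    print(found, mean_squared_distance(data, found))
```

Each iteration here costs on the order of n·k distance computations. Because Lloyd's algorithm is a heuristic, the centers it returns depend on the starting centers and may be only a local minimum of the objective.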