An efficient k-means clustering algorithm: Analysis and implementation

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 24; no. 7; pp. 881-892
Main Authors	Kanungo, Tapas, Mount, David M, Netanyahu, Nathan S, Piatko, Christine D, Silverman, Ruth, Wu, Angela Y
Format	Journal Article
Language	English
Published	July 2002
ISSN	0162-8828
DOI	10.1109/TPAMI.2002.1017616

Abstract	In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
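The abstract names the filtering algorithm but does not spell it out. The Python code here is a minimal sketch of the general idea, not the authors' implementation. It assumes a kd-tree whose nodes store a bounding box, a point count and the vector sum of their points. The helper names (`build`, `is_farther`, `filter_node`, `filtering_step`, `lloyd_step`), the median split rule and the leaf size are hypothetical choices made for illustration.

Each call to `filtering_step` performs one Lloyd iteration. At every node, the sketch keeps the candidate center z* that lies closest to the box midpoint. It then discards any candidate that is no closer than z* to every point of the box. When only z* survives, the whole subtree is credited to z* in one step. The brute-force `lloyd_step` is included only so the two results can be compared.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distortion(points: List[Point], centers: List[Point]) -> float:
    """k-means objective: mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

@dataclass
class Node:
    lo: List[float]          # lower corner of a box containing every point in this subtree
    hi: List[float]          # upper corner of that box
    wsum: List[float]        # vector sum of the subtree's points ("weighted centroid")
    count: int               # number of points in the subtree
    points: List[Point]      # the points themselves, stored only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points: List[Point], leaf_size: int = 4) -> Node:
    """Hypothetical kd-tree: split at the median of the widest bounding-box dimension."""
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]
    node = Node(lo, hi, [sum(p[i] for p in points) for i in range(d)], len(points), [])
    if len(points) <= leaf_size:
        node.points = points
        return node
    axis = max(range(d), key=lambda i: hi[i] - lo[i])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    node.left, node.right = build(pts[:mid], leaf_size), build(pts[mid:], leaf_size)
    return node

def is_farther(z: Point, z_star: Point, lo: List[float], hi: List[float]) -> bool:
    """True if z is no closer than z_star to any point of the box [lo, hi]; testing the
    box vertex that is extreme in the direction z - z_star is sufficient."""
    v = [h if zi > si else l for zi, si, l, h in zip(z, z_star, lo, hi)]
    return sq_dist(z, v) >= sq_dist(z_star, v)

def accumulate(j: int, vec: Sequence[float], n: int, sums: List[List[float]], counts: List[int]) -> None:
    counts[j] += n
    for i, x in enumerate(vec):
        sums[j][i] += x

def filter_node(node: Node, cands: List[int], centers: List[Point],
                sums: List[List[float]], counts: List[int]) -> None:
    if node.left is None or node.right is None:
        # Leaf: assign each stored point to its nearest remaining candidate.
        for p in node.points:
            j = min(cands, key=lambda c: sq_dist(p, centers[c]))
            accumulate(j, p, 1, sums, counts)
        return
    mid = [(l + h) / 2 for l, h in zip(node.lo, node.hi)]
    star = min(cands, key=lambda c: sq_dist(mid, centers[c]))  # z*, closest to the box midpoint
    # Drop every candidate that cannot be the nearest center of any point in the box.
    cands = [c for c in cands
             if c == star or not is_farther(centers[c], centers[star], node.lo, node.hi)]
    if len(cands) == 1:
        # Only z* survives: credit the whole subtree to it at once.
        accumulate(star, node.wsum, node.count, sums, counts)
        return
    filter_node(node.left, cands, centers, sums, counts)
    filter_node(node.right, cands, centers, sums, counts)

def new_centers(centers: List[Point], sums: List[List[float]], counts: List[int]) -> List[Point]:
    # Assumption: a center that attracts no points stays where it is.
    return [tuple(s / n for s in sums[j]) if n else centers[j] for j, n in enumerate(counts)]

def filtering_step(tree: Node, centers: List[Point]) -> List[Point]:
    """One Lloyd iteration computed by kd-tree filtering."""
    sums, counts = [[0.0] * len(centers[0]) for _ in centers], [0] * len(centers)
    filter_node(tree, list(range(len(centers))), centers, sums, counts)
    return new_centers(centers, sums, counts)

def lloyd_step(points: List[Point], centers: List[Point]) -> List[Point]:
    """The same iteration by brute force, for comparison."""
    sums, counts = [[0.0] * len(centers[0]) for _ in centers], [0] * len(centers)
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        accumulate(j, p, 1, sums, counts)
    return new_centers(centers, sums, counts)

if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
           for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
    tree, centers = build(pts), rng.sample(pts, 3)
    for _ in range(10):
        centers = filtering_step(tree, centers)
    print(centers, distortion(pts, centers))
```

Pruning a candidate is safe because the difference in squared distance from the two centers to a point x is linear in x. Its minimum over the box is therefore reached at the vertex chosen in `is_farther`. The tree is built once and reused across iterations. Keeping empty clusters fixed, and letting ties go to the first candidate, are assumptions of this sketch.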
DatabaseName	Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional
Discipline	Engineering; Computer Science
IsPeerReviewed	true
IsScholarly	true
PageCount	12
URI	https://www.proquest.com/docview/27579064