An efficient k-means clustering algorithm: Analysis and implementation

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, no. 7, pp. 881-892
Main Authors: Kanungo, Tapas; Mount, David M.; Netanyahu, Nathan S.; Piatko, Christine D.; Silverman, Ruth; Wu, Angela Y.
Format: Journal Article
Language: English
Published: 01.07.2002
ISSN: 0162-8828
DOI: 10.1109/TPAMI.2002.1017616

Abstract: In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies, both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
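For readers unfamiliar with the heuristic named in the abstract, the following is a minimal sketch of plain Lloyd's iteration and of the mean squared distance objective. It is not the paper's filtering algorithm: it uses no kd-tree and scans every center for every point. The function names, the random initialization, the `iterations` cap, and the handling of empty clusters are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_squared_distance(points, centers):
    """Objective from the abstract: mean squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)


def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's iteration (brute force, no kd-tree filtering).

    Initial centers are k distinct data points chosen at random; this
    initialization is an assumption, not something the abstract specifies.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the centroid of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
                )
            else:
                # Assumed policy: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # Assignments are stable, so the iteration has converged.
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    found = lloyd_kmeans(data, k=2)
    print(sorted(found), mean_squared_distance(data, found))
```

Each pass of this sketch costs on the order of n·k distance computations. According to the abstract, the filtering algorithm performs the same Lloyd's iteration but organizes the data points in a kd-tree; how that tree is used to reduce the work is described in the paper itself and is not reproduced here.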