K-means clustering algorithm and Python implementation

Bibliographic Details
Published in 2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE), pp. 55-59
Main Author Wu, BoKai
Format Conference Proceeding
Language English
Published IEEE 20.08.2021
DOI 10.1109/CSAIEE54046.2021.9543260

Abstract K-means is a commonly used unsupervised learning algorithm in machine learning, regularly applied to data clustering. Only the number of clusters needs to be specified; the algorithm then automatically groups the data into that many categories, so that the similarity between data in the same cluster is high while the similarity between data in different clusters is low. K-means is a typical distance-based clustering algorithm: it takes distance as the evaluation index of similarity, that is, the closer two objects are, the greater their similarity. Clustering is also widely used in practical applications, such as market segmentation, social network analysis, organizing computing clusters, and astronomical data analysis. This paper is my own attempt to build K-means code and an API, using Python and Java jointly in one project. Python is mainly used to write the framework of the core K-means algorithm, and Java to create experimental data. In this research report, I describe the simple data model provided by K-means, as well as the design and implementation of K-means.
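The paper's own code is not reproduced in this record. As an illustration only, the distance-based procedure the abstract describes can be sketched as the standard K-means iteration: assign each point to its nearest center, then move each center to the mean of its points. The function name `kmeans`, its parameters, the choice of Euclidean distance, the random initialization from the input points, and the sample data are all assumptions made for this sketch, not details taken from the paper.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative K-means on a list of equal-length numeric tuples.

    Hypothetical helper for this sketch; not the paper's implementation.
    """
    rng = random.Random(seed)
    # Assumption: initial centers are k input points picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (smaller Euclidean distance is treated as greater similarity).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels


# Made-up sample data: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Only `k` has to be chosen by the caller, which matches the abstract's point that the number of clusters is the sole required input. The result can depend on the initial centers, so a different `seed` may give a different grouping on less clearly separated data.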
Author Wu, BoKai
Author_xml – sequence: 1
  givenname: BoKai
  surname: Wu
  fullname: Wu, BoKai
  email: 204911@student.upm.edu.my
  organization: University Putra, Department of Computer Science and Information Technology, Malaysia
EISBN 1665422041
9781665422048
SubjectTerms API
Clustering algorithms
Codes
Data analysis
Java
K-means algorithm
Machine learning
Machine learning algorithms
Social networking (online)
URI https://ieeexplore.ieee.org/document/9543260