A Distributed Storage and Computation k-Nearest Neighbor Algorithm Based Cloud-Edge Computing for Cyber-Physical-Social Systems

The k-nearest neighbor (kNN) algorithm is a classic supervised machine learning algorithm. It is widely used in cyber-physical-social systems (CPSS) to analyze and mine data. However, in practical CPSS applications, the standard linear kNN algorithm struggles to efficiently process massive data sets...

Full description

Saved in:

Bibliographic Details
Published in	IEEE access Vol. 8; pp. 50118 - 50130
Main Authors	Zhang, Wei, Chen, Xiaohui, Liu, Yueqi, Xi, Qian
Format	Journal Article
Language	English
Published	Piscataway IEEE 2020 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms Big Data Boundaries Cloud computing cloud-edge computing Clustering algorithms CPSS Datasets Distributed databases distributed storage and computation Edge computing Error analysis k-nearest neighbor boundary K-nearest neighbors algorithm kNN Machine learning Machine learning algorithms Massive data points Mathematical analysis Partitioning algorithms
Online Access	Get full text
ISSN	2169-3536 2169-3536
DOI	10.1109/ACCESS.2020.2974764

Cover

More Information
Summary:	The k-nearest neighbor (kNN) algorithm is a classic supervised machine learning algorithm. It is widely used in cyber-physical-social systems (CPSS) to analyze and mine data. However, in practical CPSS applications, the standard linear kNN algorithm struggles to efficiently process massive data sets. This paper proposes a distributed storage and computation k-nearest neighbor (D-kNN) algorithm. The D-kNN algorithm has the following advantages: First, the concept of k-nearest neighbor boundaries is proposed and the k-nearest neighbor search within the k-nearest neighbors boundaries can effectively reduce the time complexity of kNN. Second, based on the k-neighbor boundary, massive data sets beyond the main storage space are stored on distributed storage nodes. Third, the algorithm performs k-nearest neighbor searching efficiently by performing distributed calculations at each storage node. Finally, a series of experiments were performed to verify the effectiveness of the D-kNN algorithm. The experimental results show that the D-kNN algorithm based on distributed storage and calculation effectively improves the operation efficiency of k-nearest neighbor search. The algorithm can be easily and flexibly deployed in a cloud-edge computing environment to process massive data sets in CPSS.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	2169-3536 2169-3536
DOI:	10.1109/ACCESS.2020.2974764