A Distributed Storage and Computation k-Nearest Neighbor Algorithm Based Cloud-Edge Computing for Cyber-Physical-Social Systems
The k-nearest neighbor (kNN) algorithm is a classic supervised machine learning algorithm. It is widely used in cyber-physical-social systems (CPSS) to analyze and mine data. However, in practical CPSS applications, the standard linear kNN algorithm struggles to efficiently process massive data sets...
Saved in:
| Published in | IEEE access Vol. 8; pp. 50118 - 50130 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published |
Piscataway
IEEE
2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects | |
| Online Access | Get full text |
| ISSN | 2169-3536 2169-3536 |
| DOI | 10.1109/ACCESS.2020.2974764 |
Cover
| Summary: | The k-nearest neighbor (kNN) algorithm is a classic supervised machine learning algorithm. It is widely used in cyber-physical-social systems (CPSS) to analyze and mine data. However, in practical CPSS applications, the standard linear kNN algorithm struggles to efficiently process massive data sets. This paper proposes a distributed storage and computation k-nearest neighbor (D-kNN) algorithm. The D-kNN algorithm has the following advantages: First, the concept of k-nearest neighbor boundaries is proposed and the k-nearest neighbor search within the k-nearest neighbors boundaries can effectively reduce the time complexity of kNN. Second, based on the k-neighbor boundary, massive data sets beyond the main storage space are stored on distributed storage nodes. Third, the algorithm performs k-nearest neighbor searching efficiently by performing distributed calculations at each storage node. Finally, a series of experiments were performed to verify the effectiveness of the D-kNN algorithm. The experimental results show that the D-kNN algorithm based on distributed storage and calculation effectively improves the operation efficiency of k-nearest neighbor search. The algorithm can be easily and flexibly deployed in a cloud-edge computing environment to process massive data sets in CPSS. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 2169-3536 2169-3536 |
| DOI: | 10.1109/ACCESS.2020.2974764 |