An Operating System Identification Method Based on Active Learning

Bibliographic Details
Published in	2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), pp. 1-6
Main Authors	Zhang, Daowei; Wang, Qiujie; Wei, Ziling; Chen, Shuhui
Format	Conference Proceeding
Language	English
Published	IEEE 20.07.2022
Subjects	Active Learning; Fingerprint recognition; Heuristic algorithms; Imbalance problem; Learning systems; Machine learning; Machine learning algorithms; Operating system; Operating systems; Passive identification; Prediction algorithms; Training
DOI	10.1109/ICECET55527.2022.9873443

Summary:	Machine learning algorithms are widely adopted for operating system (OS) identification and can achieve reasonable accuracy even on encrypted traffic. However, they require large amounts of labeled data for training. In addition, such algorithms have difficulty dealing with data imbalance and with predicting OS types that account for only a small percentage of traffic. To address these challenges, this paper proposes, for the first time, an OS identification algorithm based on active learning (AL), for which a query strategy is designed. The proposed algorithm is evaluated on an unbalanced dataset. The results show that it achieves performance similar to existing machine learning algorithms while using only 0.4% of the labeled data for training. Compared with existing algorithms, the proposed AL-based algorithm needs only 32% of the training time and 3.2% of the training samples to reach the accuracy obtained with the full data set. It also performs better than existing algorithms on multi-classification problems.