PBDT: Python Backdoor Detection Model Based on Combined Features

Bibliographic Details
Published in	Security and Communication Networks, Vol. 2021, pp. 1-13
Main Authors	Fang, Yong; Xie, Mingyu; Huang, Cheng
Format	Journal Article
Language	English
Published	London: Hindawi, 14.09.2021; John Wiley & Sons, Inc.
Subjects	Algorithms; Classifiers; Codes; Datasets; Entropy (Information theory); Functionals; JavaScript; Language; Machine learning; Methods; Model accuracy; Neural networks; Programming languages; Python; Security; Semantics; Statistical methods
ISSN	1939-0114, 1939-0122
DOI	10.1155/2021/9923234

Summary:	Application security is essential in today’s period of rapid development. A backdoor is a means by which attackers can invade a system to achieve illegal purposes and damage users’ rights, and it poses a serious threat to network security. Thus, it is urgent to take adequate measures to defend against such attacks. Previous research focused mainly on PHP webshells, with less work on Python backdoor files, and language differences mean that those methods are not entirely applicable. This paper proposes a Python backdoor detection model named PBDT based on combined features. The model summarizes the functional modules and functions commonly found in backdoor files and extracts the number of calls to them in the text to form sample features. In addition, we consider the text’s statistical characteristics, including the information entropy and the longest string, to identify obfuscated Python code. The opcode sequence is also used to represent code characteristics, by means of a TF-IDF vector and a FastText classifier, to eliminate the influence of interference items. Finally, we introduce the Random Forest algorithm to build a classifier. On samples covering most types of backdoors, some of which are obfuscated, the model achieves an accuracy of 97.70% and a TNR of 98.66%, showing good classification performance in Python backdoor detection.
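Example:	The feature types named in the summary can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The watch list SUSPICIOUS_CALLS, the function names, and the reading of "longest string" as the longest string literal are assumptions made for illustration; the record does not give the paper's actual lists or definitions. The TF-IDF, FastText, and Random Forest stages are not shown.

```python
import ast
import dis
import math
from collections import Counter

# Hypothetical watch list; the paper's actual modules/functions are not listed in this record.
SUSPICIOUS_CALLS = ("exec", "eval", "system", "popen", "connect")


def shannon_entropy(text):
    """Shannon entropy of the text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def longest_string_length(source):
    """Length of the longest string literal (an assumed reading of 'longest string')."""
    lengths = [
        len(node.value)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    ]
    return max(lengths, default=0)


def call_counts(source):
    """Count calls to each watched name, matching both f(...) and obj.f(...)."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in SUSPICIOUS_CALLS:
                counts[name] += 1
    return [counts[name] for name in SUSPICIOUS_CALLS]


def opcode_sequence(code):
    """Opcode names of a code object, followed by those of its nested code objects."""
    ops = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested function or class body
            ops.extend(opcode_sequence(const))
    return ops


def extract_features(source):
    """Call counts plus statistical features, and the opcode sequence.

    The sample is only parsed and compiled, never executed. This assumes the
    sample is valid syntax for the interpreter running the sketch.
    """
    numeric = call_counts(source) + [shannon_entropy(source), longest_string_length(source)]
    opcodes = opcode_sequence(compile(source, "<sample>", "exec"))
    return numeric, opcodes


sample = "import os\nos.system('id')\nexec('x=1')\n"
numeric_features, opcodes = extract_features(sample)
```

In a fuller pipeline, the opcode names could be joined into a space-separated string and passed to a text vectorizer, and the numeric features could be fed to a tree-based classifier; both steps are left out here to keep the sketch dependency-free.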