An automatic classification algorithm for software vulnerability based on weighted word vector and fusion neural network

Bibliographic Details
Published in	Computers & Security, Vol. 126, p. 103070
Main Authors	Wang, Qian; Gao, Yuying; Ren, Jiadong; Zhang, Bing
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.03.2023
Subjects	Neural network; Software security; Text classification; Vulnerability classification; Word2Vec
ISSN	0167-4048, 1872-6208
DOI	10.1016/j.cose.2022.103070

Summary:	To address the problem that traditional vector representations of software vulnerability data are high-dimensional and sparse, which leads to unsatisfactory automatic classification, this paper proposes an automatic classification algorithm for software vulnerabilities based on weighted word vectors and a fusion neural network. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is improved according to the category distribution to generate low-dimensional, dense weighted word vectors. Second, the vulnerability classification model TCNN-BiGRU, which consists of TextCNN (TCNN) and a bidirectional GRU (BiGRU), is constructed to make full use of the advantages of the convolutional neural network (CNN) and the gated recurrent unit (GRU) neural network. TextCNN extracts local features of the vulnerability description text, BiGRU extracts global features, and the output feature vectors are fused to achieve dimensionality reduction. Finally, Dropout and Early Stopping are introduced to suppress overfitting, and a Softmax classifier assigns the vulnerability category. The classification performance of the proposed algorithm is verified through ablation experiments, sparsity experiments, and comparative experiments on vulnerability data from the NVD dataset, using accuracy, macro precision, macro recall, and macro F1-score as indicators.
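The weighted word vector idea in the summary can be illustrated with a minimal sketch. This is not the paper's method: the summary does not give the category-aware TF-IDF formula, so the code below assumes plain TF-IDF (IDF as log(N / df)) and uses it to weight a toy, hand-written embedding table in place of trained Word2Vec vectors. The function names `idf_weights` and `weighted_doc_vector`, the sample descriptions, and the 2-dimensional embeddings are all hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Plain IDF per term, log(N / df). ASSUMPTION: standard IDF,
    not the category-distribution variant proposed in the paper."""
    n_docs = len(docs)
    # Count each term once per document to get its document frequency.
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def weighted_doc_vector(doc, idf, embeddings):
    """TF-IDF weighted average of the word vectors of one tokenized document."""
    tf = Counter(t for t in doc if t in embeddings)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        weight = (count / len(doc)) * idf.get(term, 0.0)
        weight_sum += weight
        for i, x in enumerate(embeddings[term]):
            total[i] += weight * x
    if weight_sum == 0.0:
        return total  # no informative terms: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical tokenized vulnerability descriptions.
docs = [
    ["buffer", "overflow", "in", "parser"],
    ["sql", "injection", "in", "login"],
    ["buffer", "overflow", "in", "decoder"],
]

# Hypothetical 2-dimensional embeddings standing in for Word2Vec output.
embeddings = {
    "buffer": [1.0, 0.0], "overflow": [1.0, 0.0], "parser": [0.0, 1.0],
    "decoder": [0.0, 1.0], "sql": [0.5, 0.5], "injection": [0.5, 0.5],
    "login": [0.2, 0.8], "in": [5.0, 5.0],
}

idf = idf_weights(docs)
vec = weighted_doc_vector(docs[0], idf, embeddings)
print(vec)
```

Because "in" occurs in every sample description, its IDF is zero and its large embedding does not affect the document vector, which is the kind of down-weighting of uninformative terms that the weighting step aims for. The resulting dense, fixed-length vector is what a downstream classifier such as the TCNN-BiGRU model described in the summary would take as input.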