An automatic classification algorithm for software vulnerability based on weighted word vector and fusion neural network

Bibliographic Details
Published in Computers & security Vol. 126; p. 103070
Main Authors Wang, Qian; Gao, Yuying; Ren, Jiadong; Zhang, Bing
Format Journal Article
Language English
Published Elsevier Ltd 01.03.2023
ISSN 0167-4048; 1872-6208
DOI 10.1016/j.cose.2022.103070

Summary: To address the problem that traditional vector representations of software vulnerability data are high-dimensional and sparse, which leads to unsatisfactory automatic classification, this paper proposes an automatic classification algorithm for software vulnerabilities based on weighted word vectors and a fusion neural network. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is improved to generate low-dimensional, dense weighted word vectors according to the category distribution. Second, the vulnerability classification model TCNN-BiGRU, consisting of TextCNN (TCNN) and Bidirectional GRU (BiGRU), is constructed to make full use of the advantages of the convolutional neural network (CNN) and the gated recurrent unit (GRU) neural network. TextCNN extracts local features of the vulnerability description text, BiGRU extracts global features of the same text, and the output feature vectors are fused to achieve dimensionality reduction. Finally, the Dropout and Early Stopping methods are introduced to suppress overfitting, and a Softmax classifier is used to predict the vulnerability category. The classification performance of the proposed algorithm is verified by ablation experiments, sparsity problem experiments, and comparative experiments on vulnerability data from the NVD dataset, using accuracy, macro precision, macro recall, and macro F1-score as indicators.
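The four indicators named in the summary can be illustrated with a short sketch. This is not the authors' evaluation code. The function name `macro_metrics`, the category labels, and the sample predictions are hypothetical. Macro F1 is taken here as the unweighted mean of the per-class F1 scores, which is one common convention; the record does not say which definition the paper uses.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging computes each metric per class and then takes the
    unweighted mean, so rare classes count as much as frequent ones.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # A class with an empty denominator contributes 0.0 (an assumption).
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical vulnerability categories, for illustration only.
    y_true = ["XSS", "XSS", "SQLi", "SQLi", "DoS", "DoS"]
    y_pred = ["XSS", "SQLi", "SQLi", "SQLi", "DoS", "XSS"]
    for name, value in macro_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Because every class carries equal weight, macro scores can differ noticeably from accuracy when the category distribution is imbalanced.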