An automatic classification algorithm for software vulnerability based on weighted word vector and fusion neural network
| Published in | Computers & Security, Vol. 126, p. 103070 |
|---|---|
| Format | Journal Article | 
| Language | English | 
| Published | Elsevier Ltd, 01.03.2023 |
| ISSN | 0167-4048 1872-6208  | 
| DOI | 10.1016/j.cose.2022.103070 | 
| Summary: | Traditional vector representations of software vulnerability data are high-dimensional and sparse, which leads to unsatisfactory automatic classification. To address this problem, this paper proposes an automatic classification algorithm for software vulnerabilities based on weighted word vectors and a fusion neural network. First, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is improved to generate low-dimensional, dense weighted word vectors according to the category distribution. Second, the vulnerability classification model TCNN-BiGRU, consisting of TextCNN (TCNN) and a bidirectional GRU (BiGRU), is constructed to make full use of the advantages of the convolutional neural network (CNN) and the gated recurrent unit (GRU) neural network. TextCNN extracts local features of the vulnerability description text, BiGRU extracts global features, and the output feature vectors are fused to achieve dimensionality reduction. Finally, Dropout and Early Stopping are introduced to suppress overfitting, and a Softmax classifier assigns the vulnerability category. The classification performance of the proposed algorithm is verified by ablation experiments, sparsity experiments, and comparative experiments on vulnerability data from the NVD dataset, using accuracy, macro precision, macro recall, and macro F1-score as indicators. |
|---|---|
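
The weighted word vector idea in the summary can be illustrated with a minimal sketch. It assumes plain smoothed TF-IDF rather than the paper's category-aware improvement, which this record does not specify. The toy descriptions, the two-dimensional `embeddings` table, and the function names `tfidf_weights` and `weighted_doc_vector` are all hypothetical and serve only to show how TF-IDF scores can turn a sparse bag of words into one dense, low-dimensional vector per description.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses standard TF-IDF with a smoothed IDF term. This is NOT the
    improved, category-aware TF-IDF the paper proposes.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return weights


def weighted_doc_vector(weights, embeddings, dim):
    """Average word vectors, weighting each term by its TF-IDF score."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, w in weights.items():
        vec = embeddings.get(term)
        if vec is None:  # skip terms without an embedding
            continue
        weight_sum += w
        for i in range(dim):
            total[i] += w * vec[i]
    if weight_sum == 0.0:
        return total  # no known terms: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical toy data: tokenized vulnerability descriptions.
docs = [
    ["buffer", "overflow", "in", "parser"],
    ["sql", "injection", "in", "login", "form"],
    ["heap", "buffer", "overflow"],
]
# Hypothetical 2-dimensional embeddings; real ones would come from a
# trained model such as Word2Vec.
embeddings = {
    "buffer": [1.0, 0.0],
    "overflow": [1.0, 0.0],
    "sql": [0.0, 1.0],
    "injection": [0.0, 1.0],
    "in": [0.5, 0.5],
}

weights = tfidf_weights(docs)
vectors = [weighted_doc_vector(w, embeddings, dim=2) for w in weights]
```

Each description is reduced to a vector of length `dim` regardless of vocabulary size, and terms that occur in many descriptions (here `in`) contribute less than terms concentrated in a few. Vectors of this kind are what a downstream classifier such as the TCNN-BiGRU model described in the summary would consume.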