An Approach to Software Defect Prediction Combining Semantic Features and Code Changes
| Published in | International Journal of Software Engineering and Knowledge Engineering, Vol. 32, No. 9, pp. 1345-1368 |
|---|---|
| Main Authors | , , , | 
| Format | Journal Article | 
| Language | English | 
| Published | Singapore: World Scientific Publishing Company (World Scientific Publishing Co. Pte., Ltd), 01.09.2022 |
| ISSN | 0218-1940, 1793-6403 |
| DOI | 10.1142/S0218194022500504 | 
| Summary: | Software defect prediction (SDP), which identifies defect-prone code regions, can help developers allocate limited resources for locating bugs and prioritize their testing efforts. Previous work on defect prediction has used machine learning with manually designed software metrics. However, traditional defect prediction features extracted from such metrics often fail to capture the syntactic and semantic information of defective modules. Existing work that learns such information for defect prediction mostly focuses on the abstract syntax tree (AST). Because AST-based techniques are already relatively mature, it is difficult to further improve defect prediction accuracy when the AST alone is used to characterize code. In this paper, to capture more semantic features, we extract semantic information from both AST token sequences and code change token sequences. In addition, to leverage the traditional features extracted from statistical metrics, we combine the semantic features with traditional defect prediction features to perform SDP, and use a gated fusion mechanism to determine the combination ratio of the two kinds of features. In our empirical studies, we select 10 open-source Java projects from the PROMISE repository as empirical subjects. Experimental results show that the proposed approach outperforms several state-of-the-art baseline SDP methods. |
|---|---|
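The gated fusion step mentioned in the summary can be illustrated with a commonly used formulation: an elementwise gate g = sigmoid(W_s s + W_t t + b) decides, per dimension, how much of the semantic feature vector s and how much of the traditional feature vector t enter the fused vector h = g * s + (1 - g) * t. The record does not give the paper's exact equations, so the following is only a minimal sketch under that assumption, not the authors' implementation. The names `gated_fusion`, `w_s`, `w_t` and `b` are hypothetical, and both feature vectors are assumed to have already been projected to the same dimension.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def gated_fusion(
    s: Sequence[float],               # semantic features (hypothetical, e.g. from AST/change tokens)
    t: Sequence[float],               # traditional metric features, same dimension as s
    w_s: Sequence[Sequence[float]],   # hypothetical gate weights for s (d x d)
    w_t: Sequence[Sequence[float]],   # hypothetical gate weights for t (d x d)
    b: Sequence[float],               # hypothetical gate bias (length d)
) -> Tuple[List[float], List[float]]:
    """Return (h, g) where g = sigmoid(W_s s + W_t t + b) and h = g*s + (1-g)*t."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same dimension")
    zs = matvec(w_s, s)
    zt = matvec(w_t, t)
    # Gate in (0, 1) per dimension: the "combination ratio" of the two feature kinds.
    g = [sigmoid(a + c + bi) for a, c, bi in zip(zs, zt, b)]
    # Convex combination per dimension.
    h = [gi * si + (1.0 - gi) * ti for gi, si, ti in zip(g, s, t)]
    return h, g


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    h, g = gated_fusion([1.0, 0.0], [0.0, 1.0], zeros, zeros, [0.0, 0.0])
    print(g, h)  # zero weights and bias: gate is 0.5, h is the plain average
```

With all-zero weights and bias the gate is 0.5 in every dimension, so h is simply the average of s and t; a large positive bias in one dimension pushes h toward the semantic feature there, and a large negative bias pushes it toward the traditional metric. In a trained model the gate parameters would be learned jointly with the defect classifier rather than set by hand.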