An Approach to Software Defect Prediction Combining Semantic Features and Code Changes

Bibliographic Details
Published in: International Journal of Software Engineering and Knowledge Engineering, Vol. 32, no. 9, pp. 1345-1368
Main Authors: Tao, Chuanqi; Wang, Tao; Guo, Hongjing; Zhang, Jingxuan
Format: Journal Article
Language: English
Published: Singapore: World Scientific Publishing Company (World Scientific Publishing Co. Pte., Ltd), 01.09.2022
ISSN: 0218-1940; 1793-6403
DOI: 10.1142/S0218194022500504

Summary: Software defect prediction (SDP), which predicts defect-prone code regions, can help developers allocate limited resources for locating bugs and prioritize their testing efforts. Previous work on defect prediction has relied on machine learning and hand-crafted software metrics. However, traditional defect prediction features extracted from hand-crafted metrics often fail to capture the syntactic and semantic information of defective modules. Work in this direction mostly focuses on the abstract syntax tree (AST), and because AST-based research is already relatively mature, it is difficult to further improve prediction accuracy when code is characterized by the AST alone. In this paper, to capture more semantic features, we extract semantic information from both the sequences of AST tokens and the sequences of code change tokens. In addition, to leverage the traditional features extracted from statistical metrics, we combine the semantic features with traditional defect prediction features to perform SDP, and use a gated fusion mechanism to determine the combination ratio of the two kinds of features. In our empirical studies, 10 open-source Java projects from the PROMISE repository are chosen as experimental subjects. Experimental results show that the proposed approach can outperform several state-of-the-art baseline SDP methods.
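The summary states that a gated fusion mechanism sets the combination ratio of the semantic and traditional features, but it does not give the formulation. One common form of gated fusion computes a per-dimension gate g = sigmoid(W[s; t] + b) over the concatenated vectors and outputs g * s + (1 - g) * t. The Python sketch below illustrates that general technique. It is not the authors' implementation. It assumes both feature vectors have already been projected to the same length d. The names `sigmoid` and `gated_fusion`, and the fixed weights passed in, are hypothetical; in a trained model W and b would be learned.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    semantic: Sequence[float],
    traditional: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fuse two equal-length feature vectors with a per-dimension gate.

    Hypothetical sketch of a generic gated fusion, for each dimension i:
        g_i   = sigmoid(weights[i] . [semantic; traditional] + bias[i])
        out_i = g_i * semantic[i] + (1 - g_i) * traditional[i]
    """
    d = len(semantic)
    if len(traditional) != d:
        raise ValueError("semantic and traditional vectors must have the same length")
    if len(weights) != d or len(bias) != d:
        raise ValueError("need one weight row and one bias term per output dimension")
    concat = list(semantic) + list(traditional)  # [s; t], length 2d
    fused = []
    for i in range(d):
        if len(weights[i]) != 2 * d:
            raise ValueError("each weight row must have length 2d")
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)  # share given to the semantic feature in dimension i
        fused.append(g * semantic[i] + (1.0 - g) * traditional[i])
    return fused
```

With all weights and biases set to zero, every gate equals 0.5 and the result is the element-wise average of the two vectors. For example, `gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])` returns `[2.0, 3.0]`. A large positive bias pushes the output toward the semantic features, and a large negative bias pushes it toward the traditional ones.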