Research on a Topic Model-Based Software Defect Prediction Technique

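Software defect prediction typically trains a prediction model on surface features of the code and applies it to new samples, ignoring the different technical aspects and topics hidden behind the code, which leads to inaccurate predictions. To address this problem, a topic model-based software defect prediction method is proposed. The software code base is treated as a collection of different technical aspects and topics, and different topics or technical aspects have different defect tendencies. The LDA topic model is used to model the topics and their defect tendencies, topic metrics are computed from the modeling results, and traditional metrics are combined with the topic metrics for model training and prediction. Experimental results show that the method achieves higher accuracy than traditional software defect prediction techniques, keeps the model relatively stable as the software evolves, and is applicable to a variety of defect prediction tasks.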
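The idea that each topic carries its own defect tendency can be illustrated with a small sketch. The record does not state how the topic metrics are defined, so the membership-weighted defect density below is an assumption, not the authors' definition. The function names and the sample data are hypothetical.

```python
def topic_defect_tendency(theta, defects):
    """Membership-weighted defect density per topic (an assumed metric).

    theta[i][k]: share of file i attributed to topic k (each row sums to 1).
    defects[i]:  number of known defects in file i.
    """
    n_topics = len(theta[0])
    tendency = []
    for k in range(n_topics):
        weighted_defects = sum(row[k] * d for row, d in zip(theta, defects))
        total_weight = sum(row[k] for row in theta)
        tendency.append(weighted_defects / total_weight if total_weight else 0.0)
    return tendency


def file_topic_metric(theta_row, tendency):
    """Expected defect tendency of one file, given its topic mixture."""
    return sum(share * t for share, t in zip(theta_row, tendency))


# Hypothetical data: 3 files, 2 topics.
theta = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
defects = [4, 0, 2]

tendency = topic_defect_tendency(theta, defects)
scores = [file_topic_metric(row, tendency) for row in theta]
```

In this toy data the first topic is concentrated in the defective files, so files dominated by it receive a higher score than files dominated by the second topic.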

Bibliographic Details
Published in Computer Engineering & Science (计算机工程与科学) Vol. 38; no. 5; pp. 932-937
Main Authors 张泽涛 (ZHANG Ze-tao), 叶立军 (YE Li-jun), 程伟 (CHENG Wei), 顾军 (GU Jun)
Format Journal Article
Language Chinese
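Published Shanghai Institute of Spaceflight Control Technology, Shanghai 201109, 2016
Shanghai Key Laboratory of Aerospace Intelligent Control Technology, Shanghai 201109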
ISSN 1007-130X
DOI 10.3969/j.issn.1007-130X.2016.05.013

Bibliography: ZHANG Ze-tao, YE Li-jun, CHENG Wei, GU Jun (1. Shanghai Key Laboratory of Aerospace Intelligent Control Technology, Shanghai 201109; 2. Shanghai Institute of Spaceflight Control Technology, Shanghai 201109, China)
43-1258/TP
Traditional defect prediction models typically consider only the textual features of source code, comments, etc., ignoring hidden topics such as technical aspects and business logic. To solve this problem, we present a new topic-based defect prediction model. The software corpus is assumed to be composed of a collection of different topics and technical aspects, which lead to different defect tendencies. A set of topic-based metrics is proposed. Then, the LDA topic model is adopted to generate topics and the corresponding parameters, and the prediction model is trained on both the topic metrics and some traditional metrics. Experimental results show that the proposed method outperforms traditional defect prediction methods and can also ensure a stable model through the evolution of sof
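The pipeline in the abstract (run LDA over the code corpus, then combine topic information with traditional metrics) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a small collapsed Gibbs sampler written with the standard library only, and the hyperparameters, the toy corpus, and the lines-of-code metric are all hypothetical. Only feature construction is shown; the resulting vectors could be passed to any classifier.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic shares."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # tokens in topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_doc)
    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed topic shares per document; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


# Hypothetical corpus: identifier and comment words extracted per source file.
docs = [
    ["socket", "connect", "timeout", "socket", "retry"],
    ["parse", "token", "grammar", "parse", "token"],
    ["socket", "timeout", "retry", "connect"],
    ["grammar", "token", "parse", "grammar"],
]
loc = [120, 340, 95, 210]  # hypothetical traditional metric: lines of code

theta = lda_gibbs(docs, n_topics=2)
# One feature vector per file: traditional metric followed by topic shares.
features = [[n] + shares for n, shares in zip(loc, theta)]
```

A fixed seed keeps the sampler reproducible, which matters when comparing models across software versions.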