Active defect discovery: A human-in-the-loop learning method

Bibliographic Details
Published in	IISE Transactions Vol. ahead-of-print; no. ahead-of-print; pp. 1-14
Main Authors	Shen, Bo; Kong, Zhenyu (James)
Format	Journal Article
Language	English
Published	Abingdon: Taylor & Francis Ltd, 02.06.2024
Subjects	active defect discovery; Algorithms; Datasets; Defects; Feature extraction; Isolation forest; Labeling; measurement feedback; online gradient descent; Sparsity
ISSN	2472-5854 2472-5862
DOI	10.1080/24725854.2023.2224854

Summary:	Unsupervised defect detection methods score the instances of an unlabeled dataset and rank them by defect score. However, many of the instances ranked highest by unsupervised algorithms are not defects, which leads to high false-positive rates. Active Defect Discovery (ADD) addresses this deficiency by sequentially selecting instances and querying their labels (defect or not). Because labeling is often costly, balancing detection accuracy against labeling cost is essential. This article proposes a novel ADD method for that purpose. The approach uses a state-of-the-art unsupervised defect detection method, Isolation Forest, as the baseline defect detector to extract features. The sparsity of the extracted features is then exploited to adjust the defect detector so that it focuses on the features most important for defect detection. To enforce this sparsity and thereby improve detection accuracy, a new algorithm based on online gradient descent, Sparse Approximated Linear Defect Discovery (SALDD), is proposed together with a theoretical regret analysis. Extensive experiments are conducted on real-world datasets from domains including healthcare, manufacturing, and security. The results show that the proposed algorithm significantly outperforms state-of-the-art algorithms for defect detection.
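
The query-and-update loop described in the summary can be illustrated with a small sketch. This is not the authors' SALDD algorithm: it assumes hand-made two-dimensional feature vectors in place of Isolation Forest features, a simple linear loss, and L1 soft-thresholding as one possible way to encourage sparse weights. All names (`active_discovery`, `soft_threshold`, `DATA`, `LABELS`) and parameter values are hypothetical.

```python
def soft_threshold(value, tau):
    """Shrink value toward zero by tau (proximal step for an L1 penalty)."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def score(weights, features):
    """Linear defect score: inner product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def active_discovery(data, oracle, budget, step=0.5, l1=0.02):
    """Query `budget` instances; return (list of (index, is_defect), weights)."""
    dim = len(data[0])
    weights = [1.0 / dim] * dim          # uniform start: unadjusted detector
    unlabeled = list(range(len(data)))
    queried = []
    for _ in range(budget):
        # Ask the human about the instance the current detector ranks highest.
        top = max(unlabeled, key=lambda i: score(weights, data[i]))
        unlabeled.remove(top)
        is_defect = oracle(top)
        queried.append((top, is_defect))
        # Assumed linear loss -y * <w, x>, with y = +1 (defect) or -1 (normal),
        # so one gradient step moves w by step * y * x; then shrink with L1.
        y = 1.0 if is_defect else -1.0
        weights = [
            soft_threshold(w + step * y * x, step * l1)
            for w, x in zip(weights, data[top])
        ]
    return queried, weights


# Toy data (assumption): feature 0 is informative, feature 1 is misleading.
DATA = [
    [1.0, 0.0], [0.9, 0.0], [0.8, 0.0],              # defects
    [0.1, 1.2], [0.0, 1.1], [0.2, 0.1], [0.1, 0.0],  # normal instances
]
LABELS = [True, True, True, False, False, False, False]

queried, weights = active_discovery(DATA, lambda i: LABELS[i], budget=3)
found = sum(1 for _, is_defect in queried if is_defect)
print("queried:", queried)
print("defects found:", found, "weights:", weights)
```

In this toy setting the first query is a false positive from the unadjusted ranking; the feedback lowers the weight of the misleading feature, so the following queries go to true defects. The step size and penalty strength are arbitrary choices made for illustration.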