Evaluation of Multiclass Novelty Detection Algorithms for Data Streams

Bibliographic Details
Published in	IEEE Transactions on Knowledge and Data Engineering, Vol. 27, no. 11, pp. 2961-2973
Main Authors	Ribeiro de Faria, Elaine; Ribeiro Goncalves, Isabel; Gama, Joao; Carlos Ponce de Leon Ferreira Carvalho, Andre
Format	Journal Article
Language	English
Published	New York: IEEE, 01.11.2015; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Confusion; Consumer goods; Context; Data mining; Data streams; Data transmission; Decision support systems; Electronic mail; Evaluation methodology; Evolution; Learning; Mathematical model; Measurement uncertainty; Methodology; Novelty detection; Performance evaluation; Studies; Time measurement
ISSN	1041-4347, 2326-3865, 1558-2191
DOI	10.1109/TKDE.2015.2441713

More Information
Summary:	Data stream mining is an emergent research area that investigates knowledge extraction from large amounts of continuously generated data produced by non-stationary distributions. Novelty detection, the ability to identify new or previously unknown situations, is useful for learning systems, especially when dealing with data streams, where concepts may appear, disappear, or evolve over time. Several studies are currently investigating the application of novelty detection techniques to data streams. However, there is no consensus on how to evaluate the performance of these techniques. In this study, we propose a new evaluation methodology for multiclass novelty detection in data streams that is able to deal with: i) unsupervised learning, which generates novelty patterns without an association with the true classes, where one class may be composed of a novelty set; ii) a confusion matrix that grows over time; iii) a confusion matrix with a column representing unknown examples, i.e., those not explained by the model; and iv) the representation of the evaluation measures over time. We also propose a new methodology to associate the novelty patterns detected by the algorithm, in an unsupervised fashion, with the true classes. Finally, we evaluate the performance of the proposed methodology using known novelty detection algorithms with artificial and real data sets.
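
Illustration:	The summary does not spell out how the confusion matrix is built or how novelty patterns are matched to true classes, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a majority-overlap rule (each novelty pattern is assigned to the true class with which it shares the most examples), and the names `UNKNOWN`, `build_confusion`, `map_novelty_patterns`, and `evaluate`, the labels `NP1` and `NP2`, and the two measures returned are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

UNKNOWN = "unknown"  # hypothetical label for examples the model does not explain


def build_confusion(stream):
    """Accumulate counts[true_class][predicted_label] from (true, predicted) pairs."""
    counts = defaultdict(Counter)
    for true_class, predicted in stream:
        counts[true_class][predicted] += 1
    return counts


def map_novelty_patterns(counts, novelty_labels):
    """Assumed rule: map each novelty pattern to the true class sharing most examples."""
    if not counts:
        return {}
    mapping = {}
    for pattern in novelty_labels:
        mapping[pattern] = max(counts, key=lambda c: counts[c][pattern])
    return mapping


def evaluate(counts, mapping):
    """Return the unknown rate and the error rate over explained examples."""
    total = explained = errors = unknown = 0
    for true_class, row in counts.items():
        for predicted, n in row.items():
            total += n
            if predicted == UNKNOWN:
                unknown += n  # the extra 'unknown' column of the matrix
                continue
            explained += n
            # Novelty patterns are translated to classes; known labels pass through.
            if mapping.get(predicted, predicted) != true_class:
                errors += n
    return {
        "unknown_rate": unknown / total if total else 0.0,
        "error_rate": errors / explained if explained else 0.0,
    }


# Made-up toy stream of (true class, label emitted by the detector) pairs.
stream = [
    ("A", "A"), ("A", "A"), ("A", UNKNOWN),
    ("B", "NP1"), ("B", "NP1"), ("B", "A"),
    ("C", "NP2"), ("C", "NP1"), ("C", UNKNOWN), ("C", "NP2"),
]
counts = build_confusion(stream)
mapping = map_novelty_patterns(counts, ["NP1", "NP2"])
print(mapping)
print(evaluate(counts, mapping))
```

Calling `evaluate` at regular checkpoints while the counts keep accumulating would give one possible way to plot such measures over time; the checkpoint spacing is left open here because the summary does not specify it.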