ARM-stream: active recovery of miscategorizations in clustering-based data stream classifiers

Bibliographic Details
Published in: Data Mining and Knowledge Discovery, Vol. 39, no. 5, p. 47
Main Authors: Cavalcanti, Douglas Monteiro; Cerri, Ricardo; Faria, Elaine Ribeiro
Format: Journal Article
Language: English
Published: New York: Springer US, 01.09.2025
Springer Nature B.V.
ISSN: 1384-5810, 1573-756X
DOI: 10.1007/s10618-025-01124-4

Summary: The non-stationary feature-class interaction problem arises in dynamic data stream environments, where online classifiers avoid frequent label requests by relying on feature-class interaction assumptions to obtain class information from unlabeled data, even though these assumptions may become invalid when the interaction changes unexpectedly. Clustering-based data stream classifiers mitigate this problem with active learning strategies. However, because these strategies are integral to the update process, improving reliability within the classifier is typically limited to increasing labeling resources, which diminishes the classifier’s ability to learn from unlabeled data. In this paper, we tackle this problem by decoupling error handling from the classifier into a new framework called ARM-Stream, a fail-safe layer that aims to intervene only when the classifier’s update process would fail and otherwise preserves the classifier’s original label resource usage patterns. ARM-Stream’s architecture, defined through abstract modules, allows easy integration with different classifiers and easy customization and extension of its modules. To the best of our knowledge, ARM-Stream is the first error recovery framework for clustering-based data stream classifiers. As a case study, we test the framework on three classifiers, MINAS, CDSC-AL, and ECHO, chosen for their differing update processes. An off-the-shelf source code library for the framework is provided.
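
The fail-safe idea in the summary can be illustrated with a small Python sketch. It is not the ARM-Stream library or its API: every name here (StreamClassifier, FailureDetector, LabelOracle, FailSafeLayer, and the toy centroid classifier) is hypothetical, and the distance-threshold check is an assumed stand-in for whatever failure criterion the framework actually uses. The sketch only shows the general pattern of abstract modules wrapped around a classifier, with a label requested only when an update looks unreliable.

```python
import math
from abc import ABC, abstractmethod


class StreamClassifier(ABC):
    """Hypothetical interface for the wrapped classifier."""

    @abstractmethod
    def predict(self, x): ...

    @abstractmethod
    def update(self, x, label): ...


class FailureDetector(ABC):
    """Hypothetical module: decides whether a self-labeled update is risky."""

    @abstractmethod
    def is_suspect(self, classifier, x, predicted): ...


class LabelOracle(ABC):
    """Hypothetical module: source of true labels (e.g. a human annotator)."""

    @abstractmethod
    def query(self, x): ...


class FailSafeLayer:
    """Wraps a classifier and intervenes only on suspect updates."""

    def __init__(self, classifier, detector, oracle):
        self.classifier = classifier
        self.detector = detector
        self.oracle = oracle
        self.label_requests = 0  # labels spent by this layer only

    def process(self, x):
        predicted = self.classifier.predict(x)
        if self.detector.is_suspect(self.classifier, x, predicted):
            # Recovery path: spend one label and update with the true class.
            label = self.oracle.query(x)
            self.label_requests += 1
        else:
            # Normal path: the classifier learns from its own prediction.
            label = predicted
        self.classifier.update(x, label)
        return label


class CentroidClassifier(StreamClassifier):
    """Toy stand-in: one running-mean centroid per class."""

    def __init__(self, centroids):
        self.centroids = {k: list(v) for k, v in centroids.items()}
        self.counts = {k: 1 for k in centroids}

    def distance_to(self, label, x):
        return math.dist(self.centroids[label], x)

    def predict(self, x):
        return min(self.centroids, key=lambda k: self.distance_to(k, x))

    def update(self, x, label):
        if label not in self.centroids:
            # First example of a new class becomes its centroid.
            self.centroids[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        self.centroids[label] = [
            c + (xi - c) / n for c, xi in zip(self.centroids[label], x)
        ]


class DistanceDetector(FailureDetector):
    """Assumed criterion: flag points far from the predicted centroid."""

    def __init__(self, threshold):
        self.threshold = threshold

    def is_suspect(self, classifier, x, predicted):
        return classifier.distance_to(predicted, x) > self.threshold


class FunctionOracle(LabelOracle):
    """Toy oracle backed by a plain function."""

    def __init__(self, fn):
        self.fn = fn

    def query(self, x):
        return self.fn(x)
```

In this sketch, a point close to a known centroid passes through without a label request, while a point far from every centroid triggers one oracle query. Because the detector and oracle are separate abstract modules, either can be swapped without touching the classifier, which is the decoupling the summary describes.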