A Closer Look at Class-Incremental Learning for Multi-Label Audio Classification

Bibliographic Details
Published in: IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 1293-1306
Main Authors: Mulimani, Manjunath; Mesaros, Annamaria
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 2998-4173
DOI: 10.1109/TASLPRO.2025.3547233

Summary: The main challenge in class-incremental learning (CIL) with deep learning models is catastrophic forgetting: a significant drop in performance on previous tasks when the model is trained sequentially on a new task. In our previous work on CIL for multi-label audio classification, we used an independent learning (IndL) mechanism to mitigate forgetting in the classifier layer. In this work, we take a closer look at the forgetting that occurs in the intermediate layers of the CIL model. We find that the earlier layers of the model are less prone to forgetting, while the later layers, especially Batch Normalization (BN) layers, are biased toward new tasks. We replace the standard BN layers with Group Normalization and Continual Normalization layers to reduce forgetting. Based on these observations, we update only the layers where forgetting occurs. These simple modifications improve the overall performance of the model. Further, we analyze the effect of exemplars, a small number of stored samples from previous tasks, on reducing forgetting while the CIL model learns the new task. We propose a simple but effective strategy, based on simulated annealing, for selecting exemplars from a multi-label dataset. Experiments are performed on a dataset with 50 sound classes, with an initial classification task containing 30 base classes and 4 incremental phases of 5 classes each. After each phase, the system is tested on multi-label classification over the entire set of classes learned so far. The proposed method outperforms other methods, with an average F1-score over the five phases of 41.8% on random classes of AudioSet and 36.3% on FSD50K, with minimal forgetting of the classes from phase 0.
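
The summary describes three concrete mechanisms: swapping BN for alternatives such as Group Normalization, updating only the layers where forgetting occurs, and selecting exemplars with simulated annealing. The following PyTorch sketch illustrates the first two; it is not the authors' code, and the group count and block names are illustrative assumptions that depend on the backbone used.

    import torch.nn as nn

    def replace_bn_with_gn(module: nn.Module, num_groups: int = 8) -> None:
        # Recursively swap each BatchNorm2d for a GroupNorm over the same
        # channel count; num_groups is illustrative and must divide the
        # channel count of every replaced layer.
        for name, child in module.named_children():
            if isinstance(child, nn.BatchNorm2d):
                setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
            else:
                replace_bn_with_gn(child, num_groups)

    def train_only_later_layers(model: nn.Module,
                                trainable: tuple = ("layer3", "layer4", "fc")) -> None:
        # Freeze all parameters except those in the later blocks where
        # forgetting is observed; the block names are hypothetical
        # placeholders, not names from the paper.
        for name, param in model.named_parameters():
            param.requires_grad = name.startswith(trainable)

For the exemplar selection, the sketch below treats it as a generic simulated-annealing search over subsets of a multi-hot label matrix. The cost function (L1 distance between the subset's and the full set's class frequencies) and all hyperparameters are assumptions made for illustration, not the paper's exact objective.

    import math
    import random
    import numpy as np

    def select_exemplars(labels: np.ndarray, k: int, steps: int = 5000,
                         t0: float = 1.0, cooling: float = 0.999,
                         seed: int = 0) -> list:
        # labels: (n_samples, n_classes) multi-hot matrix for one task.
        rng = random.Random(seed)
        n = labels.shape[0]
        target = labels.mean(axis=0)  # class frequencies over the full set

        def cost(subset):
            # How far the subset's class frequencies drift from the target.
            return float(np.abs(labels[list(subset)].mean(axis=0) - target).sum())

        current = set(rng.sample(range(n), k))
        cur_cost = cost(current)
        best, best_cost = set(current), cur_cost
        t = t0
        for _ in range(steps):
            swap_out = rng.choice(tuple(current))  # candidate to drop
            swap_in = rng.randrange(n)             # candidate to add
            if swap_in in current:
                t *= cooling
                continue
            candidate = (current - {swap_out}) | {swap_in}
            cand_cost = cost(candidate)
            # Accept improvements always; accept worse subsets with a
            # probability that shrinks as the temperature cools.
            if (cand_cost < cur_cost
                    or rng.random() < math.exp((cur_cost - cand_cost) / max(t, 1e-9))):
                current, cur_cost = candidate, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = set(current), cur_cost
            t *= cooling
        return sorted(best)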