A Closer Look at Class-Incremental Learning for Multi-Label Audio Classification
| Published in | IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 1293-1306 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2025 |
| ISSN | 2998-4173 |
| DOI | 10.1109/TASLPRO.2025.3547233 |
Summary: The main challenge in class-incremental learning (CIL) with deep learning models is catastrophic forgetting: a significant drop in performance on previous tasks when the model is trained sequentially on a new task. In our previous work on CIL for multi-label audio classification, we used an independent learning (IndL) mechanism to mitigate forgetting in the classifier layer. In this work, we take a closer look at the forgetting that occurs in the intermediate layers of the CIL model. We find that the earlier layers of the model are less prone to forgetting, while later layers, especially Batch Normalization (BN) layers, become biased toward new tasks. We replace the standard BN layers with Group Normalization and Continual Normalization layers to reduce forgetting, and, based on these observations, we update only the layers where forgetting occurs. These simple modifications improve the overall performance of the model. Further, we analyze the effect of exemplars, a small number of samples retained from previous tasks, on reducing forgetting while the CIL model learns a new task, and we propose a simple but effective strategy for selecting exemplars from a multi-label dataset using simulated annealing. Experiments are performed on datasets with 50 sound classes, with an initial classification task containing 30 base classes and 4 incremental phases of 5 classes each. After each phase, the system is tested on multi-label classification with the entire set of classes learned so far. The proposed method outperforms other methods, with average F1-scores of 41.8% and 36.3% over the five phases on randomly selected classes from AudioSet and FSD50K, respectively, with minimal forgetting of the classes from phase 0.
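The normalization swap described in the summary can be sketched as a small recursive replacement of BatchNorm layers in a PyTorch model. The helper below is a hypothetical illustration: the function name, the group count, and the choice to swap every BatchNorm2d are assumptions for demonstration, not the authors' exact recipe.

```python
import torch.nn as nn

def replace_bn_with_gn(module: nn.Module, num_groups: int = 8) -> nn.Module:
    """Recursively replace BatchNorm2d layers with GroupNorm (illustrative sketch)."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # GroupNorm normalizes over channel groups per sample, so it keeps no
            # running batch statistics that could drift toward the newest task.
            groups = num_groups if child.num_features % num_groups == 0 else 1
            gn = nn.GroupNorm(num_groups=groups,
                              num_channels=child.num_features,
                              affine=True)
            setattr(module, name, gn)
        else:
            replace_bn_with_gn(child, num_groups)
    return module

# Example usage on an arbitrary CNN backbone:
# model = replace_bn_with_gn(my_audio_cnn)
```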
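The exemplar selection step can likewise be sketched as a standard simulated-annealing search over candidate subsets of a multi-label dataset. The score function below, which favors balanced per-class counts, and all names and parameters are illustrative assumptions rather than the objective used in the paper.

```python
import math
import random
import numpy as np

def select_exemplars(labels: np.ndarray, budget: int,
                     steps: int = 5000, t0: float = 1.0) -> list[int]:
    """Pick `budget` exemplar indices from an (N x C) multi-label matrix
    with simulated annealing. Assumes budget < N."""
    n = labels.shape[0]

    def score(subset) -> float:
        # Reward subsets whose per-class label counts are close to uniform.
        counts = labels[list(subset)].sum(axis=0)
        return -float(np.std(counts))

    current = set(random.sample(range(n), budget))
    cur_s = score(current)
    best, best_s = set(current), cur_s

    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9          # linear cooling schedule
        # Propose swapping one selected sample for an unselected one.
        out_idx = random.choice(tuple(current))
        in_idx = random.choice([i for i in range(n) if i not in current])
        cand = (current - {out_idx}) | {in_idx}
        cand_s = score(cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_s >= cur_s or random.random() < math.exp((cand_s - cur_s) / t):
            current, cur_s = cand, cand_s
            if cur_s > best_s:
                best, best_s = set(current), cur_s
    return sorted(best)
```

A usage sketch: given a binary label matrix `labels` for the classes learned so far, `select_exemplars(labels, budget=500)` returns the indices of the retained samples; the annealing schedule and swap-based neighborhood are generic choices, not taken from the paper.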