DISD-Net: A Dynamic Interactive Network With Self-Distillation for Cross-Subject Multi-Modal Emotion Recognition

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 27, pp. 4643-4655
Main Authors: Cheng, Cheng; Liu, Wenzhe; Wang, Xinying; Feng, Lin; Jia, Ziyu
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2025.3535344

Summary: Multi-modal Emotion Recognition (MER) has demonstrated competitive performance in affective computing, owing to its ability to synthesize information from diverse modalities. However, many existing approaches still face unresolved challenges, such as: (i) how to learn compact yet representative features from multi-modal data simultaneously and (ii) how to address differences among subjects and enhance the generalization of the emotion recognition model, given the diverse nature of individual biological signals. To this end, we propose a Dynamic Interactive Network with Self-Distillation (DISD-Net) for cross-subject MER. The DISD-Net incorporates a dynamic interactive module to capture the intra- and inter-modal interactions in multi-modal data. Additionally, to enhance the compactness of the modal representations, we leverage the soft labels generated by the DISD-Net model as supplemental training guidance. This is achieved through self-distillation, which transfers the knowledge contained in the DISD-Net model's hard and soft labels to each modality. Finally, domain adaptation (DA) is seamlessly integrated into the dynamic interactive and self-distillation components, forming a unified framework that extracts subject-invariant multi-modal emotional features. Experimental results indicate that the proposed model achieves a mean accuracy of 75.00% with a standard deviation of 7.68% on the DEAP dataset and a mean accuracy of 65.65% with a standard deviation of 5.08% on the SEED-IV dataset.
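
The summary describes the self-distillation objective only at a high level. As a rough illustration of how a fused multi-modal head can supply soft labels to each modality branch alongside the hard labels, the snippet below gives a minimal PyTorch sketch; the module layout, feature dimensions, temperature, and loss weighting are illustrative assumptions rather than the authors' implementation, and the dynamic interactive module and domain-adaptation components are omitted.

```python
# Minimal sketch of self-distillation for multi-modal emotion recognition:
# each modality branch is trained on the hard labels and on soft labels
# produced by the fused (multi-modal) head. All names, dimensions, and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityBranch(nn.Module):
    """Per-modality encoder with its own classification head."""
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        feat = self.encoder(x)
        return feat, self.head(feat)


class SelfDistillMER(nn.Module):
    """Two modality branches plus a fused head that supplies soft labels."""
    def __init__(self, dims=(310, 32), hidden_dim=64, num_classes=4):
        super().__init__()
        self.branches = nn.ModuleList(
            ModalityBranch(d, hidden_dim, num_classes) for d in dims
        )
        self.fused_head = nn.Linear(hidden_dim * len(dims), num_classes)

    def forward(self, inputs):
        # Each branch returns (feature, per-modality logits).
        feats, logits = zip(*(b(x) for b, x in zip(self.branches, inputs)))
        fused_logits = self.fused_head(torch.cat(feats, dim=-1))
        return fused_logits, logits


def self_distillation_loss(fused_logits, modality_logits, labels,
                           temp=2.0, alpha=0.5):
    """Hard-label CE on the fused head, plus per-modality CE and a KL term
    that pulls each modality toward the fused soft labels (teacher detached)."""
    loss = F.cross_entropy(fused_logits, labels)
    soft_targets = F.softmax(fused_logits.detach() / temp, dim=-1)
    for logit in modality_logits:
        loss += F.cross_entropy(logit, labels)            # hard labels
        loss += alpha * temp ** 2 * F.kl_div(              # soft labels
            F.log_softmax(logit / temp, dim=-1),
            soft_targets,
            reduction="batchmean",
        )
    return loss


if __name__ == "__main__":
    model = SelfDistillMER()
    eeg = torch.randn(8, 310)   # hypothetical EEG feature vectors
    eye = torch.randn(8, 32)    # hypothetical peripheral/eye features
    labels = torch.randint(0, 4, (8,))
    fused, per_mod = model([eeg, eye])
    print(self_distillation_loss(fused, per_mod, labels).item())
```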