DISD-Net: A Dynamic Interactive Network With Self-Distillation for Cross-Subject Multi-Modal Emotion Recognition

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 27, pp. 4643-4655
Main Authors: Cheng, Cheng; Liu, Wenzhe; Wang, Xinying; Feng, Lin; Jia, Ziyu
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2025.3535344

Summary: Multi-modal Emotion Recognition (MER) has demonstrated competitive performance in affective computing, owing to its ability to synthesize information from diverse modalities. However, many existing approaches still face unresolved challenges, such as: (i) how to learn compact yet representative features from multi-modal data simultaneously and (ii) how to address differences among subjects and enhance the generalization of the emotion recognition model, given the diverse nature of individual biological signals. To this end, we propose a Dynamic Interactive Network with Self-Distillation (DISD-Net) for cross-subject MER. The DISD-Net incorporates a dynamic interactive module to capture the intra- and inter-modal interactions in multi-modal data. Additionally, to enhance the compactness of the modal representations, we leverage the soft labels generated by the DISD-Net model as supplemental training guidance. This is achieved through self-distillation, which transfers the knowledge contained in the DISD-Net model's hard and soft labels to each modality. Finally, domain adaptation (DA) is seamlessly integrated into the dynamic interactive and self-distillation components, forming a unified framework that extracts subject-invariant multi-modal emotional features. Experimental results indicate that the proposed model achieves a mean accuracy of 75.00% with a standard deviation of 7.68% on the DEAP dataset and a mean accuracy of 65.65% with a standard deviation of 5.08% on the SEED-IV dataset.
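
The summary describes the self-distillation objective only at a high level. As a rough illustration of how a fused multi-modal head can supply soft labels to each modality branch alongside the hard labels, the snippet below gives a minimal PyTorch sketch; the module layout, feature dimensions, temperature, and loss weighting are illustrative assumptions rather than the authors' implementation, and the dynamic interactive module and domain-adaptation components are omitted.

```python
# Minimal sketch of self-distillation for multi-modal emotion recognition:
# each modality branch is trained on the hard labels and on soft labels
# produced by the fused (multi-modal) head. All names, dimensions, and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityBranch(nn.Module):
    """Per-modality encoder with its own classification head."""
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        feat = self.encoder(x)
        return feat, self.head(feat)


class SelfDistillMER(nn.Module):
    """Two modality branches plus a fused head that supplies soft labels."""
    def __init__(self, dims=(310, 32), hidden_dim=64, num_classes=4):
        super().__init__()
        self.branches = nn.ModuleList(
            ModalityBranch(d, hidden_dim, num_classes) for d in dims
        )
        self.fused_head = nn.Linear(hidden_dim * len(dims), num_classes)

    def forward(self, inputs):
        # Each branch returns (feature, per-modality logits).
        feats, logits = zip(*(b(x) for b, x in zip(self.branches, inputs)))
        fused_logits = self.fused_head(torch.cat(feats, dim=-1))
        return fused_logits, logits


def self_distillation_loss(fused_logits, modality_logits, labels,
                           temp=2.0, alpha=0.5):
    """Hard-label CE on the fused head, plus per-modality CE and a KL term
    that pulls each modality toward the fused soft labels (teacher detached)."""
    loss = F.cross_entropy(fused_logits, labels)
    soft_targets = F.softmax(fused_logits.detach() / temp, dim=-1)
    for logit in modality_logits:
        loss += F.cross_entropy(logit, labels)            # hard labels
        loss += alpha * temp ** 2 * F.kl_div(              # soft labels
            F.log_softmax(logit / temp, dim=-1),
            soft_targets,
            reduction="batchmean",
        )
    return loss


if __name__ == "__main__":
    model = SelfDistillMER()
    eeg = torch.randn(8, 310)   # hypothetical EEG feature vectors
    eye = torch.randn(8, 32)    # hypothetical peripheral/eye features
    labels = torch.randint(0, 4, (8,))
    fused, per_mod = model([eeg, eye])
    print(self_distillation_loss(fused, per_mod, labels).item())
```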