Cross-Attention-Guided Wavenet for Mel Spectrogram Reconstruction in The ICASSP 2024 Auditory EEG Challenge

Bibliographic Details
Published in: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pp. 7-8
Main Authors: Fang, Yuan; Li, Hao; Zhang, Xueliang; Chen, Fei; Gao, Guanglai
Format: Conference Proceeding
Language: English
Published: IEEE, 14.04.2024
DOI: 10.1109/ICASSPW62465.2024.10627006

Summary: This paper provides an overview of our submission to Task 2 of the Auditory EEG Challenge at the ICASSP 2024 Signal Processing Grand Challenge (SPGC). We introduce a cross-attention-guided WaveNet with a coarse-to-fine generation strategy, aimed at improving the detailed reconstruction of Mel spectrograms from time-domain EEG. Specifically, the model uses WaveNet to reconstruct, in sequence from coarse to fine granularity, the envelope, the 10-band Mel, the 80-band Mel, and the magnitude. To bridge the gap between modalities, we introduce a cross-attention mechanism that explores cross-modal correlations. A combined loss function is used to refine reconstruction performance. We achieved Pearson correlation values of 0.0651 ± 0.0153 on the validation set and 0.0413 ± 0.0169 on the held-out-subjects test set, placing second in the competition. We release the training code for our model online.
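Note: The summary reports results as Pearson correlation values. As a rough illustration of how such a score can be computed, the sketch below correlates a reconstructed spectrogram with a reference one band by band over time and averages the per-band values. The band-wise averaging, the data layout (a list of bands, each a list of frames), and the function names `pearson` and `mean_band_pearson` are assumptions made for this example. They are not taken from the paper or from the challenge's evaluation code.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant signal; returning 0.0
        # here is a convention chosen for this sketch.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mean_band_pearson(pred, target):
    """Average the per-band Pearson correlation.

    pred, target: lists of bands, each band a list of frame values
    (assumed layout, hypothetical for this example).
    """
    scores = [pearson(p, t) for p, t in zip(pred, target)]
    return sum(scores) / len(scores)


# Toy two-band example: band 0 matches up to scale, band 1 is inverted.
reference = [[0.1, 0.4, 0.3, 0.9], [1.0, 0.5, 0.2, 0.7]]
reconstructed = [[0.2, 0.8, 0.6, 1.8], [-1.0, -0.5, -0.2, -0.7]]
score = mean_band_pearson(reconstructed, reference)
```

In the toy example the first band correlates perfectly and the second is perfectly anti-correlated, so the two per-band scores cancel in the average. Real EEG-to-speech reconstructions typically score much closer to zero, as the reported values show.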