Cross-Attention-Guided Wavenet for Mel Spectrogram Reconstruction in The ICASSP 2024 Auditory EEG Challenge
    
| Published in | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pp. 7-8 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 14.04.2024 |
| DOI | 10.1109/ICASSPW62465.2024.10627006 |
| Summary: | This paper provides an overview of our submission to Task 2 of the Auditory EEG Challenge at the ICASSP 2024 Signal Processing Grand Challenge (SPGC). We introduce a novel approach, a cross-attention-guided WaveNet with a coarse-to-fine generation strategy, aimed at improving the detailed reconstruction of Mel spectrograms from time-domain EEG. Specifically, the model uses WaveNet to sequentially reconstruct the envelope, 10-band Mel, 80-band Mel, and magnitude, moving from coarse to fine granularity. To bridge the gap between modalities, we introduce a cross-attention mechanism that explores cross-modal correlations. A combined loss function is employed to refine the reconstruction performance. We achieved Pearson correlation values of 0.0651 ± 0.0153 on the validation set and 0.0413 ± 0.0169 on the held-out-subjects test set, securing second place in the competition. We release the training code for our model online. |
|---|---|
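The summary reports results as Pearson correlation between reconstructed and reference Mel spectrograms. As a rough illustration of that metric, the sketch below computes the Pearson correlation per Mel band over time and averages across bands. The per-band averaging, the function names `pearson` and `mean_band_pearson`, and the handling of constant signals are assumptions made for illustration; this record does not describe the challenge's exact evaluation protocol or the authors' code.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        # Assumption: a constant signal is scored as uncorrelated.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def mean_band_pearson(pred, target):
    """Average per-band Pearson correlation.

    pred, target: lists of frames, each frame a list of band values
    (shape: time x bands). Hypothetical helper, not the official scorer.
    """
    n_bands = len(target[0])
    scores = [
        pearson([frame[b] for frame in pred], [frame[b] for frame in target])
        for b in range(n_bands)
    ]
    return sum(scores) / n_bands
```

For example, `mean_band_pearson(pred, target)` with two `time x bands` nested lists returns a single score in the range [-1, 1]; values near zero, like those reported above, indicate weak linear agreement between reconstruction and reference.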