SSM2Mel: State Space Model to Reconstruct Mel Spectrogram from the EEG

Bibliographic Details
Published in	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 1 - 5
Main Authors	Fan, Cunhang; Zhang, Sheng; Zhang, Jingjing; Pan, Zexu; Lv, Zhao
Format	Conference Proceeding
Language	English
Published	IEEE 06.04.2025
Subjects	Brain modeling; Data mining; Decoding; Electroencephalography (EEG); Electroencephalography; Feature extraction; Image reconstruction; imagined speech; mel spectrogram; Modulation; multi-head self-attention; Neuroscience; Spectrogram; Speech processing; State Space Model
ISSN	2379-190X
DOI	10.1109/ICASSP49660.2025.10888785

Summary:	Decoding speech from brain signals is a challenging research problem that holds significant importance for studying speech processing in the brain. Although breakthroughs have been made in reconstructing the mel spectrograms of audio stimuli perceived by subjects at the word or letter level using non-invasive electroencephalography (EEG), there is still a critical gap in precisely reconstructing continuous speech features, especially at the minute level. To address this issue, this paper proposes a State Space Model (SSM) to reconstruct the mel spectrogram of continuous speech from EEG, named SSM2Mel. This model introduces a novel Mamba module to effectively model the long sequence of EEG signals for imagined speech. In the SSM2Mel model, the S4-UNet structure is used to enhance the extraction of local features of EEG signals, and the Embedding Strength Modulator (ESM) module is used to incorporate subject-specific information. Experimental results show that our model achieves a Pearson correlation of 0.069 on the SparrKULee dataset, which is a 38% improvement over the previous baseline.
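
The Pearson correlation reported in the summary can be illustrated with a minimal sketch. It assumes the score is computed along time for each mel band and then averaged over bands; the record does not state the exact evaluation protocol, so that choice, the function name `pearson_per_band`, and the array shapes are all hypothetical. The last lines also assume the 38% figure is a relative improvement, which would imply a baseline correlation of about 0.05.

```python
import numpy as np


def pearson_per_band(pred, target):
    """Pearson correlation along time for each mel band.

    pred, target: arrays of shape (time, n_mels).
    Returns an array of shape (n_mels,).
    """
    # Center each band over time.
    pred_c = pred - pred.mean(axis=0)
    target_c = target - target.mean(axis=0)
    # Covariance term divided by the product of the standard deviations.
    num = (pred_c * target_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0))
    return num / den


# Synthetic stand-ins for a reference and a reconstructed mel spectrogram.
# The shape (640 frames, 10 bands) is arbitrary, not taken from the paper.
rng = np.random.default_rng(0)
target = rng.standard_normal((640, 10))
pred = 0.1 * target + rng.standard_normal((640, 10))

# Average the per-band correlations into one score (assumed protocol).
score = pearson_per_band(pred, target).mean()
print(f"mean Pearson correlation: {score:.3f}")

# Assumption: "38% improvement" is relative, i.e. 0.069 = 1.38 * baseline.
implied_baseline = 0.069 / 1.38
print(f"implied baseline: {implied_baseline:.3f}")
```

On real data, `pred` would be the model output and `target` the mel spectrogram of the audio stimulus, aligned frame by frame.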