Music removal by convolutional denoising autoencoder in speech recognition

Bibliographic Details
Published in 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 338-341
Main Authors Mengyuan Zhao, Dong Wang, Zhiyong Zhang, Xuewei Zhang
Format Conference Proceeding
Language English
Published Asia-Pacific Signal and Information Processing Association 01.12.2015
DOI 10.1109/APSIPA.2015.7415289

Summary: Music embedding often causes significant performance degradation in automatic speech recognition (ASR). This paper proposes a music-removal method based on a denoising autoencoder (DAE) that learns and removes music from music-embedded speech signals. In particular, we focus on the convolutional denoising autoencoder (CDAE), which can learn local musical patterns through convolutional feature extraction. Our study shows that the CDAE model can learn patterns of music in different genres and that CDAE-based music removal offers significant performance improvement for ASR. Additionally, we demonstrate that this music-removal approach is largely language independent: a model trained with data in one language can be applied to remove music from speech in another language, and models trained with multilingual data may lead to better performance.
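The denoising-autoencoder idea in the summary can be illustrated with a minimal sketch. It is not the authors' model: the layer sizes, the single convolutional layer, the random weights, and the additive feature-domain mixing of speech and music are all assumptions made only for illustration, and the helper names (conv1d_valid, cdae_forward) are hypothetical. The sketch shows the training objective only: the network sees a music-embedded frame and is scored against the clean speech frame.

```python
import random

def conv1d_valid(frame, kernel):
    """Slide `kernel` along `frame` (valid cross-correlation, stride 1)."""
    k = len(kernel)
    return [sum(frame[i + j] * kernel[j] for j in range(k))
            for i in range(len(frame) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def cdae_forward(noisy, kernels, decoder):
    """Encoder: one conv layer + ReLU per kernel. Decoder: one linear layer."""
    hidden = []
    for kernel in kernels:
        hidden.extend(relu(conv1d_valid(noisy, kernel)))
    return [sum(w * h for w, h in zip(row, hidden)) for row in decoder]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

rng = random.Random(0)
n_bins, kernel_size, n_kernels = 16, 3, 4  # hypothetical sizes

# Toy stand-ins for one frame of spectral features (assumption: additive mix).
clean_speech = [rng.gauss(0, 1) for _ in range(n_bins)]
music = [rng.gauss(0, 1) for _ in range(n_bins)]
noisy = [s + m for s, m in zip(clean_speech, music)]

# Randomly initialised (untrained) parameters.
kernels = [[rng.gauss(0, 0.1) for _ in range(kernel_size)]
           for _ in range(n_kernels)]
hidden_size = n_kernels * (n_bins - kernel_size + 1)
decoder = [[rng.gauss(0, 0.1) for _ in range(hidden_size)]
           for _ in range(n_bins)]

# Denoising objective: input is music-embedded, target is clean speech.
denoised = cdae_forward(noisy, kernels, decoder)
loss = mse(denoised, clean_speech)
print(f"reconstruction loss before training: {loss:.4f}")
```

Training would adjust the kernels and decoder weights to reduce this loss over many speech and music pairs. The convolution kernels are what allow the model to respond to local patterns in the input, which is the property the summary attributes to the CDAE.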