Improved Mask-Based Neural Beamforming for Multichannel Speech Enhancement by Snapshot Matching Masking


Bibliographic Details
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (2023), pp. 1-5
Main Authors: Lee, Ching-Hua; Yang, Chouchang; Shen, Yilin; Jin, Hongxia
Format: Conference Proceeding
Language: English
Published: IEEE, 04.06.2023
ISSN: 2379-190X
DOI: 10.1109/ICASSP49357.2023.10096213

More Information
Summary: In multichannel speech enhancement (SE), time-frequency (T-F) mask-based neural beamforming algorithms use deep neural networks to predict T-F masks that represent speech and noise dominance. The predicted masks are then leveraged to estimate the speech and noise power spectral density (PSD) matrices, from which the beamformer filter weights are computed based on signal statistics. However, most networks in the literature are trained to estimate pre-defined masks, e.g., the ideal binary mask (IBM) and the ideal ratio mask (IRM), which lack a direct connection to the PSD estimation. In this paper, we propose a new masking strategy that predicts the Snapshot Matching Mask (SMM), which aims to minimize the distance between the predicted and the true signal snapshots, thereby estimating the PSD matrices in a more systematic way. The performance of SMM is compared with existing IBM- and IRM-based PSD estimation for mask-based neural beamforming on several datasets to demonstrate its effectiveness for the SE task.
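
Illustrative sketch (not part of the record, and not the authors' code): the pipeline the summary describes, mask-weighted PSD estimation followed by statistics-based beamformer weights, can be outlined in Python/NumPy as below. The function names (masked_psd, mvdr_weights, smm_style_loss), the Souden-style MVDR variant, and all shapes are assumptions for illustration; smm_style_loss in particular is only a plausible reading of the stated objective of minimizing the distance between predicted and true snapshots, not the paper's SMM formulation.

    # Minimal sketch of mask-based neural beamforming, under the
    # assumptions stated above. Random data stands in for real STFTs
    # and network-predicted masks.
    import numpy as np

    def masked_psd(Y, mask):
        """Mask-weighted PSD estimate per frequency bin.

        Y:    (C, T, F) complex STFT snapshots; C channels, T frames, F bins.
        mask: (T, F) real-valued T-F mask (e.g., IRM-style weights in [0, 1]).
        Returns (F, C, C) PSD matrices.
        """
        C, T, F = Y.shape
        phi = np.zeros((F, C, C), dtype=complex)
        for f in range(F):
            Yf = Y[:, :, f]                       # (C, T) snapshots at bin f
            w = mask[:, f]                        # (T,) per-frame weights
            phi[f] = (w * Yf) @ Yf.conj().T / max(w.sum(), 1e-8)
        return phi

    def mvdr_weights(phi_s, phi_n, ref=0):
        """Souden-style MVDR weights from speech/noise PSDs, per bin."""
        F, C, _ = phi_s.shape
        W = np.zeros((F, C), dtype=complex)
        for f in range(F):
            num = np.linalg.solve(phi_n[f] + 1e-6 * np.eye(C), phi_s[f])
            W[f] = (num / np.trace(num))[:, ref]  # steer toward reference mic
        return W

    def smm_style_loss(mask, Y, X):
        """Hypothetical snapshot-matching objective: distance between the
        masked (predicted) snapshots and the true clean snapshots X.
        The paper's actual SMM loss and derivation may differ."""
        return np.mean(np.abs(mask[None] * Y - X) ** 2)

    # Toy usage.
    rng = np.random.default_rng(0)
    C, T, F = 4, 50, 129
    Y = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
    speech_mask = rng.uniform(size=(T, F))        # stand-in for a network output
    phi_s = masked_psd(Y, speech_mask)
    phi_n = masked_psd(Y, 1.0 - speech_mask)
    W = mvdr_weights(phi_s, phi_n)                # (F, C) filter per bin
    S_hat = np.einsum('fc,ctf->tf', W.conj(), Y)  # beamformed STFT, (T, F)

The Souden-style MVDR is chosen here only because it computes the weights from exactly the two mask-derived PSD matrices, matching the statistics-driven weight computation the summary describes; during training, smm_style_loss would be driven by a clean reference X rather than the noisy Y.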