Improved Mask-Based Neural Beamforming for Multichannel Speech Enhancement by Snapshot Matching Masking

Bibliographic Details
Published in	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 1 - 5
Main Authors	Lee, Ching-Hua; Yang, Chouchang; Shen, Yilin; Jin, Hongxia
Format	Conference Proceeding
Language	English
Published	IEEE, 4 June 2023
Subjects	Array signal processing; Estimation; neural beamforming; Neural networks; power spectral density; Signal processing algorithms; Simulation; snapshot; Speech enhancement; Systematics; Time-frequency analysis; time-frequency mask
ISSN	2379-190X
DOI	10.1109/ICASSP49357.2023.10096213

Abstract	In multichannel speech enhancement (SE), time-frequency (T-F) mask-based neural beamforming algorithms use deep neural networks to predict T-F masks that represent speech and noise dominance. The predicted masks are then used to estimate the speech and noise power spectral density (PSD) matrices, from which the beamformer filter weights are computed based on signal statistics. However, most networks in the literature are trained to estimate pre-defined masks, e.g., the ideal binary mask (IBM) and the ideal ratio mask (IRM), which lack a direct connection to PSD estimation. In this paper, we propose a new masking strategy that predicts the Snapshot Matching Mask (SMM), which aims to minimize the distance between the predicted and the true signal snapshots, thereby estimating the PSD matrices in a more systematic way. SMM is compared with existing IBM- and IRM-based PSD estimation for mask-based neural beamforming on several datasets to demonstrate its effectiveness for the SE task.
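The pipeline described in the abstract (T-F masks, then PSD matrices, then beamformer weights) can be sketched roughly as follows. This is a minimal NumPy illustration of the conventional mask-weighted PSD estimate, not of the paper's SMM, whose formulation is not given in this record. The MVDR weight formula, the reference-channel choice, the array layout, and the function names `masked_psd`, `mvdr_weights` and `beamform` are all assumptions made for illustration.

```python
import numpy as np

def masked_psd(Y, mask, eps=1e-8):
    """Mask-weighted PSD estimate (assumed conventional form).

    Y: complex STFT, shape (C, F, T); mask: real values in [0, 1], shape (F, T).
    Returns one C x C matrix per frequency bin, shape (F, C, C).
    """
    # Weighted sum over frames of the snapshot outer products y y^H.
    psd = np.einsum("ft,cft,dft->fcd", mask, Y, Y.conj())
    return psd / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(psd_s, psd_n, ref=0, eps=1e-8):
    """MVDR weights w = (Phi_n^{-1} Phi_s) u / trace(Phi_n^{-1} Phi_s).

    The MVDR form and the reference channel `ref` are assumptions;
    the abstract does not name a specific beamformer.
    """
    C = psd_n.shape[-1]
    psd_n = psd_n + eps * np.eye(C)          # light diagonal loading
    num = np.linalg.solve(psd_n, psd_s)      # Phi_n^{-1} Phi_s, shape (F, C, C)
    tr = np.trace(num, axis1=-2, axis2=-1)[:, None]
    return num[..., ref] / (tr + eps)        # column `ref`, shape (F, C)

def beamform(w, Y):
    """Apply w^H y in every T-F bin; returns shape (F, T)."""
    return np.einsum("fc,cft->ft", w.conj(), Y)
```

In use, a network would supply a speech mask and a noise mask of shape (F, T); `masked_psd` is called once per mask, and the two results are passed to `mvdr_weights`. Swapping in a different mask definition changes only the `mask` argument, which is the stage a strategy such as SMM targets.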
Author Affiliations	Lee, Ching-Hua (Samsung Research America); Yang, Chouchang (Samsung Research America); Shen, Yilin (Samsung Research America); Jin, Hongxia (Samsung Research America)
Discipline	Engineering
EISBN	1728163277 9781728163277
EISSN	2379-190X
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	5
PublicationDate	2023-June-4
PublicationTitleAbbrev	ICASSP
URI	https://ieeexplore.ieee.org/document/10096213