Improved Mask-Based Neural Beamforming for Multichannel Speech Enhancement by Snapshot Matching Masking
| Published in | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 1 - 5 |
|---|---|
| Main Authors | Lee, Ching-Hua; Yang, Chouchang; Shen, Yilin; Jin, Hongxia |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 04.06.2023 |
| ISSN | 2379-190X |
| DOI | 10.1109/ICASSP49357.2023.10096213 |
| Abstract | In multichannel speech enhancement (SE), time-frequency (T-F) mask-based neural beamforming algorithms take advantage of deep neural networks to predict T-F masks that represent speech and noise dominance. The predicted masks are subsequently leveraged to estimate the speech and noise power spectral density (PSD) matrices for computing the beamformer filter weights based on signal statistics. However, in the literature most networks are trained to estimate some pre-defined masks, e.g., the ideal binary mask (IBM) and ideal ratio mask (IRM) that lack direct connection to the PSD estimation. In this paper, we propose a new masking strategy to predict the Snapshot Matching Mask (SMM) that aims to minimize the distance between the predicted and the true signal snapshots, thereby estimating the PSD matrices in a more systematic way. Performance of SMM compared with existing IBM- and IRM-based PSD estimation for mask-based neural beamforming is presented on several datasets to demonstrate its effectiveness for the SE task. |
|---|---|
| Author | Lee, Ching-Hua; Jin, Hongxia; Yang, Chouchang; Shen, Yilin |
| Author_xml | 1. Lee, Ching-Hua (Samsung Research America); 2. Yang, Chouchang (Samsung Research America); 3. Shen, Yilin (Samsung Research America); 4. Jin, Hongxia (Samsung Research America) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ICASSP49357.2023.10096213 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISBN | 1728163277; 9781728163277 |
| EISSN | 2379-190X |
| EndPage | 5 |
| ExternalDocumentID | 10096213 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 02:35:10 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_10096213 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-June-4 |
| PublicationDateYYYYMMDD | 2023-06-04 |
| PublicationDate_xml | – month: 06 year: 2023 text: 2023-June-4 day: 04 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0008748 |
| Score | 2.2714012 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Array signal processing; Estimation; neural beamforming; Neural networks; power spectral density; Signal processing algorithms; Simulation; snapshot; Speech enhancement; Systematics; Time-frequency analysis; time-frequency mask |
| Title | Improved Mask-Based Neural Beamforming for Multichannel Speech Enhancement by Snapshot Matching Masking |
| URI | https://ieeexplore.ieee.org/document/10096213 |
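
The abstract describes a pipeline in which predicted T-F masks are used to estimate the speech and noise PSD matrices, from which the beamformer filter weights are computed. The sketch below illustrates that generic pipeline only; it does not implement the Snapshot Matching Mask, whose definition is not given in this record. Several things here are assumptions rather than details from the paper: the function names are hypothetical, the IRM is written in one common power-ratio form, the PSD is a mask-weighted average of snapshot outer products, and the beamformer is taken to be an MVDR filter in its reference-channel formulation.

```python
import numpy as np

def ideal_ratio_mask(S, N, eps=1e-8):
    """One common IRM variant (assumption): |S|^2 / (|S|^2 + |N|^2).

    S, N: complex STFTs of clean speech and noise at a reference
    microphone, each of shape (F, T).
    """
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return ps / (ps + pn + eps)

def masked_psd(Y, mask, eps=1e-8):
    """Mask-weighted PSD estimate per frequency bin.

    Y: (C, F, T) multichannel STFT, mask: (F, T) with values in [0, 1].
    Returns (F, C, C): sum_t m(t,f) y(t,f) y(t,f)^H / sum_t m(t,f).
    """
    num = np.einsum('ft,cft,dft->fcd', mask, Y, Y.conj())
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den

def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6):
    """MVDR weights (assumed beamformer), reference-channel form:
    w = (Phi_n^{-1} Phi_s) u_ref / trace(Phi_n^{-1} Phi_s).
    """
    C = phi_n.shape[-1]
    phi_n = phi_n + diag_load * np.eye(C)      # diagonal loading for stability
    num = np.linalg.solve(phi_n, phi_s)        # Phi_n^{-1} Phi_s, shape (F, C, C)
    tr = np.trace(num, axis1=-2, axis2=-1)[:, None]
    return num[..., ref] / (tr + 1e-8)         # column `ref`, shape (F, C)

def apply_beamformer(w, Y):
    """Apply w^H y per T-F bin. w: (F, C), Y: (C, F, T) -> (F, T)."""
    return np.einsum('fc,cft->ft', w.conj(), Y)
```

In use, a speech mask `m` and a noise mask such as `1 - m` would each be passed to `masked_psd`, and the two resulting matrices to `mvdr_weights`. In a neural beamformer the masks come from a network rather than from oracle signals as in `ideal_ratio_mask`.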