Audio-visual speech enhancement using deep neural networks
This paper proposes a novel framework that integrates audio and visual information for speech enhancement. Most speech enhancement approaches consider audio features only to design filters or transfer functions to convert noisy speech signals to clean ones. Visual data, which provide useful compleme...
| Published in | 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1-6 |
|---|---|
| Main Authors | Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Jen-Chun Lin, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang |
| Format | Conference Proceeding |
| Language | English |
| Published | Asia Pacific Signal and Information Processing Association, 1 December 2016 |
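| Subjects | Feature extraction; Noise measurement; Signal to noise ratio; Speech; Speech enhancement; Training; Visualization |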
| Online Access | https://ieeexplore.ieee.org/document/7820732 |
| DOI | 10.1109/APSIPA.2016.7820732 |
| Abstract | This paper proposes a novel framework that integrates audio and visual information for speech enhancement. Most speech enhancement approaches consider audio features only to design filters or transfer functions to convert noisy speech signals to clean ones. Visual data, which provide useful complementary information to audio data, have been integrated with audio data in many speech-related approaches to attain more effective speech processing performance. This paper presents our investigation into the use of the visual features of the motion of lips as additional visual information to improve the speech enhancement capability of deep neural network (DNN) speech enhancement performance. The experimental results show that the performance of DNN with audio-visual inputs exceeds that of DNN with audio inputs only in four standardized objective evaluations, thereby confirming the effectiveness of the inclusion of visual information into an audio-only speech enhancement framework. |
| Author | Jen-Cheng Hou Ying-Hui Lai Hsiu-Wen Chang Syu-Siang Wang Hsin-Min Wang Yu Tsao Jen-Chun Lin |
| Author affiliations | 1. Jen-Cheng Hou (coolkiu@citi.sinica.edu.tw), Res. Center for Inf. Technol. Innovation, Taipei, Taiwan; 2. Syu-Siang Wang, Res. Center for Inf. Technol. Innovation, Taipei, Taiwan; 3. Ying-Hui Lai (yhlai@ee.yzu.edu.tw), Dept. of Electr. Eng., Yuan Ze Univ., Taoyuan, Taiwan; 4. Jen-Chun Lin, Inst. of Inf. Sci., Taipei, Taiwan; 5. Yu Tsao, Res. Center for Inf. Technol. Innovation, Taipei, Taiwan; 6. Hsiu-Wen Chang (hsiuwen@mmc.edu.tw), Dept. of Audiology & Speech Language Pathology, Mackay Med. Coll., Taiwan; 7. Hsin-Min Wang (whm@iis.sinica.edu.tw), Inst. of Inf. Sci., Taipei, Taiwan |
| ContentType | Conference Proceeding |
| EISBN | 9881476828 9789881476821 |
| EndPage | 6 |
| ExternalDocumentID | 7820732 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 6 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-Dec. |
| PublicationDateYYYYMMDD | 2016-12-01 |
| PublicationDate_xml | – month: 12 year: 2016 text: 2016-Dec. |
| PublicationDecade | 2010 |
| PublicationTitle | 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) |
| PublicationTitleAbbrev | APSIPA |
| PublicationYear | 2016 |
| Publisher | Asia Pacific Signal and Information Processing Association |
| Publisher_xml | – name: Asia Pacific Signal and Information Processing Association |
| StartPage | 1 |
| SubjectTerms | Feature extraction; Noise measurement; Signal to noise ratio; Speech; Speech enhancement; Training; Visualization |
| Title | Audio-visual speech enhancement using deep neural networks |
| URI | https://ieeexplore.ieee.org/document/7820732 |