Using deformable templates to infer visual speech dynamics

Bibliographic Details
Published in	Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 578-582
Main Authors	Hennecke, M.E., Prasad, K.V., Stork, D.G.
Format	Conference Proceeding
Language	English
Published	IEEE Comput. Soc. Press 1994
Subjects	Acoustic noise; Acoustic waves; Data mining; Image recognition; Lighting; Lips; Mouth; Noise robustness; Speech enhancement; Speech recognition
ISBN	0818664053 9780818664052
ISSN	1058-6393
DOI	10.1109/ACSSC.1994.471518

Abstract	The visual image of a talker provides information complementary to the acoustic speech waveform, and enables improved recognition accuracy, especially in environments corrupted by high acoustic noise or multiple talkers. Because most of the phonologically relevant visual information is from the mouth and lips, it is important to infer accurately and robustly their dynamics; moreover it is desirable to extract this information without the use of invasive markers or patterned illumination. We describe the use of deformable templates for speechreading, in order to infer the dynamics of lip contours throughout an image sequence. Template computations can be done relatively quickly and the resulting small number of shape description parameters are quite robust to visual noise and variations in illumination. Such templates delineate the inside of the mouth, so that the teeth and the tongue can also be found.
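
The abstract does not give the template's form or its cost function, so the following is only a minimal sketch of the general idea of fitting a lip contour with a handful of shape parameters. Everything in it is an illustrative assumption rather than the authors' method: the two-parabola shape, the parameter names (`xc`, `yc`, `w`, `h_up`, `h_low`), the symmetric nearest-point energy, and the coordinate-descent search. The "edge points" are synthetic, generated from a known shape instead of being extracted from an image.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LipTemplate:
    """Hypothetical five-parameter mouth template (not taken from the paper)."""
    xc: float     # mouth centre, x
    yc: float     # mouth centre, y (image rows grow downward)
    w: float      # half-width: corners at xc - w and xc + w
    h_up: float   # height of the upper-lip parabola above the corners
    h_low: float  # depth of the lower-lip parabola below the corners


def contour(t, n=21):
    """Sample n points on each parabola; both curves meet at the mouth corners."""
    pts = []
    for i in range(n):
        s = -1.0 + 2.0 * i / (n - 1)   # runs from -1 (left corner) to 1 (right corner)
        bulge = 1.0 - s * s            # 0 at the corners, 1 at the centre
        x = t.xc + t.w * s
        pts.append((x, t.yc - t.h_up * bulge))   # upper lip
        pts.append((x, t.yc + t.h_low * bulge))  # lower lip
    return pts


def _mean_nearest_sq(a, b):
    """Mean squared distance from each point in a to its nearest point in b."""
    return sum(min((ax - bx) ** 2 + (ay - by) ** 2 for bx, by in b)
               for ax, ay in a) / len(a)


def energy(t, edges):
    """Symmetric nearest-point cost: low when template and edge points coincide."""
    pts = contour(t)
    return _mean_nearest_sq(pts, edges) + _mean_nearest_sq(edges, pts)


def fit(t, edges, step=2.0, min_step=0.125):
    """Greedy coordinate descent: nudge one parameter at a time, halve the step when stuck."""
    best = energy(t, edges)
    while step >= min_step:
        improved = False
        for name in ("xc", "yc", "w", "h_up", "h_low"):
            for delta in (step, -step):
                cand = replace(t, **{name: getattr(t, name) + delta})
                if cand.w <= 0:        # keep the template from collapsing
                    continue
                e = energy(cand, edges)
                if e < best:           # accept only strict improvements
                    t, best, improved = cand, e, True
        if not improved:
            step /= 2.0
    return t, best


# Synthetic demonstration: edge points drawn from a known shape, rough initial guess.
true_shape = LipTemplate(xc=50.0, yc=40.0, w=20.0, h_up=6.0, h_low=10.0)
edges = contour(true_shape)
start = LipTemplate(xc=46.0, yc=43.0, w=16.0, h_up=4.0, h_low=7.0)
fitted, final_energy = fit(start, edges)
print(fitted, final_energy)
```

For an image sequence, one plausible use of such a sketch would be to start each frame's fit from the previous frame's fitted parameters, so that the five numbers per frame trace out the lip dynamics; whether the paper proceeds this way is not stated in the abstract.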
Author affiliations	Hennecke, M.E. (Dept. of Electr. Eng., Stanford Univ., CA, USA); Prasad, K.V. (Dept. of Electr. Eng., Stanford Univ., CA, USA); Stork, D.G.
Discipline	Engineering
URI	https://ieeexplore.ieee.org/document/471518