Using deformable templates to infer visual speech dynamics

Bibliographic Details
Published in Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 578-582
Main Authors Hennecke, M.E., Prasad, K.V., Stork, D.G.
Format Conference Proceeding
Language English
Published IEEE Comput. Soc. Press 1994
ISBN 0818664053
9780818664052
ISSN 1058-6393
DOI 10.1109/ACSSC.1994.471518

Abstract The visual image of a talker provides information complementary to the acoustic speech waveform, and enables improved recognition accuracy, especially in environments corrupted by high acoustic noise or multiple talkers. Because most of the phonologically relevant visual information is from the mouth and lips, it is important to infer accurately and robustly their dynamics; moreover it is desirable to extract this information without the use of invasive markers or patterned illumination. We describe the use of deformable templates for speechreading, in order to infer the dynamics of lip contours throughout an image sequence. Template computations can be done relatively quickly and the resulting small number of shape description parameters are quite robust to visual noise and variations in illumination. Such templates delineate the inside of the mouth, so that the teeth and the tongue can also be found.
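
As a rough illustration of the idea in the abstract, the following Python sketch fits a lip template described by five shape parameters to an edge-strength map. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-parabola template, the synthetic Gaussian edge map, the energy definition, and the coordinate-descent search. The names `contour_points`, `make_edge_map`, `energy`, and `fit` are hypothetical.

```python
import math


def contour_points(params, n=21):
    """Sample points on a hypothetical two-parabola lip template.

    params = (xc, yc, w, h_up, h_lo): mouth centre, half-width, and the
    heights of the upper and lower arcs (image y grows downward).
    """
    xc, yc, w, h_up, h_lo = params
    pts = []
    for i in range(n):
        t = -1.0 + 2.0 * i / (n - 1)       # normalised position, -1..1
        arch = 1.0 - t * t                 # 0 at the mouth corners, 1 at the centre
        x = xc + w * t
        pts.append((x, yc - h_up * arch))  # upper arc
        pts.append((x, yc + h_lo * arch))  # lower arc
    return pts


def make_edge_map(true_params, sigma=3.0):
    """Build a synthetic edge-strength function (assumption: stands in for
    an edge map that a real system would compute from image intensities)."""
    xc, yc, w, h_up, h_lo = true_params

    def edge(x, y):
        t = (x - xc) / w
        if abs(t) > 1.0:
            return 0.0                     # no edges outside the mouth width
        arch = 1.0 - t * t
        d = min(abs(y - (yc - h_up * arch)), abs(y - (yc + h_lo * arch)))
        return math.exp(-d * d / (2.0 * sigma * sigma))

    return edge


def energy(params, edge):
    """Negative mean edge strength along the template (lower is better)."""
    pts = contour_points(params)
    return -sum(edge(x, y) for x, y in pts) / len(pts)


def fit(params, edge, step=4.0, min_step=0.05, max_iter=200):
    """Coordinate descent: accept any single-parameter move that lowers the
    energy, and halve the step size when no move helps."""
    params = list(params)
    best = energy(params, edge)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                e = energy(trial, edge)
                if e < best:
                    params, best, improved = trial, e, True
        if not improved:
            step /= 2.0
    return tuple(params), best


# Demo on synthetic data: start from a deliberately wrong shape.
true_shape = (50.0, 40.0, 20.0, 6.0, 10.0)
edge = make_edge_map(true_shape)
start = (47.0, 42.0, 17.0, 4.0, 7.0)
fitted, e_fit = fit(start, edge)
print("start energy:", energy(start, edge))
print("fitted shape:", fitted, "energy:", e_fit)
```

The sketch is meant only to show why such a fit can be cheap: the whole contour is summarised by five numbers, so each energy evaluation touches a few dozen points rather than every pixel. Running the same fit on each frame, starting from the previous frame's result, would yield a parameter trajectory over an image sequence; that tracking step is likewise an assumption and is not shown.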
Author_xml – sequence: 1
  givenname: M.E.
  surname: Hennecke
  fullname: Hennecke, M.E.
  organization: Dept. of Electr. Eng., Stanford Univ., CA, USA
– sequence: 2
  givenname: K.V.
  surname: Prasad
  fullname: Prasad, K.V.
  organization: Dept. of Electr. Eng., Stanford Univ., CA, USA
– sequence: 3
  givenname: D.G.
  surname: Stork
  fullname: Stork, D.G.
Discipline Engineering
Subjects Acoustic noise
Acoustic waves
Data mining
Image recognition
Lighting
Lips
Mouth
Noise robustness
Speech enhancement
Speech recognition
URI https://ieeexplore.ieee.org/document/471518