Using deformable templates to infer visual speech dynamics

Bibliographic Details
Published in	Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 578-582
Main Authors	Hennecke, M.E., Prasad, K.V., Stork, D.G.
Format	Conference Proceeding
Language	English
Published	IEEE Comput. Soc. Press 1994
Subjects	Acoustic noise; Acoustic waves; Data mining; Image recognition; Lighting; Lips; Mouth; Noise robustness; Speech enhancement; Speech recognition
ISBN	0818664053 9780818664052
ISSN	1058-6393
DOI	10.1109/ACSSC.1994.471518

Summary:	The visual image of a talker provides information complementary to the acoustic speech waveform, and enables improved recognition accuracy, especially in environments corrupted by high acoustic noise or multiple talkers. Because most of the phonologically relevant visual information is from the mouth and lips, it is important to infer accurately and robustly their dynamics; moreover, it is desirable to extract this information without the use of invasive markers or patterned illumination. We describe the use of deformable templates for speechreading, in order to infer the dynamics of lip contours throughout an image sequence. Template computations can be done relatively quickly and the resulting small number of shape description parameters are quite robust to visual noise and variations in illumination. Such templates delineate the inside of the mouth, so that the teeth and the tongue can also be found.