Expressive Facial Animation Synthesis by Learning Speech Coarticulation and Expression Spaces

Bibliographic Details
Published in	IEEE transactions on visualization and computer graphics Vol. 12; no. 6; pp. 1523 - 1534
Main Authors	Deng, Z.; Neumann, U.; Lewis, J.P.; Kim, T.-Y.; Bulut, M.; Narayanan, Shrikanth
Format	Journal Article
Language	English
Published	United States IEEE 01.11.2006 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Animation; animation synthesis; Artificial Intelligence; Concatenated codes; data-driven; Dynamics; expressive speech; Face; Face - anatomy & histology; Face - physiology; Facial; Facial animation; Facial Expression; Graphics; Human subjects; Humans; Image Interpretation, Computer-Assisted - methods; Imaging, Three-Dimensional - methods; Mathematical models; Models, Biological; Motion analysis; motion capture; Principal component analysis; Signal processing; Signal synthesis; Speech; Speech - physiology; speech coarticulation; Speech Production Measurement - methods; Speech synthesis; Synthesis; texture synthesis; Visual
ISSN	1077-2626 1941-0506
DOI	10.1109/TVCG.2006.90

More Information
Summary:	Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of the markers on the face of a human subject are captured while he/she recites a predesigned corpus, with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A phoneme-independent expression eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and principal component analysis (PCA) reduction. New expressive facial animations are synthesized as follows: First, the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input, then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model, and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
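
The subtraction, PCA reduction, and blending steps named in the summary can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the expressive and neutral marker trajectories have already been time-aligned (the paper's phoneme-based time-warping is not reproduced), that blending is a simple weighted addition, and that the texture-synthesis step is skipped, so the expression signal is only projected into the eigenspace and reconstructed. All function names, array shapes, and the toy data are hypothetical.

```python
import numpy as np


def build_expression_eigenspace(expressive, neutral, n_components):
    """Build a PCA basis from (expressive - neutral) marker motion.

    Both inputs are (frames, 3 * markers) arrays, assumed to be
    time-aligned already (an assumption of this sketch).
    """
    expression = expressive - neutral            # subtraction step
    mean = expression.mean(axis=0)
    centered = expression - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(signal, mean, basis):
    """Map a full-dimensional expression signal to eigenspace coefficients."""
    return (signal - mean) @ basis.T


def reconstruct(coeffs, mean, basis):
    """Map eigenspace coefficients back to marker-space motion."""
    return coeffs @ basis + mean


def blend(neutral_speech, expression_signal, weight=1.0):
    """Additive blend of an expression signal onto neutral visual speech
    (simple weighted addition is an assumption, not the paper's rule)."""
    return neutral_speech + weight * expression_signal


# Toy data: 50 frames, 4 markers (12 coordinates), rank-2 expression motion.
rng = np.random.default_rng(0)
neutral = rng.normal(size=(50, 12))
expression = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 12))
expressive = neutral + expression

mean, basis = build_expression_eigenspace(expressive, neutral, n_components=2)
coeffs = project(expression, mean, basis)        # low-dimensional signal
recovered = reconstruct(coeffs, mean, basis)
blended = blend(neutral, recovered)
```

Because the toy expression motion has rank 2, two components reconstruct it up to floating-point error; real capture data would need more components and would lose some detail in the reduction.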