OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis

Bibliographic Details
Published in	2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Vol. 1, pp. 1-5
Main Authors	Anina, Iryna; Zhou, Ziheng; Zhao, Guoying; Pietikainen, Matti
Format	Conference Proceeding
Language	English
Published	IEEE, May 2015
Subjects	Cameras; Mouth; Speech; Synchronization; Videos; Visualization
DOI	10.1109/FG.2015.7163155

Summary:	Visual speech constitutes a large part of our non-rigid facial motion and contains important information that allows machines to interact with human users, for instance, through automatic visual speech recognition (VSR) and speaker verification. One of the major obstacles to research on non-rigid mouth motion analysis is the absence of suitable databases. Those available for public research either lack a sufficient number of speakers or utterances or are restricted to constrained viewpoints, which limits their representativeness and usefulness. This paper introduces a newly collected multi-view audiovisual database for non-rigid mouth motion analysis. It includes more than 50 speakers uttering three types of utterances and, more importantly, thousands of videos simultaneously recorded by six cameras from five different views ranging from frontal to profile. Moreover, a simple VSR system has been developed and tested on the database to provide baseline performance.