Bimodal person identification using voice data and face images

Bibliographic Details
Main Authors	Khryashchev, V. V.; Topnikov, A. I.; Stefanidi, A. F.; Priorov, A. L.
Format	Conference Proceeding
Language	English
Published	SPIE 15.03.2019
ISBN	9781510627482, 1510627480
ISSN	0277-786X
DOI	10.1117/12.2523138

More Information
Summary:	The paper considers the problem of bimodal person identification by analyzing a speaker’s face and voice. Two speaker identification algorithms are developed and compared. The first algorithm extracts features from the speech signal in the form of mel-frequency cepstral coefficients (MFCCs) and, on this basis, builds a speaker model using Gaussian mixtures. The second approach is based on a universal background model (UBM) obtained from the recordings of a large number of speakers. For face identification, a neural network with 13 convolutional layers was used. Databases of speech signals and face images of 100 people were compiled for training and testing. The final bimodal identification system achieves an identification accuracy of more than 95%. The results of this experiment demonstrated that the proposed algorithms can be applied to person identification in real-life systems.
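The summary names mel-frequency cepstral coefficients as the speech features but gives no front-end parameters. The following minimal sketch shows one conventional way to compute MFCCs with only the Python standard library: framing, a Hamming window, a power spectrum, a triangular mel filterbank, a logarithm, and a DCT-II. The sample rate, frame and hop lengths, DFT size, number of filters and number of coefficients are assumed typical values, not the authors' settings, and all function names are hypothetical.

```python
import math
from typing import List

# Assumed front-end settings (not given in the paper): 16 kHz audio, 25 ms
# frames with a 10 ms hop, a 512-point DFT, 26 mel filters, 13 coefficients.

def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters spaced uniformly on the mel scale from 0 Hz to Nyquist."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for j in range(n_filters):
        left, centre, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, centre):    # rising edge of the triangle
            bank[j][k] = (k - left) / (centre - left)
        for k in range(centre, right):   # falling edge of the triangle
            bank[j][k] = (right - k) / (right - centre)
    return bank

def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """|DFT|^2 / n_fft for bins 0..n_fft/2 (naive DFT; zero padding adds nothing)."""
    spec = []
    for k in range(n_fft // 2 + 1):
        re = im = 0.0
        for n, x in enumerate(frame):
            angle = 2.0 * math.pi * k * n / n_fft
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        spec.append((re * re + im * im) / n_fft)
    return spec

def mfcc(signal: List[float], sample_rate: int = 16000, frame_len: int = 400,
         hop: int = 160, n_fft: int = 512, n_filters: int = 26,
         n_ceps: int = 13) -> List[List[float]]:
    """Return one n_ceps-dimensional MFCC vector per full frame."""
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hamming window
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        # Log mel-band energies; the small floor avoids log(0) on silent frames.
        log_e = [math.log(sum(f * p for f, p in zip(filt, spec)) + 1e-10) for filt in bank]
        # DCT-II decorrelates the log energies; keep the first n_ceps terms.
        features.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                             for m, e in enumerate(log_e)) for c in range(n_ceps)])
    return features
```

For 0.1 s of a 440 Hz tone sampled at 16 kHz (1600 samples), `mfcc` returns 8 frames of 13 coefficients each. The naive DFT keeps the sketch dependency-free; a practical front end would use an FFT such as `numpy.fft.rfft` and would often add pre-emphasis and cepstral mean normalization.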
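For the first algorithm, identification amounts to scoring the feature vectors of a test utterance against each enrolled speaker's Gaussian mixture and choosing the best-scoring speaker. The sketch below assumes diagonal covariances and already-trained model parameters; training, typically by expectation-maximization, is omitted. `DiagGMM` and `identify` are hypothetical names, not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

@dataclass
class DiagGMM:
    """Gaussian mixture with diagonal covariances (an assumption; the paper
    does not state the covariance type or the number of components)."""
    weights: List[float]          # mixture weights, summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal-variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), via log-sum-exp."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(ll)
        peak = max(terms)
        return peak + math.log(sum(math.exp(t - peak) for t in terms))

    def log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        # Frames are treated as independent, so their log-likelihoods add.
        return sum(self.frame_log_likelihood(x) for x in frames)

def identify(frames: Sequence[Sequence[float]], models: Dict[str, DiagGMM]) -> str:
    """Return the enrolled speaker whose model best explains the frames."""
    return max(models, key=lambda name: models[name].log_likelihood(frames))
```

With toy models, for example `alice` with components at (0, 0) and (1, 1) and `bob` with a single component at (5, 5), frames near (5, 5) are assigned to `bob`. Working in the log domain with log-sum-exp avoids the underflow that multiplying many small per-frame densities would cause.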
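The summary does not say how the universal background model is turned into individual speaker models. A widely used recipe, assumed here rather than taken from the paper, adapts only the UBM component means to a speaker's enrollment data by maximum a posteriori (MAP) estimation: with soft count n_k, posterior-weighted mean E_k and relevance factor r, alpha_k = n_k / (n_k + r) and the adapted mean is alpha_k * E_k + (1 - alpha_k) * mu_k. A test utterance is then scored by the average log-likelihood ratio between the speaker model and the UBM. The relevance factor of 16 and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# A diagonal-covariance GMM as (weights, means, variances); an assumed layout.
GMM = Tuple[List[float], List[Vector], List[Vector]]

def component_log_probs(gmm: GMM, x: Sequence[float]) -> List[float]:
    """log(w_k * N(x; mu_k, diag(var_k))) for every component k."""
    weights, means, variances = gmm
    out = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        out.append(ll)
    return out

def log_sum_exp(values: List[float]) -> float:
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def avg_log_likelihood(gmm: GMM, frames: Sequence[Sequence[float]]) -> float:
    return sum(log_sum_exp(component_log_probs(gmm, x)) for x in frames) / len(frames)

def map_adapt_means(ubm: GMM, frames: Sequence[Sequence[float]],
                    relevance: float = 16.0) -> GMM:
    """Mean-only MAP adaptation; weights and variances are copied from the UBM."""
    weights, means, variances = ubm
    dim = len(means[0])
    counts = [0.0] * len(weights)                 # soft counts n_k
    sums = [[0.0] * dim for _ in weights]         # sum_t gamma_tk * x_t
    for x in frames:
        lp = component_log_probs(ubm, x)
        total = log_sum_exp(lp)
        for k, l in enumerate(lp):
            gamma = math.exp(l - total)           # posterior of component k
            counts[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]
    new_means = []
    for k, mu in enumerate(means):
        alpha = counts[k] / (counts[k] + relevance)
        expected = [s / counts[k] for s in sums[k]] if counts[k] > 0 else mu
        new_means.append([alpha * e + (1.0 - alpha) * m for e, m in zip(expected, mu)])
    return (list(weights), new_means, [list(v) for v in variances])

def llr_score(speaker: GMM, ubm: GMM, frames: Sequence[Sequence[float]]) -> float:
    """Average per-frame log-likelihood ratio; positive favours the speaker."""
    return avg_log_likelihood(speaker, frames) - avg_log_likelihood(ubm, frames)
```

Because components that the enrollment data barely touches keep alpha close to 0, they stay at their UBM values; this is what lets the approach cope with short enrollment recordings. For identification among enrolled speakers, the speaker with the highest `llr_score` would be chosen.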
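The summary reports the accuracy of the combined system but not how the voice and face results are merged. One simple option, assumed here, is score-level fusion: rescale each modality's per-identity scores to [0, 1] with min-max normalization, take a weighted sum, and choose the identity with the highest fused score. The equal-weight default, the example score values and the function names are hypothetical.

```python
from typing import Dict, Tuple

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to [0, 1] across the enrolled identities."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all identities tied: this modality carries no information
        return {name: 0.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}

def fuse_and_identify(voice: Dict[str, float], face: Dict[str, float],
                      voice_weight: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Weighted-sum score fusion (hypothetical weighting, not from the paper).

    Both inputs map identity -> score, where a higher score means a better match.
    """
    if set(voice) != set(face):
        raise ValueError("both modalities must score the same identities")
    v, f = min_max(voice), min_max(face)
    fused = {name: voice_weight * v[name] + (1.0 - voice_weight) * f[name] for name in v}
    return max(fused, key=fused.get), fused
```

With voice log-likelihoods `{"a": -42.0, "b": -40.0, "c": -50.0}` that slightly favour `b` and face scores `{"a": 0.95, "b": 0.35, "c": 0.30}` that strongly favour `a`, equal weights select `a`, while `voice_weight=1.0` reduces to the voice-only decision `b`. Normalizing first matters because the two modalities produce scores on unrelated scales.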
Bibliography:	Conference Date: 2018-11-01 to 2018-11-03; Conference Location: Munich, Germany