Speaker indexing and adaptation using speaker clustering based on statistical model selection

Bibliographic Details
Published in	2004 IEEE International Conference on Acoustics, Speech and Signal Processing Vol. 1; pp. I - 353
Main Authors	Nishida, M., Kawahara, T.
Format	Conference Proceeding
Language	English; Japanese
Published	Piscataway, NJ: IEEE, 28.09.2004
Subjects	Acoustic testing; Applied sciences; Automatic speech recognition; Bayesian methods; Exact sciences and technology; Gaussian distribution; Indexing; Informatics; Information, signal and communications theory; Loudspeakers; Robustness; Signal and communications theory; Signal processing; Signal representation. Spectral analysis; Signal, noise; Speech processing; Speech recognition; Telecommunications and information theory; Voice mail; Performance evaluation; Automatic classification; Archive; Model selection; Mixture theory; Speaker adaptation; Unsupervised classification; Signal classification; Gaussian process; Statistical model; Automatic indexing; Automatic recognition
ISBN	9780780384842 0780384849
ISSN	1520-6149
DOI	10.1109/ICASSP.2004.1325995

More Information
Summary:	The paper addresses unsupervised speaker indexing and automatic speech recognition of discussions. Speaker indexing has two cases: the number of speakers is either known beforehand or unknown. When the number is unknown, indexing is difficult to apply to varied data because several parameters, such as a threshold, must be determined. In addition, applying a uniform model causes serious problems because utterance durations vary widely across speakers. We thus propose a method that robustly performs speaker indexing in both cases, using a flexible framework in which an optimal speaker model (GMM or VQ) is selected based on the BIC (Bayesian information criterion). Moreover, we propose a method that combines this indexing with speaker adaptation based on speaker selection. On real discussion archives, we demonstrate that indexing performance is higher than that of conventional methods in both cases and that the combination method improves speech recognition performance.
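
The BIC-based model selection mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their formulation, so the sketch assumes the common convention BIC = -2 ln L + k ln N (lower is better) and uses two hypothetical one-dimensional Gaussian candidates as stand-ins for the paper's GMM and VQ speaker models. All function and model names are invented for illustration.

```python
import math
import random


def gaussian_log_likelihood(xs, mean, var):
    """Log-likelihood of samples xs under a 1-D Gaussian N(mean, var)."""
    n = len(xs)
    sq = sum((x - mean) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)


def bic(log_likelihood, n_params, n_samples):
    """BIC under the assumed convention -2 ln L + k ln N (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_by_bic(xs):
    """Pick the candidate with the lowest BIC.

    Hypothetical candidates (stand-ins, not the paper's GMM/VQ models):
      fixed_variance: mean estimated, variance fixed at 1 (1 parameter)
      free_variance:  mean and variance both estimated   (2 parameters)
    """
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-12)  # avoid log(0)
    scores = {
        "fixed_variance": bic(gaussian_log_likelihood(xs, mean, 1.0), 1, n),
        "free_variance": bic(gaussian_log_likelihood(xs, mean, var), 2, n),
    }
    best = min(scores, key=scores.get)
    return best, scores


random.seed(0)
wide = [random.gauss(0.0, 3.0) for _ in range(500)]
best, scores = select_by_bic(wide)
print(best, scores)
```

With many samples whose spread is far from 1, the richer candidate fits so much better that it wins despite its larger penalty. When the extra parameter does not improve the fit, the penalty term favors the simpler candidate. The same trade-off is what lets a BIC-style criterion prefer a simpler model for speakers with little data.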