Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis

Bibliographic Details
Published in	2016 24th European Signal Processing Conference (EUSIPCO), pp. 1951-1955
Main Authors	Eunwoo Song, Hong-Goo Kang
Format	Conference Proceeding
Language	English
Published	EURASIP 01.08.2016
Subjects	Acoustics; Clustering algorithms; Context; context clustering; deep neural network; Hidden Markov models; shared hidden layer; Signal processing algorithms; Speech; Statistical parametric speech synthesis; Training
ISSN	2076-1465
DOI	10.1109/EUSIPCO.2016.7760589

Summary:	This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process only considers the global characteristics of the entire set of training data, but does not explicitly consider any local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, and train them via a shared hidden layer-based MCL algorithm. Since the proposed MCL method efficiently models both the universal and class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results also verify that the proposed algorithm performs much better than the conventional method.
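
The shared hidden layer idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: the single tanh hidden layer, the linear per-class output heads, the squared-error loss, the layer sizes, and the name SharedHiddenMCL are all assumptions made for illustration, and the class index is passed in by hand rather than produced by the paper's DNN-based context clustering. The point it shows is that every training frame updates the shared layer (universal characteristics), while only the head of that frame's class is updated (class-dependent characteristics).

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedHiddenMCL:
    """Hypothetical sketch: one shared tanh layer, one linear head per class."""

    def __init__(self, n_in, n_hidden, n_out, n_classes, seed=0):
        rng = random.Random(seed)
        self.W_shared = make_matrix(n_hidden, n_in, rng)  # used by all classes
        self.heads = [make_matrix(n_out, n_hidden, rng) for _ in range(n_classes)]

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.W_shared]

    def forward(self, x, class_id):
        h = self.hidden(x)
        return [sum(w * hi for w, hi in zip(row, h))
                for row in self.heads[class_id]]

    def sgd_step(self, x, y, class_id, lr=0.05):
        """One gradient step on 0.5 * squared error; returns the loss before the update."""
        h = self.hidden(x)
        head = self.heads[class_id]
        pred = [sum(w * hi for w, hi in zip(row, h)) for row in head]
        err = [p - t for p, t in zip(pred, y)]
        # Back-propagate to the hidden layer before the head weights change.
        grad_h = [sum(err[o] * head[o][j] for o in range(len(err)))
                  for j in range(len(h))]
        # Class-dependent update: only the selected head is touched.
        for o, row in enumerate(head):
            for j in range(len(row)):
                row[j] -= lr * err[o] * h[j]
        # Universal update: the shared layer learns from every class.
        for j, row in enumerate(self.W_shared):
            delta = grad_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
            for i in range(len(row)):
                row[i] -= lr * delta * x[i]
        return 0.5 * sum(e * e for e in err)


# Toy usage with made-up numbers: 3 input features, 2 output parameters, 2 classes.
model = SharedHiddenMCL(n_in=3, n_hidden=4, n_out=2, n_classes=2)
x, y = [0.5, -1.0, 0.25], [0.2, -0.3]
losses = [model.sgd_step(x, y, class_id=0) for _ in range(200)]
```

In this sketch, training on a frame from class 0 leaves the head of class 1 untouched but still moves the shared weights, which is the sense in which the classes are trained jointly rather than as separate networks.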