Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis

Bibliographic Details
Published in 2016 24th European Signal Processing Conference (EUSIPCO), pp. 1951-1955
Main Authors Eunwoo Song, Hong-Goo Kang
Format Conference Proceeding
Language English
Published EURASIP 01.08.2016
ISSN 2076-1465
DOI 10.1109/EUSIPCO.2016.7760589

Abstract This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process considers only the global characteristics of the entire set of training data and does not explicitly consider local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, and we train these classes with a shared hidden layer-based MCL algorithm. Since the proposed MCL method efficiently models both the universal and class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results verify that the proposed algorithm performs much better than the conventional method.
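
The shared hidden layer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network sizes, the activation function, or how the context clustering assigns classes, so the layer dimensions, the `tanh` activation, the helper names `init_params` and `forward`, and the assumption that each frame already carries a class index are all hypothetical. The sketch only shows the general structure, in which hidden layers are shared by every class while each class has its own output layer.

```python
import numpy as np

def init_params(rng, in_dim, hidden_dim, out_dim, n_classes, n_hidden=2):
    """Create shared hidden layers and one linear output layer per class.

    All sizes are illustrative assumptions, not values from the paper.
    """
    dims = [in_dim] + [hidden_dim] * n_hidden
    shared = [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
              for a, b in zip(dims[:-1], dims[1:])]
    heads = [(rng.normal(0.0, 0.1, (hidden_dim, out_dim)), np.zeros(out_dim))
             for _ in range(n_classes)]
    return {"shared": shared, "heads": heads}

def forward(params, x, class_ids):
    """Map input features to output parameters.

    x:         (n_frames, in_dim) input features
    class_ids: (n_frames,) integer class index per frame (assumed given)
    """
    h = x
    # Shared layers see every frame, so they model class-independent structure.
    for W, b in params["shared"]:
        h = np.tanh(h @ W + b)
    out_dim = params["heads"][0][0].shape[1]
    y = np.empty((x.shape[0], out_dim))
    # Each class-dependent output layer handles only the frames of its class.
    for c, (W, b) in enumerate(params["heads"]):
        mask = class_ids == c
        y[mask] = h[mask] @ W + b
    return y

rng = np.random.default_rng(0)
params = init_params(rng, in_dim=10, hidden_dim=16, out_dim=4, n_classes=3)
x = rng.normal(size=(8, 10))
class_ids = np.array([0, 1, 2, 0, 1, 2, 0, 1])
y = forward(params, x, class_ids)
```

In a structure like this, a gradient step on frames of one class would update the shared layers and that class's output layer only, which is one plausible reading of how universal and class-dependent characteristics are modeled separately.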
Author affiliations
Eunwoo Song: Dept. of Electr. & Electron. Eng., Yonsei Univ., Seoul, South Korea
Hong-Goo Kang: Dept. of Electr. & Electron. Eng., Yonsei Univ., Seoul, South Korea
EISBN 0992862655, 9780992862657
Subjects Acoustics; Clustering algorithms; Context; context clustering; deep neural network; Hidden Markov models; shared hidden layer; Signal processing algorithms; Speech; Statistical parametric speech synthesis; Training