Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis
This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the tra...
| Published in | 2016 24th European Signal Processing Conference (EUSIPCO) pp. 1951 - 1955 |
|---|---|
| Main Authors | Eunwoo Song, Hong-Goo Kang |
| Format | Conference Proceeding |
| Language | English |
| Published | EURASIP, 01.08.2016 |
| ISSN | 2076-1465 |
| DOI | 10.1109/EUSIPCO.2016.7760589 |
| Abstract | This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process only considers the global characteristics of the entire set of training data, but does not explicitly consider any local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, and train them via a shared hidden layer-based MCL algorithm. Since the proposed MCL method efficiently models both the universal and class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results also verify that the proposed algorithm performs much better than the conventional method. |
|---|---|
| Author | Eunwoo Song; Hong-Goo Kang |
| Author_xml | 1. Eunwoo Song (Dept. of Electr. & Electron. Eng., Yonsei Univ., Seoul, South Korea); 2. Hong-Goo Kang (Dept. of Electr. & Electron. Eng., Yonsei Univ., Seoul, South Korea) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/EUSIPCO.2016.7760589 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| EISBN | 0992862655 9780992862657 |
| EISSN | 2076-1465 |
| EndPage | 1955 |
| ExternalDocumentID | 7760589 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_7760589 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-Aug. |
| PublicationDateYYYYMMDD | 2016-08-01 |
| PublicationDate_xml | – month: 08 year: 2016 text: 2016-Aug. |
| PublicationDecade | 2010 |
| PublicationTitle | 2016 24th European Signal Processing Conference (EUSIPCO) |
| PublicationTitleAbbrev | EUSIPCO |
| PublicationYear | 2016 |
| Publisher | EURASIP |
| Publisher_xml | – name: EURASIP |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1951 |
| SubjectTerms | Acoustics; Clustering algorithms; Context; context clustering; deep neural network; Hidden Markov models; shared hidden layer; Signal processing algorithms; Speech; Statistical parametric speech synthesis; Training |
| Title | Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis |
| URI | https://ieeexplore.ieee.org/document/7760589 |
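
The abstract describes a network in which hidden layers are shared across all classes while each class keeps its own characteristics. As a minimal sketch of that general idea, and not the authors' implementation, the Python below builds a small network with shared hidden layers and one output layer per class. Everything specific here is an assumption made for illustration: the layer sizes, the tanh activation, the number of classes, the name `SharedHiddenMCL`, and the fact that the class index is supplied by the caller. This record does not say how the DNN-based context clustering assigns classes or how the model is trained, so neither is shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, activation=None):
    """Apply one dense layer to the input vector x."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    if activation is not None:
        out = [activation(v) for v in out]
    return out


class SharedHiddenMCL:
    """Hypothetical model: shared hidden layers, one output layer per class."""

    def __init__(self, n_in, n_hidden, n_out, n_classes, seed=0):
        rng = random.Random(seed)
        # Hidden layers used by every class (the "universal" part).
        self.shared = [
            make_layer(n_in, n_hidden, rng),
            make_layer(n_hidden, n_hidden, rng),
        ]
        # One output layer per class (the "class-dependent" part).
        self.heads = [make_layer(n_hidden, n_out, rng) for _ in range(n_classes)]

    def hidden(self, x):
        """Run the shared hidden layers; the result does not depend on the class."""
        h = x
        for layer in self.shared:
            h = dense(layer, h, math.tanh)
        return h

    def forward(self, x, class_id):
        """Predict output parameters using the output layer of the given class."""
        return dense(self.heads[class_id], self.hidden(x))


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Illustrative sizes only: 4 input features, 3 output parameters, 2 classes.
model = SharedHiddenMCL(n_in=4, n_hidden=8, n_out=3, n_classes=2)
x = [0.1, -0.2, 0.3, 0.0]
y0 = model.forward(x, class_id=0)  # same hidden activations, class-0 output layer
y1 = model.forward(x, class_id=1)  # same hidden activations, class-1 output layer
```

In a training loop under these assumptions, each frame's error would be computed against the output layer of its assigned class, so gradients from every class would update the shared layers while each output layer would see only its own class's data.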