Voice Conversion Using Partial Least Squares Regression
Voice conversion can be formulated as finding a mapping function which transforms the features of the source speaker to those of the target speaker. Gaussian mixture model (GMM)-based conversion is commonly used, but it is subject to overfitting. In this paper, we propose to use partial least square...
| Published in | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, no. 5, pp. 912-921 |
|---|---|
| Main Authors | Helander, Elina; Virtanen, Tuomas; Nurminen, Jani; Gabbouj, Moncef |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 1 July 2010 |
| ISSN | 1558-7916 (print), 1558-7924 (electronic) |
| DOI | 10.1109/TASL.2010.2041699 |
| Abstract | Voice conversion can be formulated as finding a mapping function which transforms the features of the source speaker to those of the target speaker. Gaussian mixture model (GMM)-based conversion is commonly used, but it is subject to overfitting. In this paper, we propose to use partial least squares (PLS)-based transforms in voice conversion. To prevent overfitting, the degrees of freedom in the mapping can be controlled by choosing a suitable number of components. We propose a technique to combine PLS with GMMs, enabling the use of multiple local linear mappings. To further improve the perceptual quality of the mapping where rapid transitions between GMM components produce audible artefacts, we propose to low-pass filter the component posterior probabilities. The conducted experiments show that the proposed technique results in better subjective and objective quality than the baseline joint density GMM approach. In speech quality conversion preference tests, the proposed method achieved 67% preference score against the smoothed joint density GMM method and 84% preference score against the unsmoothed joint density GMM method. In objective tests the proposed method produced a lower Mel-cepstral distortion than the reference methods. |
|---|---|
| Author | Helander, Elina; Virtanen, Tuomas; Nurminen, Jani; Gabbouj, Moncef |
| Author affiliations | 1. Elina Helander (elina.helander@tut.fi), Dept. of Signal Process., Tampere Univ. of Technol., Tampere, Finland; 2. Tuomas Virtanen (tuomas.virtanen@tut.fi), Dept. of Signal Process., Tampere Univ. of Technol., Tampere, Finland; 3. Jani Nurminen (jani.k.nurminen@nokia.com), Nokia Devices R&D, Tampere, Finland; 4. Moncef Gabbouj (moncef.gabbouj@tut.fi), Dept. of Signal Process., Tampere Univ. of Technol., Tampere, Finland |
| CODEN | ITASD8 |
| ContentType | Journal Article |
| Discipline | Engineering |
| EISSN | 1558-7924 |
| EndPage | 921 |
| Genre | orig-research |
| ISSN | 1558-7916 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| PageCount | 10 |
| PublicationDate | 2010-07-01 |
| PublicationTitle | IEEE transactions on audio, speech, and language processing |
| PublicationTitleAbbrev | TASL |
| PublicationYear | 2010 |
| Publisher | IEEE |
| StartPage | 912 |
| SubjectTerms | Bandwidth; Conversion; Density; Gaussian mixture model (GMM); Least squares method; Least squares methods; Low pass filters; Mapping; Mathematical models; Narrowband; partial least squares regression; Signal processing; Speech; Speech enhancement; Speech processing; Testing; Training data; Transforms; Virtual colonoscopy; Voice; voice conversion |
| Title | Voice Conversion Using Partial Least Squares Regression |
| URI | https://ieeexplore.ieee.org/document/5445056; https://www.proquest.com/docview/753681695 |
| Volume | 18 |
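
The abstract describes combining several local linear mappings through GMM component posteriors, and low-pass filtering those posteriors so that rapid switches between components do not produce audible artefacts. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the centered moving-average filter, the window length, and the toy numbers are all assumptions made for illustration, and the local transforms are given by hand rather than estimated with PLS.

```python
# Illustrative sketch: the filter type, window length, names, and toy data
# are assumptions, not details taken from the paper.

def smooth_posteriors(posteriors, window=3):
    """Low-pass filter each component's posterior trajectory over time with a
    centered moving average, then renormalize each frame to sum to 1."""
    n_frames = len(posteriors)
    n_comp = len(posteriors[0])
    half = window // 2
    smoothed = []
    for t in range(n_frames):
        lo, hi = max(0, t - half), min(n_frames, t + half + 1)
        avg = [sum(posteriors[s][k] for s in range(lo, hi)) / (hi - lo)
               for k in range(n_comp)]
        total = sum(avg)
        smoothed.append([a / total for a in avg])
    return smoothed


def convert(frames, posteriors, transforms):
    """Posterior-weighted sum of local linear mappings y = A_k x + b_k.

    transforms is a list of (A_k, b_k) pairs, one per mixture component,
    with A_k a list of rows and b_k a list of offsets.
    """
    out = []
    for x, post in zip(frames, posteriors):
        y = [0.0] * len(transforms[0][1])
        for p, (A, b) in zip(post, transforms):
            for i, row in enumerate(A):
                y[i] += p * (sum(a * xj for a, xj in zip(row, x)) + b[i])
        out.append(y)
    return out


if __name__ == "__main__":
    # Hypothetical posteriors with a one-frame jump to the second component.
    raw = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    frames = [[0.0]] * 5
    # Two hand-made 1-D local transforms: y = x and y = x + 3.
    transforms = [([[1.0]], [0.0]), ([[1.0]], [3.0])]
    print(convert(frames, raw, transforms))
    print(convert(frames, smooth_posteriors(raw), transforms))
```

With the raw posteriors the converted trajectory jumps for a single frame; with the smoothed posteriors the same change is spread over neighbouring frames, which is the kind of transition smoothing the abstract motivates. A real system would estimate each local transform from aligned source and target features and would choose the filter by listening tests or objective distortion.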