Voice Conversion Using Partial Least Squares Regression

Bibliographic Details
Published in IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, no. 5, pp. 912-921
Main Authors Helander, Elina; Virtanen, Tuomas; Nurminen, Jani; Gabbouj, Moncef
Format Journal Article
Language English
Published IEEE 01.07.2010
ISSN 1558-7916
EISSN 1558-7924
DOI 10.1109/TASL.2010.2041699

Abstract Voice conversion can be formulated as finding a mapping function which transforms the features of the source speaker to those of the target speaker. Gaussian mixture model (GMM)-based conversion is commonly used, but it is subject to overfitting. In this paper, we propose to use partial least squares (PLS)-based transforms in voice conversion. To prevent overfitting, the degrees of freedom in the mapping can be controlled by choosing a suitable number of components. We propose a technique to combine PLS with GMMs, enabling the use of multiple local linear mappings. To further improve the perceptual quality of the mapping where rapid transitions between GMM components produce audible artefacts, we propose to low-pass filter the component posterior probabilities. The conducted experiments show that the proposed technique results in better subjective and objective quality than the baseline joint density GMM approach. In speech quality conversion preference tests, the proposed method achieved 67% preference score against the smoothed joint density GMM method and 84% preference score against the unsmoothed joint density GMM method. In objective tests the proposed method produced a lower Mel-cepstral distortion than the reference methods.
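
The posterior smoothing step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which low-pass filter the authors use, so the moving average below, the function name smooth_posteriors, and the half_width parameter are all assumptions made for illustration, not the paper's method. Each frame's GMM component posteriors are averaged over a short time window and then renormalised so that they still sum to one.

```python
def smooth_posteriors(posteriors, half_width=2):
    """Low-pass filter GMM component posteriors along the time axis.

    posteriors: list of frames, each a list of per-component probabilities
                that sum to 1.
    half_width: hypothetical parameter; frames on each side of the current
                frame that are included in the moving average.
    Returns a new list of frames with smoothed, renormalised posteriors.
    """
    n_frames = len(posteriors)
    n_components = len(posteriors[0])
    smoothed = []
    for t in range(n_frames):
        # Clip the averaging window at the start and end of the utterance.
        lo = max(0, t - half_width)
        hi = min(n_frames, t + half_width + 1)
        window = posteriors[lo:hi]
        # Average each component's posterior over the window.
        avg = [sum(frame[k] for frame in window) / len(window)
               for k in range(n_components)]
        # Renormalise to guard against floating-point drift.
        total = sum(avg)
        smoothed.append([a / total for a in avg])
    return smoothed
```

With hard posteriors that flip to another component for a single frame, the smoothed weights change gradually instead, which is the kind of rapid transition the abstract associates with audible artefacts.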
Author Nurminen, Jani
Gabbouj, Moncef
Virtanen, Tuomas
Helander, Elina
Author_xml – sequence: 1
  givenname: Elina
  surname: Helander
  fullname: Helander, Elina
  email: elina.helander@tut.fi
  organization: Dept. of Signal Process., Tampere Univ. of Technol., Tampere, Finland
– sequence: 2
  givenname: Tuomas
  surname: Virtanen
  fullname: Virtanen, Tuomas
  email: tuomas.virtanen@tut.fi
  organization: Dept. of Signal Process., Tampere Univ. of Technol., Tampere, Finland
– sequence: 3
  givenname: Jani
  surname: Nurminen
  fullname: Nurminen, Jani
  email: jani.k.nurminen@nokia.com
  organization: Nokia Devices R&D, Tampere, Finland
– sequence: 4
  givenname: Moncef
  surname: Gabbouj
  fullname: Gabbouj, Moncef
  email: moncef.gabbouj@tut.fi
  organization: Dept. of Signal Process., Tampere Univ. of Technol., Tampere, Finland
CODEN ITASD8
CitedBy_id crossref_primary_10_1016_j_engappai_2022_105279
crossref_primary_10_1016_j_asoc_2014_06_040
crossref_primary_10_1109_TASLP_2022_3156757
crossref_primary_10_1109_TASLP_2019_2955289
crossref_primary_10_1155_2014_357048
crossref_primary_10_1109_LSP_2012_2225615
crossref_primary_10_1007_s00034_017_0639_x
crossref_primary_10_1007_s10772_020_09691_1
crossref_primary_10_1017_ATSIP_2018_23
crossref_primary_10_1007_s00034_017_0660_0
crossref_primary_10_1109_TASLP_2016_2593263
crossref_primary_10_1587_transfun_E96_A_1946
crossref_primary_10_1007_s11042_015_3039_x
crossref_primary_10_1186_1687_4722_2013_28
crossref_primary_10_1016_j_specom_2021_11_006
crossref_primary_10_1186_s13636_019_0160_1
crossref_primary_10_1109_TASLP_2018_2860682
crossref_primary_10_1109_TASLP_2022_3190715
crossref_primary_10_11834_jig_230476
crossref_primary_10_1016_j_specom_2022_09_002
crossref_primary_10_1109_LSP_2019_2961213
crossref_primary_10_1186_s13636_017_0112_6
crossref_primary_10_1186_s13636_017_0116_2
crossref_primary_10_1186_s13636_015_0075_4
crossref_primary_10_1587_transinf_E97_D_1411
crossref_primary_10_1007_s00034_022_01998_5
crossref_primary_10_1007_s00521_015_2030_9
crossref_primary_10_1186_1687_4722_2014_5
crossref_primary_10_1109_TASLP_2017_2723721
crossref_primary_10_1109_TASLP_2019_2923951
crossref_primary_10_1016_j_csl_2014_03_001
crossref_primary_10_1016_j_iot_2020_100180
crossref_primary_10_1145_2738048
crossref_primary_10_1007_s11045_017_0470_3
crossref_primary_10_1109_ACCESS_2021_3065460
crossref_primary_10_1016_j_csl_2021_101243
crossref_primary_10_1016_S1005_8885_17_60234_6
crossref_primary_10_1007_s11760_021_02119_6
crossref_primary_10_1109_TASLP_2019_2910637
crossref_primary_10_3390_pr10081562
crossref_primary_10_1016_j_protcy_2013_12_366
crossref_primary_10_3390_app10010151
crossref_primary_10_1016_j_specom_2017_01_008
crossref_primary_10_1109_ACCESS_2020_2988781
crossref_primary_10_1587_transinf_E97_D_1403
crossref_primary_10_1007_s11042_014_2180_2
crossref_primary_10_1109_TASLP_2020_3001456
crossref_primary_10_1109_TASLP_2019_2917232
crossref_primary_10_1109_TPWRS_2020_2975455
crossref_primary_10_1109_TASLP_2014_2333242
crossref_primary_10_1186_s13636_014_0044_3
crossref_primary_10_3390_signals2030028
crossref_primary_10_1016_j_neucom_2016_07_048
crossref_primary_10_1109_TASL_2011_2165944
crossref_primary_10_1186_s13636_015_0067_4
crossref_primary_10_1109_TASLP_2016_2522643
crossref_primary_10_1109_TASLP_2020_3047262
crossref_primary_10_1109_ACCESS_2019_2923003
crossref_primary_10_1109_TASLPRO_2025_3542288
crossref_primary_10_1109_TPAMI_2023_3257839
crossref_primary_10_1587_transfun_E98_A_2178
crossref_primary_10_1016_j_specom_2014_12_004
crossref_primary_10_1017_ATSIP_2014_17
crossref_primary_10_3390_math11112525
crossref_primary_10_1109_TASLP_2020_3036784
crossref_primary_10_31590_ejosat_780650
crossref_primary_10_1109_TASLP_2020_3038524
crossref_primary_10_4236_jsip_2011_22017
crossref_primary_10_3390_app132111988
crossref_primary_10_1109_ACCESS_2022_3226350
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
8FD
JQ2
L7M
L~C
L~D
DOI 10.1109/TASL.2010.2041699
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Computer and Information Systems Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Advanced Technologies Database with Aerospace
ProQuest Computer Science Collection
Computer and Information Systems Abstracts Professional
DatabaseTitleList Computer and Information Systems Abstracts

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-7924
EndPage 921
ExternalDocumentID 10_1109_TASL_2010_2041699
5445056
Genre orig-research
ID FETCH-LOGICAL-c391t-55e8908e2434df0fcef5ff7f0f99a74f1f7521aa97a085ff33b99a2e87c2a2c73
IEDL.DBID RIE
ISSN 1558-7916
IngestDate Wed Oct 01 13:37:58 EDT 2025
Wed Oct 01 01:44:52 EDT 2025
Thu Apr 24 23:01:43 EDT 2025
Tue Aug 26 16:39:52 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c391t-55e8908e2434df0fcef5ff7f0f99a74f1f7521aa97a085ff33b99a2e87c2a2c73
Notes ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 23
PQID 753681695
PQPubID 23500
PageCount 10
ParticipantIDs ieee_primary_5445056
crossref_citationtrail_10_1109_TASL_2010_2041699
crossref_primary_10_1109_TASL_2010_2041699
proquest_miscellaneous_753681695
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2010-07-01
PublicationDateYYYYMMDD 2010-07-01
PublicationDate_xml – month: 07
  year: 2010
  text: 2010-07-01
  day: 01
PublicationDecade 2010
PublicationTitle IEEE transactions on audio, speech, and language processing
PublicationTitleAbbrev TASL
PublicationYear 2010
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0043641
Score 2.395964
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 912
SubjectTerms Bandwidth
Conversion
Density
Gaussian mixture model (GMM)
Least squares method
Least squares methods
Low pass filters
Mapping
Mathematical models
Narrowband
partial least squares regression
Signal processing
Speech
Speech enhancement
Speech processing
Testing
Training data
Transforms
Virtual colonoscopy
Voice
voice conversion
Title Voice Conversion Using Partial Least Squares Regression
URI https://ieeexplore.ieee.org/document/5445056
https://www.proquest.com/docview/753681695
Volume 18
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1558-7924
  dateEnd: 20131231
  omitProxy: false
  ssIdentifier: ssj0043641
  issn: 1558-7916
  databaseCode: RIE
  dateStart: 20060101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE