Voice conversion from non-parallel corpora using variational auto-encoder
| Published in | 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1-6 |
|---|---|
| Main Authors | Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang |
| Format | Conference Proceeding |
| Language | English |
| Published | Asia Pacific Signal and Information Processing Association, 2016-12-01 |
| Subjects | Adaptation models; Artificial neural networks; Decoding; Speech; Speech recognition; Training |
| Online Access | https://ieeexplore.ieee.org/document/7820786 |
| DOI | 10.1109/APSIPA.2016.7820786 |

| Abstract | We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC due to scarcity or even unavailability of parallel corpora. We propose an SC framework based on variational auto-encoder which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our proposed method and compare it to SC methods that have access to aligned corpora. |
|---|---|
| Author | Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang |
| Author details | 1. Chin-Cheng Hsu, jeremycchsu@iis.sinica.edu.tw, Inst. of Inf. Sci., Taipei, Taiwan; 2. Hsin-Te Hwang, hwanght@iis.sinica.edu.tw, Inst. of Inf. Sci., Taipei, Taiwan; 3. Yi-Chiao Wu, tedwu@iis.sinica.edu.tw, Inst. of Inf. Sci., Taipei, Taiwan; 4. Yu Tsao, yu.tsao@citi.sinica.edu.tw, Res. Center for Inf. Technol. Innovation, Taipei, Taiwan; 5. Hsin-Min Wang, whm@iis.sinica.edu.tw, Inst. of Inf. Sci., Taipei, Taiwan |
| ContentType | Conference Proceeding |
| DOI | 10.1109/APSIPA.2016.7820786 |
| EISBN | 9881476828, 9789881476821 |
| EndPage | 6 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 6 |
| PublicationDate | 2016-12-01 |
| PublicationTitle | 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) |
| PublicationTitleAbbrev | APSIPA |
| Publisher | Asia Pacific Signal and Information Processing Association |
| StartPage | 1 |
| SubjectTerms | Adaptation models; Artificial neural networks; Decoding; Speech; Speech recognition; Training |
| Title | Voice conversion from non-parallel corpora using variational auto-encoder |
| URI | https://ieeexplore.ieee.org/document/7820786 |