Breathing and Speech Planning in Spontaneous Speech Synthesis
Published in | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 7649 - 7653 |
---|---|
Main Authors | Szekely, Eva; Henter, Gustav Eje; Beskow, Jonas; Gustafson, Joakim |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2020 |
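Subjects | Annotations; breathing; ensemble method; Planning; Signal processing; speech planning; Speech synthesis; spontaneous speech; Stochastic processes; Synthesizers; Training |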
ISSN | 2379-190X |
DOI | 10.1109/ICASSP40776.2020.9054107 |
Abstract | Breathing and speech planning in spontaneous speech are coordinated processes, often exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs, integrating breath into synthesis has advantages for naturalness and recall. At the same time, a synthetic voice reproducing disfluent breathing patterns learned from the data can be problematic. To address this, we first propose training stochastic TTS on a corpus of overlapping breath-group bigrams, to take context into account. Next, we introduce an unsupervised automatic annotation of likely-disfluent breath events, through a product-of-experts model that combines the output of two breath-event predictors, each using complementary information and operating in opposite directions. This annotation enables creating an automatically-breathing spontaneous speech synthesiser with a more fluent breathing style. A subjective evaluation on two spoken genres (impromptu and rehearsed) found the proposed system to be preferred over the baseline approach treating all breath events the same. |
---|---|
Author | Szekely, Eva; Henter, Gustav Eje; Gustafson, Joakim; Beskow, Jonas |
Author_xml | 1. Szekely, Eva (KTH Royal Institute of Technology, Division of Speech, Music and Hearing, Stockholm, Sweden); 2. Henter, Gustav Eje (KTH Royal Institute of Technology, Division of Speech, Music and Hearing, Stockholm, Sweden); 3. Beskow, Jonas (KTH Royal Institute of Technology, Division of Speech, Music and Hearing, Stockholm, Sweden); 4. Gustafson, Joakim (KTH Royal Institute of Technology, Division of Speech, Music and Hearing, Stockholm, Sweden) |
Discipline | Engineering |
EISBN | 9781509066315 1509066314 |
EISSN | 2379-190X |
EndPage | 7653 |
Genre | orig-research |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
License | other-oa |
OpenAccessLink | https://proxy.k.utb.cz/login?url=http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283731 |
PageCount | 5 |
PublicationDate | 2020-May |
PublicationDateYYYYMMDD | 2020-05-01 |
PublicationTitle | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
PublicationTitleAbbrev | ICASSP |
PublicationYear | 2020 |
Publisher | IEEE |
StartPage | 7649 |
SubjectTerms | Annotations; breathing; ensemble method; Planning; Signal processing; speech planning; Speech synthesis; spontaneous speech; Stochastic processes; Synthesizers; Training |
Title | Breathing and Speech Planning in Spontaneous Speech Synthesis |
URI | https://ieeexplore.ieee.org/document/9054107 http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283731 |
UnpaywallVersion | submittedVersion |
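
Illustrative sketch (not part of the catalogue record or the paper's code): the abstract names two techniques, overlapping breath-group bigrams and a product-of-experts combination of two breath-event predictors. The Python below is a minimal sketch of both ideas under stated assumptions. All function names, the Bernoulli form of the experts, the rule that a breath with a low combined probability counts as likely disfluent, and the 0.5 threshold are hypothetical choices made for illustration; the record does not specify how the authors implemented either step.

```python
from typing import List, Sequence, Tuple


def breath_group_bigrams(groups: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each breath group with its successor.

    Consecutive pairs share one breath group, so every group (except the
    first and last) appears once as right context and once as left context.
    """
    return [(groups[i], groups[i + 1]) for i in range(len(groups) - 1)]


def product_of_experts(p_forward: float, p_backward: float) -> float:
    """Combine two Bernoulli experts by a normalised product.

    Assumption: each predictor outputs the probability that a breath
    occurs at a given position. The combined probability is
    p_f * p_b / (p_f * p_b + (1 - p_f) * (1 - p_b)).
    """
    breath = p_forward * p_backward
    no_breath = (1.0 - p_forward) * (1.0 - p_backward)
    total = breath + no_breath
    # If the experts contradict each other with certainty, fall back to 0.5.
    return breath / total if total > 0.0 else 0.5


def flag_likely_disfluent(
    p_forward: Sequence[float],
    p_backward: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the source
) -> List[int]:
    """Return indices of observed breaths whose combined probability is low.

    Assumption: an observed breath that both directions consider unlikely
    is treated as likely disfluent.
    """
    return [
        i
        for i, (pf, pb) in enumerate(zip(p_forward, p_backward))
        if product_of_experts(pf, pb) < threshold
    ]


if __name__ == "__main__":
    print(breath_group_bigrams(["so I went", "to the shop", "and then"]))
    print(flag_likely_disfluent([0.9, 0.2, 0.5], [0.8, 0.3, 0.5]))
```

The normalised product rewards agreement: two experts that each assign a moderately high probability yield a combined probability higher than either alone, while a breath that neither direction expects is pushed towards zero and is flagged.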