Breathing and Speech Planning in Spontaneous Speech Synthesis

Bibliographic Details
Published in	ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7649-7653
Main Authors	Szekely, Eva; Henter, Gustav Eje; Beskow, Jonas; Gustafson, Joakim
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2020
Subjects	Annotations; breathing; ensemble method; Planning; Signal processing; speech planning; Speech synthesis; spontaneous speech; Stochastic processes; Synthesizers; Training
ISSN	2379-190X
DOI	10.1109/ICASSP40776.2020.9054107

Abstract	Breathing and speech planning in spontaneous speech are coordinated processes, often exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs, integrating breath into synthesis has advantages for naturalness and recall. At the same time, a synthetic voice reproducing disfluent breathing patterns learned from the data can be problematic. To address this, we first propose training stochastic TTS on a corpus of overlapping breath-group bigrams, to take context into account. Next, we introduce an unsupervised automatic annotation of likely-disfluent breath events, through a product-of-experts model that combines the output of two breath-event predictors, each using complementary information and operating in opposite directions. This annotation enables creating an automatically-breathing spontaneous speech synthesiser with a more fluent breathing style. A subjective evaluation on two spoken genres (impromptu and rehearsed) found the proposed system to be preferred over the baseline approach treating all breath events the same.
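Note	The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not describe the predictors or how their outputs are combined, so the sketch assumes that each predictor emits a probability that a breath event is disfluent, and that the product of experts is the renormalised product of two Bernoulli distributions. The function names and the 0.5 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def breath_group_bigrams(groups: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each breath group with its successor, so that consecutive
    pairs overlap by one breath group (assumed corpus layout)."""
    return list(zip(groups, groups[1:]))


def product_of_experts(p_a: float, p_b: float) -> float:
    """Combine two Bernoulli probabilities by multiplying them and
    renormalising over the two outcomes (assumed combination rule)."""
    yes = p_a * p_b
    no = (1.0 - p_a) * (1.0 - p_b)
    if yes + no == 0.0:
        raise ValueError("the two experts contradict each other with certainty")
    return yes / (yes + no)


def flag_disfluent(forward: Sequence[float],
                   backward: Sequence[float],
                   threshold: float = 0.5) -> List[bool]:
    """Mark a breath event as likely disfluent when the combined
    probability exceeds a hypothetical threshold."""
    return [product_of_experts(f, b) > threshold
            for f, b in zip(forward, backward)]
```

Under these assumptions, a breath event is flagged only when the combined evidence of both predictors favours it: one confident expert can outweigh a mildly dissenting one, while two lukewarm experts reinforce each other.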
Author affiliations	Szekely, Eva; Henter, Gustav Eje; Beskow, Jonas; Gustafson, Joakim: KTH Royal Institute of Technology, Division of Speech, Music and Hearing, Stockholm, Sweden
Discipline	Engineering
EISBN	9781509066315 1509066314
EISSN	2379-190X
License	other-oa
OpenAccessLink	https://proxy.k.utb.cz/login?url=http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283731
PageCount	5
URI	https://ieeexplore.ieee.org/document/9054107 http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283731
UnpaywallVersion	submittedVersion