Breathing and Speech Planning in Spontaneous Speech Synthesis

Bibliographic Details
Published in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7649-7653
Main Authors: Szekely, Eva; Henter, Gustav Eje; Beskow, Jonas; Gustafson, Joakim
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2020
ISSN: 2379-190X
DOI: 10.1109/ICASSP40776.2020.9054107

Abstract: Breathing and speech planning in spontaneous speech are coordinated processes, often exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs, integrating breath into synthesis has advantages for naturalness and recall. At the same time, a synthetic voice reproducing disfluent breathing patterns learned from the data can be problematic. To address this, we first propose training stochastic TTS on a corpus of overlapping breath-group bigrams, to take context into account. Next, we introduce an unsupervised automatic annotation of likely-disfluent breath events, through a product-of-experts model that combines the output of two breath-event predictors, each using complementary information and operating in opposite directions. This annotation enables creating an automatically-breathing spontaneous speech synthesiser with a more fluent breathing style. A subjective evaluation on two spoken genres (impromptu and rehearsed) found the proposed system to be preferred over the baseline approach treating all breath events the same.
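
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the predictors, the corpus format, or the decision rule. The sketch assumes that each breath group is a text string, that each of the two predictors outputs a probability that a breath occurs at a given position, and that a breath is flagged as likely disfluent when the combined probability falls below a threshold. The function names, the `<b>` breath token, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Sequence


def breath_group_bigrams(groups: Sequence[str]) -> List[str]:
    """Pair each breath group with its successor.

    Consecutive pairs overlap by one breath group, so every group
    (except the first and last) appears once as left and once as
    right context. Assumed corpus format, for illustration only.
    """
    return [f"{left} {right}" for left, right in zip(groups, groups[1:])]


def product_of_experts(p_forward: float, p_backward: float) -> float:
    """Combine two Bernoulli experts by multiplying and renormalising.

    p_forward and p_backward are the (hypothetical) probabilities that
    a breath occurs at a position, according to a predictor running
    forwards and one running backwards.
    """
    yes = p_forward * p_backward
    no = (1.0 - p_forward) * (1.0 - p_backward)
    if yes + no == 0.0:
        raise ValueError("experts assign zero probability to both outcomes")
    return yes / (yes + no)


def flag_likely_disfluent(
    p_forward: Sequence[float],
    p_backward: Sequence[float],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[bool]:
    """Flag observed breaths that the combined model finds unexpected."""
    return [
        product_of_experts(f, b) < threshold
        for f, b in zip(p_forward, p_backward)
    ]


if __name__ == "__main__":
    groups = ["<b> so I went home", "<b> and then", "<b> I slept"]
    print(breath_group_bigrams(groups))
    print(flag_likely_disfluent([0.9, 0.2], [0.8, 0.1]))
```

Multiplying the two probabilities means that a breath is treated as expected only when both experts agree; when they disagree symmetrically (for example 0.9 and 0.1) the combined probability returns to 0.5.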
Author affiliation (all four authors): KTH Royal Institute of Technology, Division of Speech, Music and Hearing, Stockholm, Sweden
EISBN: 9781509066315, 1509066314
EISSN: 2379-190X
Subject Terms: Annotations; breathing; ensemble method; Planning; Signal processing; speech planning; Speech synthesis; spontaneous speech; Stochastic processes; Synthesizers; Training
URI: https://ieeexplore.ieee.org/document/9054107
URI: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283731