Breathing and Speech Planning in Spontaneous Speech Synthesis

Breathing and speech planning in spontaneous speech are coordinated processes, often exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs, integrating breath into synthesis has advantages for naturalness and recall. At the same time, a synthetic voice reproducing...

Full description

Saved in:
Bibliographic Details
Published inICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 7649 - 7653
Main Authors Szekely, Eva, Henter, Gustav Eje, Beskow, Jonas, Gustafson, Joakim
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.05.2020
Subjects
Online AccessGet full text
ISSN2379-190X
DOI10.1109/ICASSP40776.2020.9054107

Cover

More Information
Summary:Breathing and speech planning in spontaneous speech are coordinated processes, often exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs, integrating breath into synthesis has advantages for naturalness and recall. At the same time, a synthetic voice reproducing disfluent breathing patterns learned from the data can be problematic. To address this, we first propose training stochastic TTS on a corpus of overlapping breath-group bigrams, to take context into account. Next, we introduce an unsupervised automatic annotation of likely-disfluent breath events, through a product-of-experts model that combines the output of two breath- event predictors, each using complementary information and operating in opposite directions. This annotation enables creating an automatically-breathing spontaneous speech synthesiser with a more fluent breathing style. A subjective evaluation on two spoken genres (impromptu and rehearsed) found the proposed system to be preferred over the baseline approach treating all breath events the same.
ISSN:2379-190X
DOI:10.1109/ICASSP40776.2020.9054107