Semi-supervised Automatic Speech Segmentation for TV-drama Speech Recognition (电视剧语音识别中的半监督自动语音分割算法)

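A semi-supervised automatic speech segmentation algorithm is proposed for TV-drama speech that comes with long, continuous text transcriptions but no time stamps. First, the original transcriptions are used to build a biased language model. This language model is then applied to TV-drama speech recognition in a semi-supervised manner. Finally, the decoding output of the automatic speech recognizer is used to improve traditional speech segmentation algorithms based on distance metrics, model classification, and phone recognition. Experimental results on a data set drawn from the British science-fiction TV drama "Doctor Who" show that the proposed algorithm clearly outperforms the traditional segmentation algorithms. It not only effectively solves the automatic segmentation of long continuous audio in TV-drama speech recognition, but also segments the corresponding long continuous transcription, ensuring that the time stamps of each resulting speech segment and its associated text are accurate.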
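The record does not include the paper's full text, so the following is only a minimal sketch of the general idea described in the abstract, not the authors' algorithm. It assumes a hypothetical ASR decoder that has already produced `(word, start, end)` triples (for example, by decoding with a language model biased toward the transcript). The function names `align_transcript` and `split_at_pauses` and the `min_pause` threshold are illustrative assumptions. Matching words transfer their time stamps to the untimed transcript, and the transcript is then cut wherever two consecutive matched words are separated by a long pause.

```python
from difflib import SequenceMatcher


def align_transcript(ref_words, hyp):
    """Map reference word index -> (start, end) for words matched by the ASR output.

    ref_words: list of str, the untimed transcript.
    hyp: list of (word, start_sec, end_sec) from a hypothetical ASR decoder.
    """
    hyp_words = [w for w, _, _ in hyp]
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    anchors = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = hyp[block.b + k]
            anchors[block.a + k] = (start, end)
    return anchors


def split_at_pauses(ref_words, anchors, min_pause=0.5):
    """Cut the transcript where consecutive anchored words are >= min_pause apart.

    Returns (segment_start, segment_end, text) tuples. Segment times come from
    the first and last anchored word in each segment; unanchored words stay in
    the segment that is open when they are reached.
    """
    segments, current, seg_start, prev_end = [], [], None, None
    for i, word in enumerate(ref_words):
        timing = anchors.get(i)
        long_pause = (
            timing is not None
            and prev_end is not None
            and timing[0] - prev_end >= min_pause
        )
        if long_pause and current:
            segments.append((seg_start, prev_end, " ".join(current)))
            current, seg_start = [], None
        current.append(word)
        if timing is not None:
            if seg_start is None:
                seg_start = timing[0]
            prev_end = timing[1]
    if current:
        segments.append((seg_start, prev_end, " ".join(current)))
    return segments
```

As a usage illustration, with the transcript `hello there how are you today` and an ASR output that misrecognizes `there` and shows a 1.1 s gap before `how`, the sketch would return two timed segments, with the unmatched word kept in the first. A real system would need a more careful treatment of unmatched regions than this.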

Bibliographic Details
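Published in Journal of Data Acquisition & Processing (数据采集与处理) Vol. 34; no. 2; pp. 281-287
Main Authors Long Yanhua (龙艳花), Mao Hongwei (茅红伟), Ye Hong (叶宏)
Format Journal Article
Language Chinese
Published College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, 01.03.2019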
ISSN 1004-9037
DOI 10.16337/j.1004-9037.2019.02.010

Abstract_FL To deal with the speech segmentation of TV drama that has large coherent text transcriptions but no time stamps, an automatic semi-supervised speech segmentation algorithm is proposed. First, the original text transcriptions are used to build a biased language model. The model is then applied to TV-drama speech recognition in a semi-supervised way. Finally, the resulting automatic speech decoding hypotheses are combined with traditional segmentation methods, which are usually based on distance metrics, model classification, and phone recognizers, to improve segmentation performance. Experimental results on the British TV drama "Doctor Who" database demonstrate that the proposed approach achieves a significant performance improvement over the traditional baseline algorithms. The proposed approach also yields high-quality segmentation and the associated transcription alignments for large coherent TV-drama speech recordings.
Author 龙艳花
茅红伟
叶宏
AuthorAffiliation College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234
Author_FL Long Yanhua
Mao Hongwei
Ye Hong
Author_FL_xml – sequence: 1
  fullname: Long Yanhua
– sequence: 2
  fullname: Mao Hongwei
– sequence: 3
  fullname: Ye Hong
Author_xml – sequence: 1
  fullname: 龙艳花
– sequence: 2
  fullname: 茅红伟
– sequence: 3
  fullname: 叶宏
ClassificationCodes TP918
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Copyright_xml – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DatabaseName Wanfang Data Journals - Hong Kong
WANFANG Data Centre
Wanfang Data Journals
China Online Journals (COJ)
DocumentTitle_FL Semi-supervised Automatic Speech Segmentation for TV-drama Speech Recognition
EndPage 287
ExternalDocumentID sjcjycl201902010
GrantInformation_xml – fundername: Shanghai Sailing Program (上海市青年科技英才扬帆计划); National Natural Science Foundation of China (国家自然科学基金)
  funderid: (14YF1409300); (61701306)
IsPeerReviewed false
IsScholarly true
Issue 2
Keywords speech transcription (语音标注)
semi-supervised (半监督)
speech recognition (语音识别)
Language Chinese
PageCount 7
ParticipantIDs wanfang_journals_sjcjycl201902010
PublicationCentury 2000
PublicationDate 2019-03-01
PublicationDateYYYYMMDD 2019-03-01
PublicationDate_xml – month: 03
  year: 2019
  text: 2019-03-01
  day: 01
PublicationDecade 2010
PublicationTitle 数据采集与处理
PublicationTitle_FL Journal of Data Acquisition & Processing
PublicationYear 2019
Publisher College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234
SourceID wanfang
SourceType Aggregation Database
StartPage 281
Title Semi-supervised Automatic Speech Segmentation for TV-drama Speech Recognition (电视剧语音识别中的半监督自动语音分割算法)
URI https://d.wanfangdata.com.cn/periodical/sjcjycl201902010
Volume 34