Automatic indexing of key sentences for lecture archives using statistics of presumed discourse markers

Bibliographic Details
Published in 2004 IEEE International Conference on Acoustics, Speech and Signal Processing Vol. 1; pp. I-449
Main Authors Nanjo, H., Kitade, T., Kawahara, T.
Format Conference Proceeding
Language English, Japanese
Published Piscataway, N.J.: IEEE, 28.09.2004
ISBN 9780780384842, 0780384849
ISSN 1520-6149
DOI 10.1109/ICASSP.2004.1326019

Abstract Automatic extraction of key sentences from lecture audio archives is addressed. The method exploits characteristic expressions used in the initial utterances of sections; these expressions are treated as discourse markers and derived in a fully unsupervised manner from word statistics. The statistics of the presumed discourse markers are then used to define the importance of sentences, and this measure is combined with the conventional tf-idf measure of content words. Experimental results on a large corpus of lectures confirm the effectiveness of the discourse-marker method and of its combination with the keyword-based method. The method is also shown to be robust against ASR errors, while sentence segmentation accuracy proves more critical. Segmentation is therefore enhanced by incorporating prosodic information.
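The abstract does not give the scoring formulas, so the following Python is only an illustrative sketch of the general idea, not the authors' method. It assumes lectures are already split into sections and tokenized sentences, scores a word by the log ratio of its frequency in section-initial sentences to its frequency in all sentences, and combines that with a plain tf-idf weight. The function names, the log-ratio statistic, and the `weight` parameter are all hypothetical.

```python
import math
from collections import Counter


def marker_scores(sections):
    """Hypothetical discourse-marker statistic (an assumption, not the paper's).

    sections: list of non-empty sections; each section is a list of
    sentences; each sentence is a list of word tokens.
    Returns log(P(word | section-initial sentence) / P(word | any sentence))
    for every word seen in at least one section-initial sentence.
    """
    first = Counter()  # section-initial sentences containing the word
    total = Counter()  # sentences anywhere containing the word
    n_sent = 0
    for section in sections:
        for i, sent in enumerate(section):
            n_sent += 1
            for w in set(sent):
                total[w] += 1
                if i == 0:
                    first[w] += 1
    n_first = len(sections)
    return {
        w: math.log((f / n_first) / (total[w] / n_sent))
        for w, f in first.items()
    }


def tfidf_weights(lecture, corpus):
    """Plain tf-idf: lecture is a token list, corpus a list of token lists."""
    tf = Counter(lecture)
    df = Counter(w for doc in corpus for w in set(doc))
    n = len(corpus)
    return {w: c * math.log(n / df[w]) for w, c in tf.items() if df[w]}


def sentence_score(sent, markers, tfidf, weight=0.5):
    """Linear combination of marker and keyword scores (assumed form)."""
    dm = sum(max(markers.get(w, 0.0), 0.0) for w in sent)  # ignore negatives
    kw = sum(tfidf.get(w, 0.0) for w in sent)
    return weight * dm + (1 - weight) * kw
```

Sentences with the highest combined score would then be taken as key-sentence candidates; how the two measures are actually weighted in the paper is not stated in this record.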
Author affiliation Sch. of Informatics, Kyoto Univ., Japan (Nanjo, H.; Kitade, T.; Kawahara, T.)
Copyright 2006 INIST-CNRS
Discipline Engineering
Statistics
Applied Sciences
IsPeerReviewed false
IsScholarly true
Keywords Performance evaluation
Archive
Keyword
Segmentation
Unsupervised classification
Signal classification
Accuracy
Prosody
Automatic indexing
Signal processing
Feature extraction
Discourse analysis
Automatic recognition
License CC BY 4.0
MeetingName 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (proceedings)
SubjectTerms Acoustic testing
Applied sciences
Automatic speech recognition
Data mining
Exact sciences and technology
Informatics
Information, signal and communications theory
Machine assisted indexing
Natural languages
Robustness
Signal and communications theory
Signal representation. Spectral analysis
Signal, noise
Speech recognition
Statistics
Telecommunications and information theory
Vocabulary
URI https://ieeexplore.ieee.org/document/1326019