Automatic indexing of key sentences for lecture archives using statistics of presumed discourse markers

Bibliographic Details
Published in	2004 IEEE International Conference on Acoustics, Speech and Signal Processing Vol. 1; pp. I - 449
Main Authors	Nanjo, H., Kitade, T., Kawahara, T.
Format	Conference Proceeding
Language	English Japanese
Published	Piscataway, N.J IEEE 28.09.2004
Subjects	Acoustic testing; Applied sciences; Automatic speech recognition; Data mining; Exact sciences and technology; Informatics; Information, signal and communications theory; Machine assisted indexing; Natural languages; Robustness; Signal and communications theory; Signal representation. Spectral analysis; Signal, noise; Speech recognition; Statistics; Telecommunications and information theory; Vocabulary; Performance evaluation; Archive; Keyword; Segmentation; Unsupervised classification; Signal classification; Accuracy; Prosody; Automatic indexing; Signal processing; Feature extraction; Discourse analysis; Automatic recognition
ISBN	9780780384842 0780384849
ISSN	1520-6149
DOI	10.1109/ICASSP.2004.1326019

Abstract	Automatic extraction of key sentences from lecture audio archives is addressed. The method makes use of the characteristic expressions used in the initial utterances of sections, which are defined as discourse markers and derived in a totally unsupervised manner based on word statistics. The statistics of the presumed discourse markers are then used to define the importance of the sentences. This measure is also combined with the conventional tf-idf measure of content words. Experimental results using a large corpus of lectures confirm the effectiveness of the method based on the discourse markers and of its combination with the keyword-based method. It is also shown that the method is robust against automatic speech recognition (ASR) errors and that sentence segmentation accuracy is more vital. Thus, we also enhance segmentation by incorporating prosodic information.
Author	Nanjo, H. (Sch. of Informatics, Kyoto Univ., Japan); Kitade, T. (Sch. of Informatics, Kyoto Univ., Japan); Kawahara, T. (Sch. of Informatics, Kyoto Univ., Japan)
BackLink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=17565953 (View record in Pascal Francis)
ContentType	Conference Proceeding
Copyright	2006 INIST-CNRS
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present Pascal-Francis
Discipline	Engineering Statistics Applied Sciences
EndPage	449
ExternalDocumentID	17565953 1326019
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
License	CC BY 4.0
MeetingName	2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (proceedings)
PublicationCentury	2000
PublicationDate	2004-09-28
PublicationDecade	2000
PublicationPlace	Piscataway, N.J
PublicationTitle	2004 IEEE International Conference on Acoustics, Speech and Signal Processing
PublicationTitleAbbrev	ICASSP
PublicationYear	2004
Publisher	IEEE
StartPage	I
URI	https://ieeexplore.ieee.org/document/1326019
Volume	1