Automatic indexing of key sentences for lecture archives using statistics of presumed discourse markers

Bibliographic Details
Published in	2004 IEEE International Conference on Acoustics, Speech and Signal Processing Vol. 1; pp. I-449
Main Authors	Nanjo, H., Kitade, T., Kawahara, T.
Format	Conference Proceeding
Language	English Japanese
Published	Piscataway, N.J.: IEEE, 28.09.2004
Subjects	Acoustic testing; Applied sciences; Automatic speech recognition; Data mining; Exact sciences and technology; Informatics; Information, signal and communications theory; Machine assisted indexing; Natural languages; Robustness; Signal and communications theory; Signal representation. Spectral analysis; Signal, noise; Speech recognition; Statistics; Telecommunications and information theory; Vocabulary; Performance evaluation; Archive; Keyword; Segmentation; Unsupervised classification; Signal classification; Accuracy; Prosody; Automatic indexing; Signal processing; Feature extraction; Discourse analysis; Automatic recognition
ISBN	9780780384842 0780384849
ISSN	1520-6149
DOI	10.1109/ICASSP.2004.1326019

Summary:	This paper addresses automatic extraction of key sentences from lecture audio archives. The method exploits characteristic expressions that appear in the initial utterances of sections; these are defined as discourse markers and derived in a fully unsupervised manner from word statistics. The statistics of these presumed discourse markers are then used to score the importance of each sentence, and this score is also combined with the conventional tf-idf measure of content words. Experiments on a large corpus of lectures confirm the effectiveness of the discourse-marker-based method and of its combination with the keyword-based method. The results also show that the method is robust against ASR errors, whereas sentence segmentation accuracy has a larger effect on performance. We therefore also improve segmentation by incorporating prosodic information.
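The summary describes ranking sentences by statistics of presumed discourse markers combined with tf-idf, but this record does not give the exact formulas. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that sentences are already grouped into sections, that the first sentence of each section stands in for its "initial utterance", that a word's marker weight is an add-alpha smoothed log ratio of its frequency in section-initial sentences to its overall frequency, that "content words" can be approximated by idf weighting (words occurring in every lecture get weight 0), and that the two scores are linearly interpolated with a hypothetical weight `lam`. All function and parameter names are illustrative.

```python
import math
from collections import Counter

# Hypothetical data model (assumption): a lecture is a list of sections,
# a section is a list of sentences, a sentence is a list of word tokens.
Sentence = list[str]
Lecture = list[list[Sentence]]


def discourse_marker_weights(lectures: list[Lecture], alpha: float = 0.5) -> dict[str, float]:
    """Weight words over-represented in section-initial sentences.

    Assumed statistic: log of add-alpha smoothed P(w | initial sentence)
    over P(w | any sentence); only words seen section-initially with a
    positive log ratio are kept as presumed discourse markers.
    """
    init_counts, all_counts = Counter(), Counter()
    for lecture in lectures:
        for section in lecture:
            if section:
                init_counts.update(section[0])  # first sentence of the section
            for sentence in section:
                all_counts.update(sentence)
    vocab = len(all_counts)
    n_init = sum(init_counts.values())
    n_all = sum(all_counts.values())
    weights = {}
    for word, c_init in init_counts.items():
        p_init = (c_init + alpha) / (n_init + alpha * vocab)
        p_all = (all_counts[word] + alpha) / (n_all + alpha * vocab)
        ratio = math.log(p_init / p_all)
        if ratio > 0:
            weights[word] = ratio
    return weights


def inverse_document_frequency(lectures: list[Lecture]) -> dict[str, float]:
    """idf(w) = log(N / df(w)), treating each lecture as one document."""
    df = Counter()
    for lecture in lectures:
        df.update({w for section in lecture for sentence in section for w in sentence})
    return {w: math.log(len(lectures) / d) for w, d in df.items()}


def key_sentences(lecture: Lecture, dm: dict[str, float], idf: dict[str, float],
                  k: int = 3, lam: float = 0.5) -> list[Sentence]:
    """Return the top-k sentences by lam * DM score + (1 - lam) * tf-idf score.

    Each score is divided by its maximum within the lecture so that lam
    trades them off on a common 0..1 scale (an assumed normalisation).
    """
    sentences = [s for section in lecture for s in section]
    tf = Counter(w for s in sentences for w in s)  # term frequency within this lecture
    dm_raw = [sum(dm.get(w, 0.0) for w in s) for s in sentences]
    kw_raw = [sum(tf[w] * idf.get(w, 0.0) for w in s) for s in sentences]

    def normalise(xs: list[float]) -> list[float]:
        top = max(xs, default=0.0)
        return [x / top if top > 0 else 0.0 for x in xs]

    scored = [
        (lam * d + (1 - lam) * t, s)
        for d, t, s in zip(normalise(dm_raw), normalise(kw_raw), sentences)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


# Toy usage with two invented mini-lectures (illustrative data only).
lecture_a = [
    [["today", "we", "talk", "about", "speech"], ["speech", "is", "sound"]],
    [["next", "we", "discuss", "recognition"], ["recognition", "uses", "models"]],
]
lecture_b = [
    [["today", "we", "study", "music"], ["music", "is", "art"]],
    [["next", "we", "cover", "rhythm"], ["rhythm", "is", "time"]],
]
corpus = [lecture_a, lecture_b]
dm = discourse_marker_weights(corpus)
idf = inverse_document_frequency(corpus)
print(key_sentences(lecture_a, dm, idf, k=2))
```

On this toy corpus, openers such as "today", "next" and "we" receive positive marker weights while topic words like "speech" do not, so the two section-initial sentences of `lecture_a` rank highest. Setting `lam` to 0 or 1 reduces the sketch to a pure keyword or pure discourse-marker ranking, which mirrors the comparison the summary reports.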