Word Sense Disambiguation with a Similarity-Smoothed Case Library

Bibliographic Details
Published in Computers and the Humanities Vol. 34; no. 1/2; pp. 147-152
Main Author Lin, Dekang
Format Journal Article
Language English
Published New York Kluwer Academic Publishers 01.04.2000
Pergamon
Springer Nature B.V
ISSN 0010-4817
1574-020X
1572-8412
1574-0218
DOI 10.1023/a:1002633105432

Summary: A case-based algorithm for word sense disambiguation, tested in the SENSEVAL workshop competition, constructs a case library from the training corpus. Dependency trees define the local context of each test word as a feature vector in which each feature is a path in the dependency tree of the containing sentence. Data sparseness is addressed by applying a similarity function to a thesaurus extracted from a 125-million-word corpus, which allows commonalities between local contexts to be recognized; each target word is tagged with the sense of the example whose local context is most similar. Training with the entire training corpus yielded robust SENSEVAL evaluation results of 0.701 recall and 0.706 precision. Running the system without the thesaurus produced a 4%-6% drop in both values, and a 7% drop resulted when local contexts were defined as surrounding words instead of dependency trees. 2 Tables, 6 References. J. Hitchcock
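
The matching step described in the summary can be illustrated with a minimal sketch. It is not the article's implementation: the feature labels, the toy `THESAURUS` table, and the scoring rule in `context_sim` are all assumptions made for illustration, since the record does not give the actual similarity function or thesaurus format.

```python
from typing import Dict, FrozenSet, List, Tuple

# A feature pairs a dependency-path label with the word at the far end of the path.
Feature = Tuple[str, str]
# A case pairs a local context (a set of features) with a sense tag.
Case = Tuple[FrozenSet[Feature], str]

# Hypothetical thesaurus: similarity scores in [0, 1] for unordered word pairs.
THESAURUS: Dict[FrozenSet[str], float] = {
    frozenset({"loan", "mortgage"}): 0.8,
    frozenset({"river", "stream"}): 0.7,
}


def word_sim(a: str, b: str) -> float:
    """Identical words score 1.0; other pairs fall back to the thesaurus."""
    if a == b:
        return 1.0
    return THESAURUS.get(frozenset({a, b}), 0.0)


def context_sim(x: FrozenSet[Feature], y: FrozenSet[Feature]) -> float:
    """Assumed scoring rule: for each feature in x, take the best word
    similarity among features of y that share the same path label."""
    total = 0.0
    for path, word in x:
        total += max((word_sim(word, w) for p, w in y if p == path), default=0.0)
    return total


def disambiguate(context: FrozenSet[Feature], library: List[Case]) -> str:
    """Return the sense tag of the case whose local context is most similar."""
    best_case = max(library, key=lambda case: context_sim(context, case[0]))
    return best_case[1]


# Toy case library for the word "bank" (invented examples).
library: List[Case] = [
    (frozenset({("obj-of", "approve"), ("mod", "loan")}), "bank/finance"),
    (frozenset({("mod", "river"), ("obj-of", "erode")}), "bank/shore"),
]

test_context = frozenset({("mod", "mortgage"), ("obj-of", "approve")})
print(disambiguate(test_context, library))
```

In this toy setup, a context containing only `("mod", "stream")` shares no exact feature with either case, so it can be matched to the "bank/shore" example only through the thesaurus entry for "river" and "stream". That is the kind of smoothing the summary credits with reducing data sparseness.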