Word Sense Disambiguation with a Similarity-Smoothed Case Library
| Published in | Computers and the Humanities, Vol. 34, no. 1/2, pp. 147-152 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | New York: Kluwer Academic Publishers, 01.04.2000; Pergamon; Springer Nature B.V |
| ISSN | 0010-4817 1574-020X 1572-8412 1574-0218 |
| DOI | 10.1023/a:1002633105432 |
| Summary: | A case-based algorithm for word sense disambiguation, tested in the SENSEVAL workshop competition, constructs a case library from the training corpus. Dependency trees define the local context of each test word as a feature vector, in which each feature is a path in the dependency tree of the containing sentence. Data sparseness is addressed by applying a similarity function to a thesaurus extracted from a 125-million-word corpus, thereby recognizing commonalities between local contexts; each target word is tagged with the sense of the training example whose local context is most similar. Training with the entire training corpus yielded robust SENSEVAL evaluation results of 0.701 recall and 0.706 precision. Running the system without the thesaurus produced a 4%-6% drop in both values, and a 7% drop resulted when local contexts were represented as surrounding words instead of dependency trees. 2 Tables, 6 References. J. Hitchcock |
|---|---|
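
The summary describes tagging a target word with the sense of the training example whose local context is most similar, with a thesaurus smoothing over sparse data. A minimal sketch of that idea follows. It is not the article's implementation: the feature encoding as `(dependency_path, word)` pairs, the toy `THESAURUS`, the sense labels, and the summed best-match similarity in `context_sim` are all assumptions made for illustration, since the record does not specify the actual similarity function.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a local context is a set of (dependency_path, word) features.
Feature = Tuple[str, str]
Context = FrozenSet[Feature]
Case = Tuple[str, Context]  # (sense label, local context)

# Hypothetical toy thesaurus: word -> {similar word: similarity in [0, 1]}.
# The article extracts its thesaurus from a 125-million-word corpus.
THESAURUS: Dict[str, Dict[str, float]] = {
    "withdraw": {"deposit": 0.6},
    "checking": {"savings": 0.7},
}


def word_sim(a: str, b: str) -> float:
    """Similarity of two words: 1.0 if identical, else a thesaurus lookup."""
    if a == b:
        return 1.0
    return max(
        THESAURUS.get(a, {}).get(b, 0.0),
        THESAURUS.get(b, {}).get(a, 0.0),
    )


def context_sim(c1: Context, c2: Context) -> float:
    """Sum, over features of c1, of the best match in c2 on the same path."""
    total = 0.0
    for path, word in c1:
        total += max(
            (word_sim(word, w2) for p2, w2 in c2 if p2 == path),
            default=0.0,
        )
    return total


def disambiguate(context: Context, library: List[Case]) -> str:
    """Return the sense of the library case with the most similar context."""
    best_sense, _ = max(library, key=lambda case: context_sim(context, case[1]))
    return best_sense


# Toy case library for the word "bank" (illustrative data only).
library: List[Case] = [
    ("bank/river", frozenset({("mod", "river"), ("obj-of", "erode")})),
    ("bank/financial", frozenset({("obj-of", "deposit"), ("mod", "savings")})),
]
test_context: Context = frozenset({("obj-of", "withdraw"), ("mod", "checking")})
predicted = disambiguate(test_context, library)
```

In this toy data the test context shares no exact feature with either case, so only the thesaurus links it to the financial sense; this is the kind of sparseness the summary says the thesaurus is meant to address.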