Resolving abbreviations to their senses in Medline
Published in | Bioinformatics Vol. 21; no. 18; pp. 3658 - 3664 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Oxford: Oxford University Press, 15.09.2005; Oxford Publishing Limited (England) |
ISSN | 1367-4803 1460-2059 1367-4811 |
DOI | 10.1093/bioinformatics/bti586 |
Summary: | Motivation: Biological literature contains many abbreviations with one particular sense in each document. However, most abbreviations do not have a unique sense across the literature. Furthermore, many documents do not contain the long forms of the abbreviations. Resolving an abbreviation in a document consists of retrieving its sense in use. Abbreviation resolution improves accuracy of document retrieval engines and of information extraction systems. Results: We combine an automatic analysis of Medline abstracts and linguistic methods to build a dictionary of abbreviation/sense pairs. The dictionary is used for the resolution of abbreviations occurring with their long forms. Ambiguous global abbreviations are resolved using support vector machines that have been trained on the context of each instance of the abbreviation/sense pairs, previously extracted for the dictionary set-up. The system disambiguates abbreviations with a precision of 98.9% for a recall of 98.2% (98.5% accuracy). This performance is superior in comparison with previously reported research work. Availability: The abbreviation resolution module is available at http://www.ebi.ac.uk/Rebholz/software.html Contact: gaudan@ebi.ac.uk |
---|---|
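
The summary distinguishes two cases: abbreviations that occur alongside their long form in the same document, and "global" abbreviations that must be looked up in a dictionary and, if ambiguous, disambiguated from context. The Python sketch below illustrates only that control flow and is not the authors' implementation. The function names (`find_local_pairs`, `resolve`), the parenthesis pattern and the initial-letter matching heuristic are all assumptions made for illustration; the support vector machine step is left as a placeholder.

```python
import re

# Assumed pattern: a short form written in parentheses, e.g. "(AR)".
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_local_pairs(text):
    """Return {abbreviation: long form} for abbreviations defined in the text.

    Heuristic assumed for this sketch: the long form is the run of words
    directly before the parenthesis whose initial letters spell the
    abbreviation, one word per letter.
    """
    pairs = {}
    for match in PAREN.finditer(text):
        abbr = match.group(1)
        words = re.findall(r"[A-Za-z0-9-]+", text[:match.start()])
        candidate = words[-len(abbr):]
        if len(candidate) == len(abbr) and all(
            word[0].lower() == letter.lower()
            for word, letter in zip(candidate, abbr)
        ):
            pairs[abbr] = " ".join(candidate)
    return pairs


def resolve(abbr, text, dictionary):
    """Resolve an abbreviation to a sense, or None if it stays ambiguous.

    `dictionary` is a hypothetical mapping from abbreviation to a list of
    known senses.
    """
    local = find_local_pairs(text)
    if abbr in local:
        return local[abbr]      # long form given in the document itself
    senses = dictionary.get(abbr, [])
    if len(senses) == 1:
        return senses[0]        # only one sense known for this abbreviation
    return None                 # ambiguous: a context classifier would decide


senses = {"AR": ["androgen receptor", "adrenergic receptor"]}
print(resolve("AR", "The androgen receptor (AR) binds DNA.", senses))
print(resolve("AR", "AR signalling was reduced.", senses))
```

In the second call the abbreviation has no long form in the text and two dictionary senses, so the sketch returns `None`; this is the case the summary assigns to classifiers trained on the context of each abbreviation/sense pair.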