MeSH Up: effective MeSH text classification for improved document retrieval

Bibliographic Details
Published in	Bioinformatics Vol. 25; no. 11; pp. 1412 - 1418
Main Authors	Trieschnigg, Dolf; Pezik, Piotr; Lee, Vivian; de Jong, Franciska; Kraaij, Wessel; Rebholz-Schuhmann, Dietrich
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 01.06.2009; Oxford Publishing Limited (England)
Subjects	Bioinformatics; Biological and medical sciences; Computational Biology - methods; Database Management Systems - classification; Databases, Genetic - classification; Fundamental and applied biological sciences. Psychology; General aspects; Information Storage and Retrieval - classification; Information Storage and Retrieval - methods; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Medical Subject Headings; Original Papers; Vocabulary, Controlled; Information retrieval Text Document Classification
ISSN	1367-4803, 1367-4811, 1460-2059
DOI	10.1093/bioinformatics/btp249

Summary:	Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent in free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared with a limited number of other systems.
Results: We compare the performance of six MeSH classification systems [MetaMap, EAGL, a language model-based approach, a vector space model-based approach, a K-Nearest Neighbor (KNN) approach and MTI] in terms of reproducing and complementing manual MeSH annotations. A KNN system clearly outperforms the other published approaches and scales well with large amounts of text using the full MeSH thesaurus. Our measurements demonstrate to what extent manual MeSH annotations can be reproduced and how they can be complemented by automatic annotations. We also show that a statistically significant improvement can be obtained in information retrieval (IR) when the text of a user's query is automatically annotated with MeSH concepts, compared to using the original textual query alone.
Conclusions: The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR. Furthermore, the automatic MeSH annotation system we propose is highly scalable and it generates improvements in IR comparable with those observed for manual annotations.
Contact: trieschn@ewi.utwente.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
Bibliography:	Associate Editor: Limsoon Wong; ArticleID: btp249; ark:/67375/HXZ-CVF9LRC4-H; istex:2E5F26393302AE8E8C38263847EFD7ED518F78B3
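
Illustration:	The summary reports that a K-Nearest Neighbor (KNN) system performed best, but this record does not describe how that system works. The following is only a minimal sketch of the general KNN idea for concept assignment, not the authors' implementation: a new text is compared with already annotated documents, and the MeSH terms of the most similar ones are ranked by summed similarity. The function names, the raw term-count cosine similarity, the voting scheme, the parameters `k` and `top_n`, and the three-document toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (text, manually assigned MeSH terms).
TOY_CORPUS = [
    ("insulin therapy in diabetes patients", ["Diabetes Mellitus", "Insulin"]),
    ("gene expression in breast cancer tumors", ["Breast Neoplasms", "Gene Expression"]),
    ("glucose control and insulin resistance", ["Insulin Resistance", "Insulin"]),
]


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_mesh(text, annotated_docs, k=2, top_n=3):
    """Rank MeSH terms of the k most similar annotated documents.

    Each neighbour votes for its terms with a weight equal to its
    similarity to the input text (an assumed, simple voting scheme).
    """
    query = Counter(tokenize(text))
    scored = [
        (cosine(query, Counter(tokenize(doc))), terms)
        for doc, terms in annotated_docs
    ]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    votes = defaultdict(float)
    for similarity, terms in neighbours:
        for term in terms:
            votes[term] += similarity

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    print(knn_mesh("insulin resistance in diabetes", TOY_CORPUS, k=2))
```

In the spirit of the query annotation experiment mentioned in the summary, the returned terms could be appended to a user's textual query before retrieval; how the paper actually combines text and concepts is not stated in this record.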