Extracting Variant Forms of Chemical Names for Information Retrieval

Bibliographic Details
Published in Information research Vol. 13; no. 3
Main Author Pirkola, Ari
Format Journal Article
Language English
Published InformationR.net 01.09.2008
ISSN 1368-1613

Summary: Chemical substance names are long, complex and prone to variation. This study investigates the retrieval effects of that variation. A large set of acronyms and associated text parts was extracted from a subset of the Medline collection and used to construct a full name -- acronym index. A technique based on the longest common subsequence and statistics (named FNV-Finder) was devised to identify MeSH term variants from the full name -- acronym index for use as query terms in searching. The average number of variants for each MeSH term, the performance of the FNV-Finder technique and retrieval performance were evaluated. The average number of unique variants for each MeSH term denoting a chemical substance is 2.82. The FNV-Finder technique achieved 95.0% recall and 97.1% precision. The retrieval experiments showed that the collection contains a substantial number of documents that contain only variant forms of the MeSH terms (and contain neither the MeSH terms nor CAS registry numbers). Selecting variant forms from a collection for use in queries would therefore be very useful, or even necessary, in chemical name searching. Variant forms can be selected readily from the full name -- acronym index, either manually or automatically using the FNV-Finder technique. Adapted from the source document.
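The summary names the longest common subsequence (LCS) as one ingredient of FNV-Finder but does not give the algorithm. The following is only a minimal sketch of how an LCS-based similarity score could be used to shortlist candidate variants of a term. It is not the FNV-Finder technique itself: the function names, the normalization by the longer string, the 0.8 threshold and the example strings are all assumptions made for illustration, and the statistical component mentioned in the summary is omitted.

```python
from typing import List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(term: str, candidate: str) -> float:
    """LCS length divided by the longer string's length (assumed normalization)."""
    a, b = term.lower(), candidate.lower()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def rank_variants(
    term: str, candidates: List[str], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Keep candidates scoring at or above the (hypothetical) threshold, best first."""
    scored = [(c, lcs_similarity(term, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold and c.lower() != term.lower()]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical usage: the candidates stand in for full names taken from an index.
variants = rank_variants("acetylcholine", ["acetyl choline", "dopamine"])
```

Normalizing by the longer string penalizes candidates that merely contain the term's letters somewhere inside a much longer name; a real system would need to tune both the normalization and the threshold against labeled data.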