Extracting Variant Forms of Chemical Names for Information Retrieval
| Published in | Information research Vol. 13; no. 3 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | InformationR.net, 01.09.2008 |
| ISSN | 1368-1613 |
| Summary: | Chemical substance names are long, complex and prone to variation. This study investigates how that variation affects retrieval. A large set of acronyms and associated text parts was extracted from a subset of the Medline collection and used to construct a full name-acronym index. A technique based on longest common subsequence and statistics (named FNV-Finder) was devised to identify MeSH term variants from the full name-acronym index for use as query terms in searching. The average number of variants for each MeSH term, the performance of the FNV-Finder technique and retrieval performance were evaluated. The average number of unique variants for each MeSH term denoting a chemical substance is 2.82. The FNV-Finder technique achieved 95.0% recall and 97.1% precision. The retrieval experiments showed that the collection contains a substantial number of documents that contain only variant forms of the MeSH terms (and contain neither the MeSH terms nor CAS registry numbers). Selecting variant forms from a collection for use in queries would therefore be very useful, or even necessary, in chemical name searching. Variant forms can be selected readily from the full name-acronym index, either manually or automatically using the FNV-Finder technique. Adapted from the source document. |
|---|---|
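
The summary names longest common subsequence (LCS) as one ingredient of FNV-Finder but does not describe the technique itself. The sketch below is therefore not the authors' method. It only illustrates how an LCS-based similarity score might be used to shortlist candidate variants of a term from a list of full names. The function names, the normalisation by the longer string, the 0.8 threshold and the sample names are all assumptions made for illustration.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer string's length (an assumed normalisation)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def candidate_variants(term: str, full_names: List[str],
                       threshold: float = 0.8) -> List[str]:
    """Return full names that resemble `term` but are not identical to it.

    The threshold is a hypothetical value, not one reported in the record.
    """
    return [
        name for name in full_names
        if name.lower() != term.lower()
        and lcs_similarity(term, name) >= threshold
    ]


# Illustrative input only; these strings are not taken from the study's index.
names = ["acetyl choline", "acetylcholine", "dopamine"]
variants = candidate_variants("acetylcholine", names)
```

A score like this captures only surface similarity. The summary indicates that FNV-Finder also relies on statistics, which this sketch does not attempt to model.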