Extracting Variant Forms of Chemical Names for Information Retrieval
| Published in | Information research Vol. 13; no. 3 |
|---|---|
| Main Author | Pirkola, Ari |
| Format | Journal Article |
| Language | English |
| Published | InformationR.net, 01.09.2008 |
| Subjects | Chemical names; Search strategies; Subject indexing |
| ISSN | 1368-1613 |
Cover
| Abstract | Chemical substance names are long, complex and prone to variation. This study investigates the effects of that variation on retrieval. A large set of acronyms and associated text parts was extracted from a subset of the Medline collection and used to construct a full name-acronym index. A technique based on the longest common subsequence and on statistics (named FNV-Finder) was devised to identify MeSH term variants in the full name-acronym index for use as query terms in searching. The average number of variants for each MeSH term, the performance of the FNV-Finder technique and retrieval performance were evaluated. The average number of unique variants for each MeSH term denoting a chemical substance is 2.82. The FNV-Finder technique achieved 95.0% recall and 97.1% precision. The retrieval experiments showed that the collection contains a substantial number of documents that contain only variant forms of the MeSH terms (and do not contain the MeSH terms or CAS registry numbers). Selecting variant forms for queries from a collection would be very useful or even necessary in chemical name searching. Variant forms can be selected readily from the full name-acronym index either manually or automatically using the FNV-Finder technique. Adapted from the source document. |
|---|---|
| AbstractList | Introduction. Chemical substance names are long, complex and prone to variation. This study investigates the retrieval effects of the variation. Method. A large set of acronyms and associated text parts was extracted from a subset of the Medline collection and used to construct a full name-acronym index. A longest common subsequence and statistics based technique (named FNV-Finder) was devised to identify MeSH term variants from the full name-acronym index for use as query terms in searching. The average number of variants for each MeSH term, the performance of the FNV-Finder technique and retrieval performance were evaluated. Results. The average number of unique variants for each MeSH term denoting a chemical substance is 2.82. The FNV-Finder technique achieved 95.0% recall and 97.1% precision. The retrieval experiments showed that the collection contains a substantial number of documents that contain only variant forms of the MeSH terms (and do not contain the MeSH terms or CAS registry numbers). Conclusions. The selection of variant forms for queries from a collection would be very useful or even necessary in chemical name searching. Variant forms can be selected readily from the full name-acronym index either manually or automatically using the FNV-Finder technique. |
| Author | Pirkola, Ari |
| Author_xml | – sequence: 1 givenname: Ari surname: Pirkola fullname: Pirkola, Ari |
| ContentType | Journal Article |
| Copyright | free |
| Copyright_xml | – notice: free |
| DBID | E3H F2A 77F |
| DatabaseName | Library & Information Sciences Abstracts (LISA) Library & Information Science Abstracts (LISA) Latindex |
| DatabaseTitle | Library and Information Science Abstracts (LISA) |
| DatabaseTitleList | Library and Information Science Abstracts (LISA) |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Library & Information Science |
| EISSN | 1368-1613 |
| ExternalDocumentID | oai_record_508693 |
| GroupedDBID | .4I 29I 2WC 5GY 5VS 77I 77K AAFWJ ABDBF ABOPQ ADBBV ADMLS AEGXH AFPKN ALMA_UNASSIGNED_HOLDINGS BCNDV C1A E3H E3Z EBS EJD ELW F2A GROUPED_DOAJ H13 KQ8 M~E OVT P2P RNS XSB 4I 77F ABNOP ADACO AGCAB RIG |
| ID | FETCH-LOGICAL-l222t-4ecc74f61bf0da405f77a6977e5cff6c872b6e4c1aa9a39b2bc5702da9cda0df3 |
| ISSN | 1368-1613 |
| IngestDate | Wed Nov 11 00:08:33 EST 2020 Fri Sep 05 11:54:44 EDT 2025 |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-l222t-4ecc74f61bf0da405f77a6977e5cff6c872b6e4c1aa9a39b2bc5702da9cda0df3 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| OpenAccessLink | http://dialnet.unirioja.es/servlet/oaiart?codigo=2863060 |
| PQID | 57739382 |
| PQPubID | 23477 |
| ParticipantIDs | latinindex_primary_oai_record_508693 proquest_miscellaneous_57739382 |
| ProviderPackageCode | 77F |
| PublicationCentury | 2000 |
| PublicationDate | 2008-09-01 |
| PublicationDateYYYYMMDD | 2008-09-01 |
| PublicationDate_xml | – month: 09 year: 2008 text: 2008-09-01 day: 01 |
| PublicationDecade | 2000 |
| PublicationTitle | Information research |
| PublicationYear | 2008 |
| Publisher | InformationR.net |
| Publisher_xml | – name: InformationR.net |
| SSID | ssj0020475 |
| Score | 1.7756839 |
| SourceID | latinindex proquest |
| SourceType | Open Access Repository Aggregation Database |
| SubjectTerms | Chemical names Search strategies Subject indexing |
| Title | Extracting Variant Forms of Chemical Names for Information Retrieval |
| URI | https://www.proquest.com/docview/57739382 http://dialnet.unirioja.es/servlet/oaiart?codigo=2863060 |
| Volume | 13 |
| hasFullText | 1 |
| inHoldings | 1 |
| linkProvider | ISSN International Centre |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Extracting+variant+forms+of+chemical+names+for+information+retrieval&rft.jtitle=Information+research&rft.au=Pirkola%2C+Ari&rft.date=2008-09-01&rft.pub=InformationR.net&rft.issn=1368-1613&rft.eissn=1368-1613&rft.volume=13&rft.issue=3&rft.externalDocID=oai_record_508693 |
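
The abstract says that FNV-Finder combines a longest common subsequence (LCS) measure with statistics to pick MeSH term variants out of the full name-acronym index, but this record does not describe the actual procedure. The sketch below is therefore only an illustration of the LCS idea, not the author's method: the function names, the length-normalised similarity score and the 0.8 threshold are all assumptions made for this example.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer string's length (assumed scoring)."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def candidate_variants(term, full_names, threshold=0.8):
    """Return full names that resemble `term` but are not identical to it.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    return [
        name
        for name in full_names
        if name.lower() != term.lower()
        and lcs_similarity(term, name) >= threshold
    ]


if __name__ == "__main__":
    names = ["acetyl choline", "Acetylcholine", "dopamine"]
    print(candidate_variants("acetylcholine", names))
```

A character-level LCS tolerates the inserted spaces, hyphens and small spelling differences that are typical of chemical name variants, which is presumably why such a measure suits this task; the statistical filtering mentioned in the abstract is not modelled here.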