Automatic extraction of mutations from Medline and cross‐validation with OMIM



Bibliographic Details
Published in: Nucleic acids research, Vol. 32, no. 1, pp. 135–142
Main Authors: Rebholz‐Schuhmann, Dietrich; Marcel, Stephane; Albert, Sylvie; Tolle, Ralf; Casari, Georg; Kirsch, Harald
Format: Journal Article
Language: English
Published: Oxford University Press, England, 01.01.2004
Oxford Publishing Limited (England)
ISSN: 0305-1048, 1362-4962
DOI: 10.1093/nar/gkh162


Summary: Mutations help us understand the molecular origins of diseases. Researchers therefore both publish and seek disease‐relevant mutations in public databases and in the scientific literature, e.g. Medline. Retrieval tends to be time‐consuming and incomplete; automated screening of the literature is more efficient. We developed extraction methods (called MEMA) that scan Medline abstracts for mutations. MEMA identified 24 351 singleton mutations co‐occurring with a HUGO gene name in 16 728 abstracts. From a sample of 100 abstracts, we estimated the recall for identifying mutation–gene pairs at 35%, with a precision of 93%. For mutation detection alone, recall was >67% and precision >96%. This shows that our system produces reliable data. The subset of protein sequence mutations (PSMs) from MEMA was compared with the entries in OMIM (20 503 and 6699 entries, respectively). We found 1826 PSM–gene pairs common to both datasets (cross‐validated). This is 27% of all PSM–gene pairs in OMIM and 91% of those OMIM pairs that co‐occur in at least one Medline abstract. We conclude that Medline covers a large portion of the mutations known to OMIM. Another large portion could consist of artificially produced mutations from mutagenesis experiments. The database of extracted mutation–gene pairs is accessible through the EBI web pages (see http://www.ebi.ac.uk/rebholz/index.html).
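The cross‐validation step amounts to a set intersection over (gene, mutation) pairs, with coverage measured against the OMIM set. A minimal sketch follows, under stated assumptions: the regex, the helper names `extract_psms` and `coverage`, and the example sentence are hypothetical illustrations, not MEMA's actual patterns or code. The sketch only recognises single‐letter point mutations such as "R117H" and does not perform the HUGO gene‐name linking described above.

```python
import re

# Hypothetical pattern for single-letter protein point mutations such as "R117H"
# (wild-type residue, sequence position, mutant residue). This is an assumption
# for illustration; the summary does not describe MEMA's real patterns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PSM_PATTERN = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def extract_psms(text: str) -> set[str]:
    """Return the set of single-letter point-mutation tokens found in text."""
    return {f"{wt}{pos}{mut}" for wt, pos, mut in PSM_PATTERN.findall(text)}


def coverage(extracted: set[tuple[str, str]], reference: set[tuple[str, str]]) -> float:
    """Fraction of reference (gene, mutation) pairs that also appear in extracted."""
    if not reference:
        return 0.0
    return len(extracted & reference) / len(reference)


if __name__ == "__main__":
    sentence = "The R117H and G551D mutations in CFTR were analysed."
    print(sorted(extract_psms(sentence)))  # ['G551D', 'R117H']

    # Hypothetical pair sets: one of the two reference pairs is recovered.
    mined = {("CFTR", "R117H")}
    omim_like = {("CFTR", "R117H"), ("CFTR", "G551D")}
    print(coverage(mined, omim_like))  # 0.5

    # Overlap figure reported in the summary: 1826 shared pairs out of 6699 OMIM pairs.
    print(f"{1826 / 6699:.0%}")  # 27%
```

The last line simply reproduces the reported 27% from the two counts given in the summary; the 91% figure cannot be recomputed here because the number of OMIM pairs co‐occurring in Medline is not stated.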
To whom correspondence should be addressed. Tel: +44 1223 492594; Fax: +44 1223 444468; Email: rebholz@ebi.ac.uk
Received August 13, 2003; Revised September 17, 2003; Accepted November 12, 2003