ReactionMap: An Efficient Atom-Mapping Algorithm for Chemical Reactions

Bibliographic Details
Published in: Journal of Chemical Information and Modeling, Vol. 53, No. 11, pp. 2812-2819
Main Authors: Fooshee, David; Andronico, Alessio; Baldi, Pierre
Format: Journal Article
Language: English
Published: Washington, DC: American Chemical Society, 25 November 2013
ISSN: 1549-9596; 1549-960X
DOI: 10.1021/ci400326p

Summary: Large databases of chemical reactions provide new data-mining opportunities and challenges. Key challenges result from the imperfect quality of the data and the fact that many of these reactions are not properly balanced or atom-mapped. Here, we describe ReactionMap, an efficient atom-mapping algorithm. Our approach uses a combination of maximum common chemical subgraph search and minimization of an assignment cost function derived empirically from training data. We use a set of over 259,000 balanced atom-mapped reactions from the SPRESI commercial database to train the system, and we validate it on random sets of 1000 and 17,996 reactions sampled from this pool. These large test sets represent a broad range of chemical reaction types, and ReactionMap correctly maps about 99% of the atoms and about 96% of the reactions, with a mean time per mapping of 2 s. Most correctly mapped reactions are mapped with high confidence. Mapping accuracy compares favorably with ChemAxon’s AutoMapper, versions 5 and 6.1, and the DREAM Web tool. These approaches correctly map 60.7%, 86.5%, and 90.3% of the reactions, respectively, on the same data set. A ReactionMap server is available on the ChemDB Web portal at http://cdb.ics.uci.edu.
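The summary describes atom mapping as a combination of maximum common chemical subgraph search and minimization of an empirically trained assignment cost. The Python sketch below illustrates only the cost-minimization idea, under clearly stated assumptions: the cost is a hand-written stand-in (total bond-order change between the relabelled reactant graph and the product graph, hydrogens omitted), not ReactionMap's trained cost function; the search is brute force over element-preserving bijections rather than subgraph search, so it only suits tiny examples; and all names (`bond_order_change`, `element_preserving_mappings`, `min_cost_mappings`) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations, product

# Hypothetical representation: atoms are element symbols indexed by position;
# bonds map an index pair (i, j) with i < j to a bond order (1, 2, 3).
Bonds = dict[tuple[int, int], int]


def bond_order_change(r_bonds: Bonds, p_bonds: Bonds, mapping: dict[int, int]) -> int:
    """Toy stand-in for a trained cost: relabel reactant atoms with their mapped
    product indices, then sum |order difference| over every bond in either graph
    (counts bonds broken, bonds formed, and changes in bond order)."""
    relabelled: Bonds = {}
    for (i, j), order in r_bonds.items():
        a, b = sorted((mapping[i], mapping[j]))
        relabelled[(a, b)] = order
    keys = relabelled.keys() | p_bonds.keys()
    return sum(abs(relabelled.get(k, 0) - p_bonds.get(k, 0)) for k in keys)


def element_preserving_mappings(r_atoms: list[str], p_atoms: list[str]):
    """Yield every bijection sending each reactant atom to a product atom of the
    same element. Requires a balanced reaction (same element counts)."""
    if Counter(r_atoms) != Counter(p_atoms):
        raise ValueError("reaction is not balanced")
    r_by_el, p_by_el = defaultdict(list), defaultdict(list)
    for i, el in enumerate(r_atoms):
        r_by_el[el].append(i)
    for j, el in enumerate(p_atoms):
        p_by_el[el].append(j)
    # For each element, every way of pairing its reactant atoms with its product atoms.
    per_element = [
        [list(zip(r_by_el[el], perm)) for perm in permutations(p_by_el[el])]
        for el in sorted(r_by_el)
    ]
    for combo in product(*per_element):
        yield {i: j for pairs in combo for i, j in pairs}


def min_cost_mappings(r_atoms, r_bonds, p_atoms, p_bonds):
    """Exhaustive search: return the minimum cost and every mapping achieving it."""
    best_cost, best = None, []
    for m in element_preserving_mappings(r_atoms, p_atoms):
        cost = bond_order_change(r_bonds, p_bonds, m)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [m]
        elif cost == best_cost:
            best.append(m)
    return best_cost, best


# Fischer esterification, heavy atoms only:
# acetic acid CH3-C(=O)-OH + methanol CH3-OH -> methyl acetate + water
# Reactant indices: 0 CH3, 1 carbonyl C, 2 =O, 3 acid -OH, 4 methanol C, 5 methanol O
r_atoms = ["C", "C", "O", "O", "C", "O"]
r_bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (4, 5): 1}
# Product indices: 0 CH3, 1 carbonyl C, 2 =O, 3 ester O, 4 O-CH3 carbon, 5 water O
p_atoms = ["C", "C", "O", "O", "C", "O"]
p_bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1}

cost, optimal = min_cost_mappings(r_atoms, r_bonds, p_atoms, p_bonds)
print(cost, len(optimal))
```

On this example the toy cost finds two mappings tied at cost 2 that differ only in whether the acid's hydroxyl oxygen or the methanol oxygen becomes the ester oxygen. A purely structural cost cannot separate them, which illustrates why a cost learned from atom-mapped training reactions, as described in the summary, can be useful for breaking such ties.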