Computationally Efficient Algorithm to Identify Matched Molecular Pairs (MMPs) in Large Data Sets

Bibliographic Details
Published in	Journal of Chemical Information and Modeling, Vol. 50, no. 3, pp. 339-348
Main Authors	Hussain, Jameed; Rea, Ceara
Format	Journal Article
Language	English
Published	Washington, DC: American Chemical Society, 22.03.2010
Subjects	Algorithms; Analytical chemistry; Biological and medical sciences; Chemical compounds; Chemical Information; Databases, Factual; Drug Discovery - methods; General pharmacology; Medical sciences; Molecular structure; Pharmaceutical technology. Pharmaceutical industry; Pharmacology. Drug treatments; Structure-Activity Relationship; Very large databases; Screening; Structure activity relation; Pharmaceutical industry; Data structure; Chemical structure
ISSN	1549-9596, 1549-960X
DOI	10.1021/ci900450m

More Information
Summary:	Modern drug discovery organizations generate large volumes of SAR data. A promising methodology for mining these chemical data to identify novel structure−activity relationships is the matched molecular pair (MMP) methodology. However, before the full potential of the MMP methodology can be realized, an MMP identification method is required that can identify all MMPs in large chemical data sets on modest computational hardware. In this paper we report an algorithm that systematically generates all MMPs in chemical data sets. Additionally, the algorithm is computationally efficient enough to be applied to large data sets. As an example, the algorithm was used to identify the MMPs in the ∼300k NIH MLSMR set. The algorithm identified ∼5.3 million matched molecular pairs in the set. These pairs cover ∼2.6 million unique molecular transformations.