Computationally Efficient Algorithm to Identify Matched Molecular Pairs (MMPs) in Large Data Sets

Bibliographic Details
Published in Journal of Chemical Information and Modeling, Vol. 50, no. 3, pp. 339-348
Main Authors Hussain, Jameed; Rea, Ceara
Format Journal Article
Language English
Published Washington, DC: American Chemical Society, 22.03.2010
ISSN 1549-9596
1549-960X
DOI 10.1021/ci900450m

Summary: Modern drug discovery organizations generate large volumes of SAR data. A promising methodology that can be used to mine this chemical data to identify novel structure−activity relationships is the matched molecular pair (MMP) methodology. However, before the full potential of the MMP methodology can be utilized, an MMP identification method that is capable of identifying all MMPs in large chemical data sets on modest computational hardware is required. In this paper we report an algorithm that is capable of systematically generating all MMPs in chemical data sets. Additionally, the algorithm is computationally efficient enough to be applied on large data sets. As an example, the algorithm was used to identify the MMPs in the ∼300k NIH MLSMR set. The algorithm identified ∼5.3 million matched molecular pairs in the set. These pairs cover ∼2.6 million unique molecular transformations.