AgreementPred: a cheminformatic framework for drug and natural product category recommendation based on multi-representation structural similarity data fusion

Bibliographic Details
Published in Digital Discovery
Main Authors Sutcharitchan, Chayanis; Wang, Boyang; Zhang, Dingfan; Liu, Qingyuan; Zhang, Tingyu; Zhang, Peng; Li, Shao
Format Journal Article
Language English
Published 30.09.2025
ISSN 2635-098X
DOI 10.1039/D5DD00329F

Summary: Natural products offer a vast reservoir of bioactive compounds and play a crucial role in drug discovery. In the big data era, annotating their pharmacological categories holds great potential for accelerating drug discovery and advancing mechanistic studies of herbal medicines. However, the vast majority of natural products remain unannotated. Existing recommendation frameworks for pharmacological categories are predominantly tailored to conventional drugs and frequently require extensive experimental data, which are typically lacking for natural products. Traditional cheminformatic approaches based on structural similarity, while widely adopted, often struggle to balance prediction recall and precision, which limits their overall effectiveness. This study proposes AgreementPred, a simple and explainable category recommendation framework for drugs and natural products based on multi-representation structural similarity data fusion. The framework uses PubChem compound annotations as category labels, drawing on two compound classification systems, the Anatomical Therapeutic Chemical (ATC) classification and Medical Subject Headings (MeSH), which extends its scope of application beyond conventional drugs. Similarity search results from 22 molecular representations are combined to improve prediction recall, and the predicted annotations are then filtered by agreement score to enhance prediction precision. Compared with existing equivalent approaches, AgreementPred achieved a superior recall-precision balance in both ATC and category prediction tasks. With an agreement score threshold of 0.1, AgreementPred achieved a recall of 0.74 and a precision of 0.55 in category prediction for 1000 compounds from a pool of 1520 categories. Finally, AgreementPred was applied to 321 605 unannotated drugs and natural products. The resulting predictions are expected to contribute to drug discovery and mechanistic studies.
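
The fuse-then-filter idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not define the agreement score, so the sketch assumes it is the fraction of representations whose similarity hits carry a given category. The representation names, reference compound IDs, labels, and function names are all hypothetical, and only four representations are used instead of 22.

```python
from collections import defaultdict

def agreement_scores(hits_by_representation, reference_labels):
    """Score each category by how many representations support it.

    ASSUMPTION: agreement score = fraction of representations whose
    similarity-search hits carry the category. The summary does not
    give the actual definition.
    """
    n_reps = len(hits_by_representation)
    if n_reps == 0:
        return {}
    votes = defaultdict(int)
    for hits in hits_by_representation.values():
        # Collect every category attached to this representation's hits.
        categories = set()
        for compound_id in hits:
            categories |= reference_labels.get(compound_id, set())
        # Each representation votes at most once per category.
        for category in categories:
            votes[category] += 1
    return {category: count / n_reps for category, count in votes.items()}

def filter_by_agreement(scores, threshold):
    """Keep categories whose agreement score reaches the threshold."""
    return {c for c, s in scores.items() if s >= threshold}

def recall_precision(predicted, true):
    """Set-based recall and precision for one query compound."""
    tp = len(predicted & true)
    recall = tp / len(true) if true else 0.0
    precision = tp / len(predicted) if predicted else 0.0
    return recall, precision

# Hypothetical similarity hits for one query compound, per representation.
hits = {
    "rep_a": ["ref1", "ref2"],
    "rep_b": ["ref1"],
    "rep_c": ["ref3"],
    "rep_d": [],
}
# Hypothetical category annotations of the reference compounds.
labels = {
    "ref1": {"Anti-Inflammatory Agents"},
    "ref2": {"Antioxidants"},
    "ref3": {"Anti-Inflammatory Agents", "Antineoplastic Agents"},
}
true_categories = {"Anti-Inflammatory Agents", "Antioxidants"}

scores = agreement_scores(hits, labels)
for threshold in (0.1, 0.5):
    predicted = filter_by_agreement(scores, threshold)
    print(threshold, sorted(predicted), recall_precision(predicted, true_categories))
```

In this toy data, the low threshold keeps all three fused categories (full recall, one false positive), while the higher threshold keeps only the category supported by most representations (perfect precision, lower recall). This mirrors the recall-precision trade-off that the threshold is described as controlling.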