Comparative analysis of QSAR feature selection methods

Bibliographic Details
Published in: AIP Conference Proceedings, Vol. 3004, No. 1
Main Authors: Davronov, Rifkat; Kushmuratov, Samariddin
Format: Journal Article; Conference Proceeding
Language: English
Published: Melville: American Institute of Physics, 11.03.2024
ISSN: 0094-243X; 1935-0465; 1551-7616
DOI: 10.1063/5.0199872

Summary: Quantitative structure-activity relationships (QSAR) describe the relationship between quantitative chemical structural properties (molecular descriptors) and biological activity. QSAR models are increasingly used in drug discovery and development because they can save significant time and human resources. Several factors affect the predictive performance of QSAR models. On the one hand, various statistical methods can be used to study linear or nonlinear behavior in a data set. On the other hand, feature selection approaches are used to reduce model complexity, limit the risk of overfitting (overtraining), and select the most important descriptors from lists of hundreds of candidates. A mathematical model then relates the selected descriptors to the biological activity of the corresponding molecules. A variety of modeling strategies can be used, some of which involve explicit feature selection. QSAR models are useful for designing new compounds with increased potency within the class under consideration, so that only compounds considered of interest need to be synthesized. The feature selection problem is the challenge learning algorithms face in selecting a meaningful subset of relevant features while ignoring the rest. This paper presents a comparative analysis of the Chi-square, Mutual Information, ANOVA F-value, Fisher Score, Permutation Importance, Recursive Feature Elimination, Random Forest, LightGBM, and SHAP feature selection methods used in QSAR modeling. The Python code used to obtain the experimental results in this article has been uploaded to GitHub (https://github.com/kushmuratoff/feature_selection).
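The abstract names the methods being compared but not how each one scores a descriptor. The following is a minimal sketch, assuming NumPy and scikit-learn are available, that ranks descriptors with seven of the nine listed methods on a synthetic, QSAR-like classification dataset (LightGBM and SHAP are left out to avoid extra dependencies). The toy data, the helper names `fisher_score`, `rank_descriptors` and `top_k`, and all hyperparameters are hypothetical illustrations, not the authors' code from the GitHub repository.

```python
# Minimal sketch: comparing several descriptor-ranking methods on a toy
# QSAR-like dataset. The synthetic data, the helper names (fisher_score,
# rank_descriptors, top_k) and all hyperparameters are illustrative
# assumptions, not taken from the paper or its repository.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, chi2, f_classif, mutual_info_classif
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def fisher_score(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        X_c = X[y == label]
        n_c = X_c.shape[0]
        between += n_c * (X_c.mean(axis=0) - overall_mean) ** 2
        within += n_c * X_c.var(axis=0)
    return between / np.maximum(within, 1e-12)  # guard against zero variance


def rank_descriptors(X, y, random_state=0):
    """Return {method name: per-descriptor score}; higher always means more important."""
    # chi2 needs non-negative inputs, so rescale every descriptor to [0, 1].
    # Fitting the scaler on all rows is a simplification for this sketch.
    X = MinMaxScaler().fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y)

    scores = {}
    # Filter methods: score each descriptor independently of any model.
    scores["chi2"] = chi2(X, y)[0]
    scores["mutual_info"] = mutual_info_classif(X, y, random_state=random_state)
    scores["anova_f"] = f_classif(X, y)[0]
    scores["fisher"] = fisher_score(X, y)

    # Embedded method: impurity-based importances from a random forest.
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    forest.fit(X_train, y_train)
    scores["random_forest"] = forest.feature_importances_

    # Model-agnostic: drop in held-out accuracy when one descriptor is shuffled.
    perm = permutation_importance(forest, X_test, y_test, n_repeats=10,
                                  random_state=random_state)
    scores["permutation"] = perm.importances_mean

    # Wrapper method: RFE repeatedly drops the descriptor with the smallest
    # |coefficient|; ranking_ is 1 for the last survivor, so negate it.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1)
    rfe.fit(X_train, y_train)
    scores["rfe"] = -rfe.ranking_.astype(float)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring descriptors, best first."""
    return np.argsort(scores)[::-1][:k].tolist()


# Hypothetical toy data: 200 "molecules", 10 "descriptors", binary activity.
# Descriptor 0 is strongly and descriptor 1 moderately tied to activity;
# the remaining eight are pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 10))
X[:, 0] += 3.0 * y
X[:, 1] += 1.5 * y

results = rank_descriptors(X, y)
for name, s in results.items():
    print(f"{name:>13}: top-3 descriptors {top_k(s, 3)}")
```

The filter scores (chi-square, mutual information, ANOVA F-value, Fisher score) are cheap because each descriptor is scored on its own, whereas RFE refits a model once per eliminated descriptor and permutation importance re-evaluates a trained model many times. Comparing the top-k lists side by side is one simple way to see where the methods agree on a given descriptor set.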