Comparative analysis of QSAR feature selection methods
Quantitative structure-activity relationships (QSAR) describe the relationship between quantitative chemical structural properties (molecular descriptors) and biological activity. QSAR assays are increasingly used in drug discovery and development as they can save significant time and human resource...
Saved in:
| Published in | AIP conference proceedings Vol. 3004; no. 1 |
|---|---|
| Main Authors | Davronov, Rifkat; Kushmuratov, Samariddin |
| Format | Journal Article; Conference Proceeding |
| Language | English |
| Published | Melville: American Institute of Physics, 11.03.2024 |
| ISSN | 0094-243X, 1935-0465, 1551-7616 |
| DOI | 10.1063/5.0199872 |
| Abstract | Quantitative structure-activity relationships (QSAR) describe the relationship between quantitative chemical structural properties (molecular descriptors) and biological activity. QSAR assays are increasingly used in drug discovery and development because they can save significant time and human resources. Several parameters affect the predictive performance of QSAR models. On the one hand, various statistical methods can be used to study the linear or nonlinear behavior of a data set. Feature selection approaches, on the other hand, are used to reduce model complexity, limit the risk of overfitting (overtraining), and select the most important descriptors from lists of hundreds. A mathematical model is then used to relate the selected descriptors to the biological activity of the corresponding molecule. A variety of modeling strategies can be used, some of which involve explicit feature selection. QSAR models are useful for developing new compounds with increased potency in the class under consideration. Only connections that are considered interesting are created. Learning algorithms face the feature selection problem: choosing a meaningful subset of features of interest while ignoring the rest. This paper presents a comparative analysis of the Chi-square, Mutual Information, ANOVA F-value, Fisher Score, Permutation Importance, Recursive Feature Elimination, Random Forest, LightGBM, and SHAP feature selection methods used in QSAR modeling. The Python code written to obtain the experimental results in this article has been uploaded to GitHub (https://github.com/kushmuratoff/feature_selection). |
| Author | Davronov, Rifkat; Kushmuratov, Samariddin |
| Author affiliations | Both authors: V.I.Romanovskiy Institute of Mathematics, Uzbekistan Academy of Sciences. Contact (Kushmuratov): bekmezonali@gmail.com |
| CODEN | APCPCS |
| ContentType | Journal Article; Conference Proceeding |
| Copyright | 2024 Author(s). Published by AIP Publishing. |
| Discipline | Physics |
| EISSN | 1551-7616 |
| Editor | Shadimetov, Kholmat M.; Durdiev, Durdimurod K.; Hayotov, Abdullo R.; Babaev, Samandar S.; Jalolov, Ozodjon I. |
| Editor affiliations | Shadimetov, Hayotov, Durdiev, Babaev: V. I. Romanovskiy Institute of Mathematics, Uzbekistan Academy of Sciences; Jalolov: Bukhara State University |
| Genre | Conference Proceeding |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| License | Published by AIP Publishing. 0094-243X/2024/3004/050002/5/$30.00 |
| MeetingName | INTERNATIONAL SCIENTIFIC AND PRACTICAL CONFERENCE ON “MODERN PROBLEMS OF APPLIED MATHEMATICS AND INFORMATION TECHNOLOGY (MPAMIT2022)” |
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://pubs.aip.org/aip/acp/article-pdf/doi/10.1063/5.0199872/19722248/050002_1_5.0199872.pdf |
| PageCount | 5 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-03-11 |
| PublicationDecade | 2020 |
| PublicationPlace | Melville |
| PublicationTitle | AIP conference proceedings |
| PublicationYear | 2024 |
| Publisher | American Institute of Physics |
| SubjectTerms | Algorithms; Biological activity; Biological properties; Comparative analysis; Feature selection; Machine learning; Mathematical models; Performance prediction; Permutations; Statistical methods |
| URI | http://dx.doi.org/10.1063/5.0199872 https://www.proquest.com/docview/2954998097 https://pubs.aip.org/aip/acp/article-pdf/doi/10.1063/5.0199872/19722248/050002_1_5.0199872.pdf |
| Volume | 3004 |
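
The abstract names the ANOVA F-value among the compared feature selection methods. As a rough illustration of how such a filter method ranks descriptors, the sketch below computes a one-way ANOVA F statistic per descriptor in plain Python and keeps the top-scoring ones. This is not the authors' code (their implementation is in the GitHub repository cited in the abstract); the function names, the descriptor names (`logP`, `mol_weight`, `noise`), and the toy activity labels are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def anova_f_value(values, labels):
    """One-way ANOVA F statistic of a single descriptor across activity classes.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X, y, names, k):
    """Rank descriptors (columns of X) by F value and return the k best names."""
    columns = list(zip(*X))
    scores = {name: anova_f_value(col, y) for name, col in zip(names, columns)}
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: rows are molecules, columns are descriptors.
names = ["logP", "mol_weight", "noise"]
X = [
    [1.0, 300.0, 5.0],
    [1.2, 310.0, 1.0],
    [0.9, 305.0, 3.0],
    [3.1, 302.0, 4.0],
    [2.9, 308.0, 2.0],
    [3.0, 304.0, 3.0],
]
y = [0, 0, 0, 1, 1, 1]  # hypothetical inactive (0) / active (1) labels

selected, scores = select_top_k(X, y, names, k=1)
print(selected, scores)
```

In this toy set only the first descriptor separates the two classes, so it receives by far the largest F value. A filter like this scores each descriptor independently of any model, which is the main difference from wrapper approaches such as Recursive Feature Elimination, which repeatedly refit a model.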