Genetics Algorithm Feature Selection for Improving Aqueous Solubility Prediction
| Published in | Journal of physics. Conference series Vol. 2377; no. 1; pp. 12016 - 12020 |
|---|---|
| Main Authors | , | 
| Format | Journal Article | 
| Language | English | 
| Published | Bristol: IOP Publishing, 01.11.2022 |
| Subjects | |
| ISSN | 1742-6588 1742-6596 |
| DOI | 10.1088/1742-6596/2377/1/012016 | 
| Summary: | Aqueous solubility is an important property of a compound for conducting chemical reactions. In this research, we develop several machine learning models for predicting the aqueous solubility of molecules. The open public dataset AqSolDB, which contains 9982 solubility records, was used for model development. Several tree-based machine learning regression models were trained on the dataset, and their performance was evaluated using mean absolute error. The results showed that the best model for solubility prediction is the Categoric Boosting Regressor, which achieved a mean absolute error of 0.854. The importance of each feature affecting solubility can also be calculated. The variable MolLogP is shown to be strongly correlated with solubility. To further improve the model, we selected a subset of features using a genetic algorithm and trained several tree-based machine learning models on the selected features. The lowest mean absolute error, 0.771, was obtained with the Categoric Boosting Regressor, an improvement over the previous result without feature selection. |
|---|---|
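
The summary describes selecting features with a genetic algorithm and scoring tree-based regressors by mean absolute error. This record gives no implementation details, so the following is only a minimal sketch under stated assumptions: a synthetic dataset stands in for AqSolDB, a small hand-written regression tree stands in for the Categoric Boosting Regressor, and the population size, mutation rate, and every function name (`fit_tree`, `select_features`, and so on) are hypothetical choices, not the authors' settings.

```python
import random


def avg(values):
    return sum(values) / len(values)


def mae(y_true, y_pred):
    """Mean absolute error of predictions against targets."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_tree(X, y, cols, depth=3):
    """Toy regression tree (assumed stand-in model) splitting only on `cols`."""
    if depth == 0 or len(y) < 4:
        return avg(y)  # leaf: predict the mean target
    best = None
    for c in cols:
        values = sorted(row[c] for row in X)
        for q in range(1, 8):  # a few quantile thresholds per feature
            thr = values[q * len(values) // 8]
            left = [i for i, row in enumerate(X) if row[c] <= thr]
            right = [i for i, row in enumerate(X) if row[c] > thr]
            if not left or not right:
                continue
            ml = avg([y[i] for i in left])
            mr = avg([y[i] for i in right])
            err = sum(abs(y[i] - ml) for i in left) + sum(abs(y[i] - mr) for i in right)
            if best is None or err < best[0]:
                best = (err, c, thr, left, right)
    if best is None:
        return avg(y)
    _, c, thr, left, right = best
    return (c, thr,
            fit_tree([X[i] for i in left], [y[i] for i in left], cols, depth - 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], cols, depth - 1))


def predict(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are floats
        c, thr, left, right = tree
        tree = left if row[c] <= thr else right
    return tree


def fitness(mask, train, valid):
    """Validation MAE of a tree trained on the features switched on in `mask`."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty feature set is never acceptable
    (Xt, yt), (Xv, yv) = train, valid
    tree = fit_tree(Xt, yt, cols)
    return mae(yv, [predict(tree, row) for row in Xv])


def select_features(train, valid, n_features, pop_size=12, generations=10, seed=0):
    """Genetic algorithm over binary feature masks (all settings are assumptions)."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        if mask not in cache:
            cache[mask] = fitness(mask, train, valid)
        return cache[mask]

    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    pop[0] = (1,) * n_features  # seed the population with the all-features baseline
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append(tuple(bit ^ (rng.random() < 0.1) for bit in child))  # bit-flip mutation
        pop = parents + children
    best = min(pop, key=score)
    return best, score(best)


def make_data(n, n_features, rng):
    """Synthetic data: only features 0 and 1 carry signal, the rest are noise."""
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y


rng = random.Random(42)
train = make_data(120, 6, rng)
valid = make_data(60, 6, rng)
baseline = fitness((1,) * 6, train, valid)
best_mask, best_mae = select_features(train, valid, n_features=6)
print("all-features MAE:", round(baseline, 3))
print("selected mask:", best_mask, "MAE:", round(best_mae, 3))
```

Because the same validation split drives both selection and scoring in this sketch, the selected-feature MAE is optimistically biased; an unbiased comparison would need a separate held-out test set.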