Genetics Algorithm Feature Selection for Improving Aqueous Solubility Prediction

Bibliographic Details
Published in Journal of Physics: Conference Series Vol. 2377; no. 1; pp. 12016 - 12020
Main Authors Suhendar, H, Widianto, E
Format Journal Article
Language English
Published Bristol IOP Publishing 01.11.2022
ISSN 1742-6588
1742-6596
DOI 10.1088/1742-6596/2377/1/012016

Summary: Aqueous solubility is an important property for conducting chemical reactions with a compound. In this research, we develop several machine learning models for predicting the aqueous solubility of molecules. The open public dataset AqSolDB, which contains 9982 solubility records, was used for model development. Several tree-based machine learning regression models were trained on the dataset, and their performance was evaluated using mean absolute error. The best model for solubility prediction was the Categoric Boosting Regressor, which achieved a mean absolute error of 0.854. Feature importance was also calculated from the trained model, and it showed that the variable MolLogP is strongly correlated with solubility. To further improve the model, we selected a subset of features using a genetic algorithm and retrained the tree-based models on the selected features. The lowest mean absolute error, 0.771, was again obtained with the Categoric Boosting Regressor, an improvement over the result without feature selection.
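The summary describes genetic-algorithm feature selection scored by mean absolute error, but this record gives no configuration details. The following is a minimal sketch of the general idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the toy synthetic data (not AqSolDB), the small k-nearest-neighbour regressor used as a stand-in for the tree-based models, and the hypothetical names `select_features`, `fitness`, and `make_toy_data` with their population size, crossover, and mutation settings.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train_X, train_y, x, cols, k=3):
    """Stand-in regressor (assumption): mean target of the k nearest rows,
    with distance measured only on the selected feature columns."""
    dists = sorted(
        (sum((row[c] - x[c]) ** 2 for c in cols), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k


def fitness(mask, train, valid):
    """Validation MAE for a 0/1 feature mask; lower is better."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty feature set is never acceptable
    (train_X, train_y), (valid_X, valid_y) = train, valid
    preds = [knn_predict(train_X, train_y, x, cols) for x in valid_X]
    return mean_absolute_error(valid_y, preds)


def select_features(train, valid, n_features, pop_size=20,
                    generations=10, mutation_rate=0.1, seed=0):
    """Evolve feature masks with elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    population[0] = [1] * n_features  # include the "all features" baseline
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, train, valid))
        parents = ranked[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    best = min(population, key=lambda m: fitness(m, train, valid))
    return best, fitness(best, train, valid)


def make_toy_data(n, n_features, rng):
    """Synthetic data (assumption): only features 0 and 1 drive the target."""
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n)]
    y = [2.0 * row[0] - 1.0 * row[1] for row in X]
    return X, y


data_rng = random.Random(42)
train = make_toy_data(80, 6, data_rng)
valid = make_toy_data(40, 6, data_rng)

all_features_mae = fitness([1] * 6, train, valid)
best_mask, best_mae = select_features(train, valid, n_features=6)
print("MAE with all features:", round(all_features_mae, 3))
print("Selected mask:", best_mask, "MAE:", round(best_mae, 3))
```

Because the all-features mask is placed in the initial population and the best half of each generation is carried over unchanged, the selected subset can never score worse on the validation split than the full feature set. A real study would also check the selected subset on a held-out test set, since the mask is tuned against the validation data.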