Self-adaptive genetic algorithm-artificial neural network algorithm with leave-one-out cross validation for descriptor selection in QSAR study

Bibliographic Details
Published in	Journal of Computational Chemistry, Vol. 31, no. 10, pp. 1956-1968
Main Authors	Wu, Jingheng; Mei, Juan; Wen, Sixiang; Liao, Siyan; Chen, Jincan; Shen, Yong
Format	Journal Article
Language	English
Published	Hoboken: Wiley Subscription Services, Inc., A Wiley Company, 30.07.2010
Subjects	Algorithms; artificial neural network; Chemistry; Genetic algorithms; leave-multiple-out; leave-one-out; Linear Models; Models, Biological; multiple linear regression; Neural networks; Neural Networks (Computer); Quantitative Structure-Activity Relationship; Validation studies; Validity; Y-randomization
ISSN	0192-8651 (print); 1096-987X (online)
DOI	10.1002/jcc.21471

More Information
Summary:	Genetic algorithms (GAs) were applied to select molecular descriptors for quantitative structure-activity relationship (QSAR) models built with artificial neural networks (ANNs), and were also used to improve the back-propagation training algorithm. Leave-one-out cross validation was used to assess the validity of the resulting ANN models and of the preferred descriptor combinations identified by the GA. A self-adaptive GA-ANN model was established using a new estimate function designed to avoid over-fitting during ANN training. Compared with the descriptors selected in two recent QSAR studies based on stepwise multiple linear regression (MLR), the descriptors selected by the self-adaptive GA-ANN model performed better for constructing ANN models: they gave a higher cross-validated (CV) coefficient (Q²) and a lower root mean square deviation, both for the established model and for biological activity prediction. Further validation by leave-multiple-out cross validation, Y-randomization, and external validation showed that the GA-ANN models outperform the MLR models in both stability and predictive power. The self-adaptive GA-ANN approach shows promise for improving QSAR models.
Bibliography:	http://dx.doi.org/10.1002/jcc.21471; Supported by: Optic Vector Computing Workstation at State Key Laboratory of Optoelectronic Materials of Sun Yat-sen University; National Natural Science Foundation of the People's Republic of China (No. 90608012); High Performance Computing Center (HPCC) at Sun Yat-sen University
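The summary names leave-one-out (LOO) Q², root mean square deviation, and Y-randomization as the main validation measures. The sketch below shows how these quantities are commonly computed; it is an illustration under stated assumptions, not the authors' code. It uses a plain-Python multiple linear regression (the MLR baseline the paper compares against) instead of an ANN so that it stays self-contained. It assumes the common definition Q² = 1 - PRESS / Σ(yᵢ - ȳ)². The function names (`fit_ols`, `loo_q2`, `y_randomization`) and the toy data are hypothetical and synthetic.

```python
import random

def fit_ols(X, y):
    """Least-squares fit y ~ b0 + b1*x1 + ... via the normal equations."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(b, row):
    return b[0] + sum(w * x for w, x in zip(b[1:], row))

def loo_q2(X, y):
    """Return (Q^2, RMSD) from leave-one-out cross validation.

    PRESS sums the squared error of each sample predicted by a model
    refitted without that sample (assumed Q^2 definition, see lead-in).
    """
    n = len(y)
    press = 0.0
    for i in range(n):
        b = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(b, X[i])) ** 2
    mean_y = sum(y) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - press / ss_tot, (press / n) ** 0.5

def y_randomization(X, y, trials=20, seed=0):
    """LOO Q^2 of models refitted on randomly permuted activities."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        y_perm = list(y)
        rng.shuffle(y_perm)
        scores.append(loo_q2(X, y_perm)[0])
    return scores

# Synthetic toy data (hypothetical): activity = 2*x1 - x2 + 1 plus small fixed noise
X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2], [6, 5], [7, 3]]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0, 0.02]
y = [2 * a - b + 1 + e for (a, b), e in zip(X, noise)]

q2, rmsd = loo_q2(X, y)
random_q2 = y_randomization(X, y)
print(f"LOO Q^2 = {q2:.3f}, RMSD = {rmsd:.3f}")
print(f"mean Y-randomized Q^2 = {sum(random_q2) / len(random_q2):.3f}")
```

A model that captures a real structure-activity relationship should give a high LOO Q² on the true activities and a much lower Q² after Y-randomization. In a GA-based descriptor search, a score like `loo_q2` could serve as the fitness of a candidate descriptor subset. The paper's own self-adaptive estimate function for ANN training is not described in this record and is not reproduced here.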