Surrogate-Assisted Evolutionary Q-Learning for Black-Box Dynamic Time-Linkage Optimization Problems

Bibliographic Details
Published in	IEEE Transactions on Evolutionary Computation, Vol. 27, no. 5, pp. 1162-1176
Main Authors	Zhang, Tuo; Wang, Handing; Yuan, Bo; Jin, Yaochu; Yao, Xin
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2023
Subjects	Algorithms; Black boxes; Black-box problem; Convergence; Decision making; dynamic time-linkage optimization problem (DTP); Evolutionary algorithms; evolutionary dynamic optimization (EDO); Heuristic algorithms; Optimization; Prediction algorithms; Q-learning; Search process; Sociology; Statistics; surrogate model
ISSN	1089-778X 1941-0026
DOI	10.1109/TEVC.2022.3179256

Summary:	Dynamic time-linkage optimization problems (DTPs) are a special class of dynamic optimization problems (DOPs) that have the time-linkage property: the environment of a DTP not only changes over time but also depends on previously applied solutions. Existing dynamic evolutionary algorithms struggle to solve DTPs because they ignore the time-linkage property. DTPs can instead be viewed as multi-step decision-making problems and solved by reinforcement learning (RL). However, RL-based evolutionary optimization algorithms have so far solved only some discrete DTPs, and only under the assumption that the objective functions are observable. In this work, we propose a dynamic evolutionary optimization algorithm that uses surrogate-assisted Q-learning for continuous black-box DTPs. To observe the states of black-box DTPs, state extraction and prediction methods are applied after the search process at each time step. Based on the learned information, surrogate-assisted Q-learning is introduced to evaluate and select candidate solutions in the continuous decision space with long-term performance in mind. We evaluate the components of the proposed algorithm on various benchmark problems to study their behavior. Comparative experiments indicate that the proposed algorithm outperforms the compared algorithms and performs robustly on DTPs with up to 30 decision variables and different dynamic changes.
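
The time-linkage property described in the summary can be illustrated with a small sketch. This is not the surrogate-assisted, continuous-space algorithm proposed in the paper. It is plain tabular Q-learning on a made-up two-state problem, and every state, reward, and parameter value below is an assumption chosen for illustration. The applied solution (action) determines the next environment state, so the choice with the best immediate reward is not the best long-term choice.

```python
# Toy time-linkage problem (all values are illustrative assumptions).
# Key: (state, action) -> (immediate reward, next state).
# The applied action decides the next state, which is the time-linkage effect.
TRANSITIONS = {
    (0, 0): (1.0, 0),  # tempting now, but the environment stays poor
    (0, 1): (0.0, 1),  # nothing now, but moves to the better environment
    (1, 0): (3.0, 1),  # high reward, stays in the better environment
    (1, 1): (2.0, 0),  # lower reward, falls back to the poor environment
}
STATES = (0, 1)
ACTIONS = (0, 1)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def learn_q(sweeps=500):
    """Sweep over every transition repeatedly (deterministic, no sampling)."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for (s, a), (r, s_next) in TRANSITIONS.items():
            q_update(q, s, a, r, s_next)
    return q


def greedy_choice(state):
    """Pick the action with the best immediate reward, ignoring time-linkage."""
    return max(ACTIONS, key=lambda a: TRANSITIONS[(state, a)][0])


def q_choice(q, state):
    """Pick the action with the best learned long-term value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_table = learn_q()
```

In state 0 of this toy problem, `greedy_choice` selects action 0 because its immediate reward is higher, while `q_choice` selects action 1 because it leads to the state with the larger long-term return.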