Reinforcement Learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system

Highlights:
• A combination of the Deep Q-Learning algorithm and the metaheuristic GSA is offered.
• GSA initializes the weights and the biases of the neural networks.
• A comparison with classical random, metaheuristic PSO and GWO is carried out.
• The validation is done on real-time nonlinear servo system position control.
• The drawbacks of randomly initialized neural networks are mitigated.

Bibliographic Details
Published in: Information Sciences, Vol. 583, pp. 99-120
Main Authors: Zamfirache, Iuliu Alexandru; Precup, Radu-Emil; Roman, Raul-Cristian; Petriu, Emil M.
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.01.2022
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2021.10.070

Summary: This paper presents a novel Reinforcement Learning (RL)-based control approach that combines a Deep Q-Learning (DQL) algorithm with the metaheuristic Gravitational Search Algorithm (GSA). The GSA is employed to initialize the weights and biases of the Neural Network (NN) involved in DQL in order to avoid the instability that is the main drawback of traditional randomly initialized NNs. The quality of a particular set of weights and biases is measured at each iteration of the GSA-based initialization using a fitness function aimed at the predefined optimal control or learning objective. The data generated during the RL process are used to train an NN-based controller that can autonomously achieve the optimal reference tracking control objective. The proposed approach is compared with similar techniques that use different algorithms in the initialization step, namely the traditional random algorithm, the Grey Wolf Optimizer (GWO) algorithm, and the Particle Swarm Optimization (PSO) algorithm. The NN-based controllers obtained with each of these techniques are compared using performance indices specific to optimal control, such as settling time, rise time, peak time, overshoot, and minimum cost function value. Real-time experiments are conducted to validate and test the proposed approach in the framework of optimal reference tracking control of a nonlinear position servo system. The experimental results show the superiority of this approach over the other three competing approaches.
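
The central idea of the abstract, using GSA to search for a good set of NN weights and biases against a control-oriented fitness function before DQL training begins, can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: the toy network sizes, the placeholder fitness function, and the GSA hyper-parameters (population size, G0, alpha, iteration count) are all hypothetical, and the GSA shown is the standard textbook variant without refinements such as a Kbest elite set.

```python
# Hypothetical sketch: GSA searches for an initial weight/bias vector for a small
# NN controller; the best vector would then seed DQL training instead of a random init.
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 2 inputs (e.g. position error and its derivative) -> 8 hidden -> 1 output.
LAYER_SIZES = [(2, 8), (8, 1)]
DIM = sum(i * o + o for i, o in LAYER_SIZES)  # total number of weights + biases

# Fixed sample of states and a hypothetical reference action used by the fitness function.
STATES = rng.uniform(-1.0, 1.0, size=(64, 2))
REFERENCE = -STATES[:, :1]

def unpack(vec):
    """Split a flat parameter vector into (W, b) pairs for each layer."""
    params, k = [], 0
    for i, o in LAYER_SIZES:
        W = vec[k:k + i * o].reshape(i, o); k += i * o
        b = vec[k:k + o]; k += o
        params.append((W, b))
    return params

def forward(vec, x):
    """Evaluate the toy NN on a batch of states."""
    for j, (W, b) in enumerate(unpack(vec)):
        x = x @ W + b
        if j < len(LAYER_SIZES) - 1:
            x = np.tanh(x)
    return x

def fitness(vec):
    """Placeholder control-style cost: mean squared deviation of the NN output from
    the reference (the paper's actual fitness encodes its optimal tracking objective)."""
    return float(np.mean((forward(vec, STATES) - REFERENCE) ** 2))

def gsa_initialize(n_agents=20, n_iter=100, g0=100.0, alpha=20.0):
    """Standard GSA loop over flattened weight vectors; returns the best vector found."""
    X = rng.uniform(-1.0, 1.0, size=(n_agents, DIM))   # agent positions = candidate weights
    V = np.zeros_like(X)                               # agent velocities
    best_vec, best_fit = None, np.inf
    for t in range(n_iter):
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_fit:
            best_fit, best_vec = fit.min(), X[fit.argmin()].copy()
        # Masses from fitness (lower cost -> larger mass), normalized over the population.
        worst, best = fit.max(), fit.min()
        m = (worst - fit) / (worst - best + 1e-12)
        M = m / (m.sum() + 1e-12)
        G = g0 * np.exp(-alpha * t / n_iter)            # decaying gravitational constant
        # Acceleration of each agent from the gravitational pull of all other agents.
        A = np.zeros_like(X)
        for i in range(n_agents):
            diff = X - X[i]
            dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-12
            A[i] = np.sum(rng.random((n_agents, 1)) * G * M[:, None] * diff / dist, axis=0)
        V = rng.random(X.shape) * V + A
        X = X + V
    return best_vec, best_fit

if __name__ == "__main__":
    w0, f0 = gsa_initialize()
    print(f"best fitness after GSA-based initialization: {f0:.4f}")
    # w0 would be reshaped into the DQL network's layers in place of random initialization.
```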