A novel Q-learning algorithm based on improved whale optimization algorithm for path planning
| Published in | PloS one Vol. 17; no. 12; p. e0279438 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | United States: Public Library of Science, 27.12.2022 |
| ISSN | 1932-6203 |
| DOI | 10.1371/journal.pone.0279438 |
| Summary: | Q-learning is a classical reinforcement learning algorithm and one of the most important methods of mobile robot path planning without a prior environmental model. Nevertheless, Q-learning initializes its Q-table too simply and wastes too much time in the exploration process, causing slow convergence. This paper proposes a new Q-learning algorithm called the Paired Whale Optimization Q-learning Algorithm (PWOQLA), which includes four improvements. Firstly, to accelerate the convergence of Q-learning, a whale optimization algorithm is used to initialize the values of the Q-table; before exploration begins, the Q-table already encodes previous experience, which improves algorithm efficiency. Secondly, to improve the local exploitation capability of the whale optimization algorithm, a paired whale optimization algorithm is proposed that uses a pairing strategy to speed up the search for prey. Thirdly, to improve the exploration efficiency of Q-learning and reduce the number of useless explorations, a new selective exploration strategy is introduced which considers the relationship between the current position and the target position. Fourthly, to balance the exploration and exploitation capabilities of Q-learning so that it focuses on exploration in the early stage and on exploitation in the later stage, a nonlinear function is designed which dynamically changes the value of ε in ε-greedy Q-learning based on the number of iterations. Experimental comparisons with other path planning algorithms demonstrate that PWOQLA achieves higher accuracy and faster convergence than existing counterparts in mobile robot path planning. The code will be released at https://github.com/wanghanyu0526/improveQL.git. |
|---|---|
| Bibliography: | Competing Interests: The authors have declared that no competing interests exist. |
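
As context for the first two improvements described in the abstract, the following Python sketch shows a standard whale optimization loop (encircling, spiral, and random-search moves) of the kind PWOQLA builds on. It is a minimal illustration under stated assumptions, not the paper's implementation: the pairing strategy and the fitness function used to seed the Q-table are defined in the article itself, and the names `whale_optimization` and `path_cost_of_greedy_policy` are hypothetical.

```python
import numpy as np

def whale_optimization(fitness, dim, n_whales=20, n_iters=100,
                       bounds=(-1.0, 1.0), seed=0):
    """Generic whale optimization algorithm (WOA), minimizing `fitness`.
    PWOQLA's paired variant adds a pairing strategy on top of a loop like
    this one; that strategy is not reproduced here."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    whales = rng.uniform(lo, hi, size=(n_whales, dim))
    scores = np.array([fitness(w) for w in whales])
    best_idx = int(np.argmin(scores))
    best, best_score = whales[best_idx].copy(), float(scores[best_idx])

    for t in range(n_iters):
        a = 2.0 - 2.0 * t / n_iters          # control parameter: 2 -> 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a   # A drawn from [-a, a]
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1.0:             # exploit: encircle best whale
                    D = np.abs(C * best - whales[i])
                    whales[i] = best - A * D
                else:                        # explore: follow a random whale
                    rand = whales[rng.integers(n_whales)]
                    D = np.abs(C * rand - whales[i])
                    whales[i] = rand - A * D
            else:                            # spiral bubble-net move to best
                l = rng.uniform(-1.0, 1.0)
                whales[i] = (np.abs(best - whales[i])
                             * np.exp(l) * np.cos(2.0 * np.pi * l) + best)
            whales[i] = np.clip(whales[i], lo, hi)
            score = fitness(whales[i])
            if score < best_score:
                best, best_score = whales[i].copy(), float(score)
    return best

# Hypothetical use for Q-table seeding: treat a flattened Q-table as one
# whale position and score it by, e.g., the cost of the greedy path it
# induces (the paper's actual fitness function is given in the article):
# q0 = whale_optimization(path_cost_of_greedy_policy, dim=n_states * n_actions)
# Q = q0.reshape(n_states, n_actions)
```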
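
The abstract's third and fourth improvements describe a goal-aware exploration rule and a nonlinear ε schedule, but do not spell out their exact forms. The sketch below illustrates both ideas under labeled assumptions: a hypothetical 4-connected grid world, an exponential decay standing in for the paper's nonlinear ε function, and a Manhattan-distance bias standing in for the published selective exploration rule.

```python
import numpy as np

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # hypothetical grid moves: up, down, left, right

def epsilon_schedule(t, t_max, eps_hi=0.9, eps_lo=0.05, k=5.0):
    """Illustrative nonlinear schedule: exploration-heavy early,
    exploitation-heavy late. The exact function in PWOQLA is defined in
    the paper; this exponential decay only demonstrates the shape."""
    return eps_lo + (eps_hi - eps_lo) * np.exp(-k * t / t_max)

def goal_biased_explore(pos, goal, rng):
    """Stand-in for the paper's selective exploration: weight random actions
    by how close they bring the agent to the target (Manhattan distance)
    instead of sampling uniformly. The published rule may differ in detail."""
    d = np.array([abs(goal[0] - (pos[0] + dr)) + abs(goal[1] - (pos[1] + dc))
                  for dr, dc in ACTIONS], dtype=float)
    w = np.exp(-d)                # closer-to-goal moves get larger weight
    return int(rng.choice(len(ACTIONS), p=w / w.sum()))

def select_action(q_row, pos, goal, t, t_max, rng):
    """Epsilon-greedy action selection combining the two sketched tweaks."""
    if rng.random() < epsilon_schedule(t, t_max):
        return goal_biased_explore(pos, goal, rng)   # selective exploration
    return int(np.argmax(q_row))                     # greedy exploitation
```

Together with the WOA-seeded Q-table above, this reflects the division of labor the abstract describes: the metaheuristic supplies an informed starting point, while the schedule shifts the agent from directed exploration toward pure exploitation as training progresses.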