Adaptive Distributed Control for Leader–Follower Formation Based on a Recurrent SAC Algorithm

Bibliographic Details
Published in	Electronics (Basel) Vol. 13; no. 17; p. 3513
Main Authors	Li, Mingfei; Liu, Haibin; Xie, Feng; Huang, He
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.09.2024
Subjects	Adaptive algorithms; Adaptive control; Control algorithms; Control methods; Experiments; Kinematics; Memory tasks; Methods; Network reliability; Proportional integral derivative; Robot control; Robotics; Robots; Robust control
ISSN	2079-9292
DOI	10.3390/electronics13173513

Summary:	This study proposes a novel adaptive distributed recurrent Soft Actor–Critic (SAC) control method for the leader–follower formation control problem of omnidirectional mobile robots. The method eliminates reliance on the complete state of the leader and achieves formation using only the relative pose between robots. We develop a recurrent SAC reinforcement learning framework that gives the controller good transient and steady-state characteristics. We also present an episode-based memory replay buffer and sampling approaches, along with a normalized reward function. Together these help the recurrent SAC formation framework converge rapidly and receive consistent incentives across different leader–follower tasks, which facilitates learning and adaptation to the formation requirements of different scenarios. To improve generalization, we normalize the state space, which eliminates differences between formation tasks of different shapes. Leader–follower formation experiments with different shapes in the Gazebo simulator perform well, validating the method. Comparative experiments with a traditional PID controller and common network controllers show that our method achieves faster convergence and greater robustness. These simulation results support the potential and reliability of the method for real-world problems.
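
The summary mentions an episode-based memory replay buffer but this record gives no details of its design. As a rough illustration only, the sketch below shows one common way such a buffer can work for recurrent policies: whole episodes are stored, and training batches are drawn as contiguous windows so a recurrent network sees transitions in order. The class name `EpisodeReplayBuffer`, the parameters `max_episodes` and `seq_len`, and the uniform sampling scheme are all assumptions made for this example, not the authors' implementation.

```python
import random
from collections import deque


class EpisodeReplayBuffer:
    """Hypothetical episode-based buffer: stores whole episodes and
    samples contiguous windows of transitions for recurrent training."""

    def __init__(self, max_episodes, seed=None):
        # Oldest episodes are evicted first once max_episodes is reached.
        self.episodes = deque(maxlen=max_episodes)
        self.current = []  # transitions of the episode still being collected
        self.rng = random.Random(seed)

    def add(self, obs, action, reward, next_obs, done):
        """Append one transition; close the episode when done is True."""
        self.current.append((obs, action, reward, next_obs, done))
        if done:
            self.episodes.append(self.current)
            self.current = []

    def sample(self, batch_size, seq_len):
        """Return batch_size windows, each seq_len consecutive transitions
        taken from a single stored episode (assumed uniform sampling)."""
        eligible = [ep for ep in self.episodes if len(ep) >= seq_len]
        if not eligible:
            raise ValueError("no stored episode is long enough")
        batch = []
        for _ in range(batch_size):
            ep = self.rng.choice(eligible)
            start = self.rng.randint(0, len(ep) - seq_len)  # inclusive bounds
            batch.append(ep[start:start + seq_len])
        return batch
```

Sampling windows rather than isolated transitions keeps the temporal order a recurrent actor or critic needs to rebuild its hidden state. The paper itself may sample differently, for example by drawing full episodes.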