Distributed Neural Networks Training for Robotic Manipulation With Consensus Algorithm

Bibliographic Details
Published in	IEEE Transactions on Neural Networks and Learning Systems, Vol. 35, no. 2, pp. 2732-2746
Main Authors	Liu, Wenxing; Niu, Hanlin; Jang, Inmo; Herrmann, Guido; Carrasco, Joaquin
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2024
Subjects	Algorithms; Consensus; Convergence; deep reinforcement learning; Lyapunov methods; Machine learning; manipulator; Manipulators; Multiagent systems; Neural networks; Nonlinear systems; Parameters; Privacy; Reinforcement learning; Robot arms; Robot kinematics; Task analysis; Training
ISSN	2162-237X, 2162-2388
DOI	10.1109/TNNLS.2022.3191021

Summary:	In this article, we propose an algorithm that combines an actor-critic-based off-policy method with consensus-based distributed training to address multiagent deep reinforcement learning problems. Specifically, we develop a Lyapunov-based convergence analysis of a consensus algorithm for a class of nonlinear systems, and we use this result to analyze the convergence properties of the actor and critic training parameters in our algorithm. The convergence analysis shows that all agents converge to the same optimal model as the training time goes to infinity. To validate the implementation of our algorithm, we propose a multiagent training framework in which each Universal Robot 5 (UR5) robot arm is trained to reach a random target position. Finally, experiments demonstrate the effectiveness and feasibility of the proposed algorithm.
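
Illustration:	The summary does not state the update rule, so the following is only a minimal sketch of the general idea behind consensus-based parameter agreement, not the article's algorithm. It assumes a standard linear average-consensus step on an undirected communication graph, whereas the article analyzes a nonlinear system. It includes no actor-critic training, and the names consensus_step, run_consensus, neighbors, and step_size are hypothetical. Each agent repeatedly nudges its parameter vector toward those of its neighbors, and all agents end up at the average of the initial parameters.

```python
def consensus_step(params, neighbors, step_size):
    """One synchronous update: each agent moves toward its neighbors' parameters."""
    updated = []
    for i, theta_i in enumerate(params):
        # Sum of disagreements between agent i and each of its neighbors.
        correction = [
            sum(params[j][k] - theta_i[k] for j in neighbors[i])
            for k in range(len(theta_i))
        ]
        updated.append([t + step_size * c for t, c in zip(theta_i, correction)])
    return updated


def run_consensus(params, neighbors, step_size=0.3, iterations=200):
    """Repeat the consensus step a fixed number of times."""
    for _ in range(iterations):
        params = consensus_step(params, neighbors, step_size)
    return params


# Three hypothetical agents on a line graph 0 - 1 - 2 (assumed topology).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
initial = [[0.0, 3.0], [3.0, 0.0], [6.0, 3.0]]

final = run_consensus(initial, neighbors)
average = [sum(p[k] for p in initial) / len(initial) for k in range(2)]
max_gap = max(abs(p[k] - average[k]) for p in final for k in range(2))
```

In this sketch the step size must stay small relative to the graph's connectivity for the iteration to converge. The value 0.3 is an assumption chosen to be stable for this three-agent line graph.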