Data-Driven Optimal Consensus Control for Discrete-Time Multi-Agent Systems With Unknown Dynamics Using Reinforcement Learning Method
| Published in | IEEE Transactions on Industrial Electronics, Vol. 64, No. 5, pp. 4091-4100 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2017 |
| ISSN | 0278-0046, 1557-9948 |
| DOI | 10.1109/TIE.2016.2542134 |
| Summary: | This paper investigates the optimal consensus control problem for discrete-time multi-agent systems with completely unknown dynamics using a data-driven reinforcement learning method. Optimal consensus control for multi-agent systems relies on the solution of the coupled Hamilton-Jacobi-Bellman equation, which is generally impossible to solve analytically. Even worse, most real-world systems are too complicated to model accurately. To overcome these deficiencies, a data-based adaptive dynamic programming method is presented that uses current and past system data rather than an accurate system model, thereby avoiding the traditional identification scheme and the approximation residual errors it would introduce. First, we establish a discounted performance index and formulate the optimal consensus problem via the Bellman optimality principle. Then, we introduce the policy iteration algorithm that motivates this paper. To implement the proposed online action-dependent heuristic dynamic programming method, two neural networks (NNs), 1) a critic NN and 2) an actor NN, are employed to approximate the iterative performance index functions and control policies, respectively, in real time. Finally, two simulation examples are provided to demonstrate the effectiveness of the proposed method. |
|---|---|
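The summary outlines the method in three steps: a discounted performance index, policy iteration on the resulting Bellman equation, and an online actor-critic (action-dependent heuristic dynamic programming, ADHDP) implementation. In a form consistent with that description, each agent's discounted index over its local neighborhood consensus error $e_i$ is typically

$$J_i(e_i(k)) = \sum_{l=k}^{\infty} \gamma^{l-k}\left(e_i^\top(l)\, Q_{ii}\, e_i(l) + u_i^\top(l)\, R_{ii}\, u_i(l)\right), \qquad 0 < \gamma \le 1,$$

with the Bellman optimality condition $J_i^*(e_i(k)) = \min_{u_i(k)}\big[r_i(k) + \gamma J_i^*(e_i(k+1))\big]$; the exact weights and notation are the paper's, and this is only the standard discounted quadratic form. The sketch below is a minimal, hypothetical illustration of the data-driven policy-iteration idea on a single generic linear system: a quadratic, linear-in-parameters critic stands in for the critic NN, and the greedy gain extracted from it stands in for the actor NN. All matrices, noise levels, and sample counts are assumed for illustration and are not the paper's implementation.

```python
import numpy as np

# Minimal data-driven policy iteration (Q-learning / ADHDP-style) sketch.
# The matrices A and B below are used ONLY to generate transition data;
# the learner never reads them, mirroring the model-free setting.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])                 # hypothetical unknown dynamics
B = np.array([[0.0],
              [0.1]])
n, m = 2, 1
Qc, Rc = np.eye(n), np.eye(m)              # assumed stage-cost weights
gamma = 0.95                               # discount factor

def phi(x, u):
    """Quadratic critic basis: upper-triangular terms of z z^T, z = [x; u]."""
    z = np.concatenate([x, u])
    outer = np.outer(z, z)
    i, j = np.triu_indices(n + m)
    scale = np.where(i == j, 1.0, 2.0)     # fold symmetric off-diagonal terms
    return scale * outer[i, j]

K = np.zeros((m, n))                       # initial policy u = -K x
for it in range(20):
    # Policy evaluation: fit Q_K(x, u) = phi(x, u)^T w from one-step data
    # via the Bellman equation Q_K(x, u) = r(x, u) + gamma * Q_K(x', -K x').
    Phi, targets = [], []
    for _ in range(200):
        x = rng.standard_normal(n)
        u = -K @ x + 0.1 * rng.standard_normal(m)   # exploration noise
        x_next = A @ x + B @ u                       # one measured transition
        r = x @ Qc @ x + u @ Rc @ u
        u_next = -K @ x_next                         # on-policy next action
        Phi.append(phi(x, u) - gamma * phi(x_next, u_next))
        targets.append(r)
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)

    # Recover the symmetric kernel H of Q_K(x, u) = [x; u]^T H [x; u] ...
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = w
    H = H + H.T - np.diag(np.diag(H))
    # ... and improve the policy: argmin_u Q_K(x, u) gives u = -Huu^{-1} Hux x.
    K_new = np.linalg.solve(H[n:, n:], H[n:, :n])
    if np.linalg.norm(K_new - K) < 1e-8:
        break
    K = K_new

print("learned feedback gain K:\n", K)
```

In the paper's multi-agent setting, each agent would run such a learner on its local neighborhood consensus error, and both the critic and the (here implicit) actor are realized as NNs trained online in real time rather than by batch least squares.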