Nonlinear two-player zero-sum game approximate solution using a Policy Iteration algorithm

Bibliographic Details
Published in: 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 142-147
Main Authors: Johnson, M., Bhasin, S., Dixon, W. E.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2011
ISBN: 9781612848006, 1612848001
ISSN: 0191-2216
DOI: 10.1109/CDC.2011.6160778


Summary: An approximate online solution is developed for a two-player zero-sum game subject to continuous-time nonlinear uncertain dynamics and an infinite-horizon quadratic cost. A novel actor-critic-identifier (ACI) structure is used to implement the Policy Iteration (PI) algorithm, wherein a robust dynamic neural network (DNN) is used to asymptotically identify the uncertain system, and a critic NN is used to approximate the value function. The weight update laws for the critic NN are generated using a gradient-descent method based on a modified temporal difference error, which is independent of the system dynamics. This method finds approximations of the optimal value function and the saddle-point feedback control policies. These policies are computed using the critic NN and the identifier DNN and guarantee uniformly ultimately bounded (UUB) stability of the closed-loop system. The actor, critic, and identifier structures are implemented in real time, continuously and simultaneously.
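The gradient-descent critic update on a temporal difference (Bellman) residual can be sketched for a toy problem. The fragment below is a minimal illustration, not the paper's ACI laws: it assumes a scalar linear system with known dynamics (the paper instead uses a DNN identifier), a single quadratic feature for the value function, and illustrative gains; only the general pattern (saddle-point policies from the critic weight, normalized gradient descent on the residual) follows the abstract.

```python
# Hedged sketch: critic gradient descent on a Bellman/TD residual for a
# scalar linear-quadratic zero-sum game. Dynamics, feature, and gains are
# illustrative assumptions, not the paper's exact ACI structure.
import math

a, b, k = -1.0, 1.0, 0.5        # drift, control, and disturbance input gains
Q, R, gamma2 = 1.0, 1.0, 1.0    # state/control weights, attenuation level squared
eta, dt, steps = 5.0, 0.01, 500 # learning rate, step size, horizon

x, W = 1.0, 0.0                 # state and critic weight: V(x) ~ W * x**2
for _ in range(steps):
    dphi = 2.0 * x                           # gradient of the feature phi(x) = x**2
    u = -(b / (2.0 * R)) * W * dphi          # minimizing (control) policy
    d = (k / (2.0 * gamma2)) * W * dphi      # maximizing (disturbance) policy
    xdot = a * x + b * u + k * d             # measured state derivative
    r = Q * x**2 + R * u**2 - gamma2 * d**2  # zero-sum running cost
    delta = W * dphi * xdot + r              # Bellman / temporal difference residual
    grad = dphi * xdot                       # d(delta)/dW
    W -= eta * dt * delta * grad / (1.0 + grad**2)  # normalized gradient step
    x += dt * xdot                           # Euler integration of the state
```

In this scalar case the weight W plays the role of the solution of the game's Riccati-type equation; the normalization `1 + grad**2` is the usual guard against large regressor values in gradient-based adaptive laws.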