Policy Gradient-based Reinforcement Learning for LQG Control with Chance Constraints

Bibliographic Details
Published in: European Control Conference (Piscataway, N.J. Online), pp. 364 - 371
Main Authors: Naha, Arunava; Dey, Subhrakanti
Format: Conference Proceeding
Language: English
Published: EUCA, 24.06.2025
ISSN: 2996-8895
DOI: 10.23919/ECC65951.2025.11186950

Summary: In this paper, we investigate a model-free optimal control design that minimizes an infinite-horizon average expected quadratic cost of states and control actions subject to a probabilistic risk or chance constraint, using input-output data. In particular, we consider linear time-invariant systems and design an optimal controller within the class of linear state feedback controls. Two policy gradient (PG) based algorithms, natural policy gradient (NPG) and Gauss-Newton policy gradient (GNPG), are developed and compared with deep deterministic policy gradient (DDPG), the optimal risk-neutral linear-quadratic regulator (LQR), chance-constrained LQR, and a scenario-based model predictive control (MPC) scheme. The convergence properties and accuracy of all the algorithms are compared numerically. We also establish analytical convergence properties of the NPG algorithm under the known-model scenario; convergence analysis for the unknown-model scenario is part of our ongoing work.
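
For orientation, the problem class described in the summary can be sketched with a generic chance-constrained average-cost LQG formulation; the constraint parameters c, ε, δ below are illustrative assumptions, and the paper's exact constraint form and notation may differ.

\[
\min_{K}\; J(K) \;=\; \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=0}^{T-1}\bigl(x_t^\top Q x_t + u_t^\top R u_t\bigr)\right]
\]
\[
\text{s.t.}\quad x_{t+1} = A x_t + B u_t + w_t,\qquad u_t = -K x_t,\qquad \Pr\bigl(|c^\top x_t| > \varepsilon\bigr) \le \delta .
\]

Under these assumptions, a standard natural policy gradient step preconditions the policy gradient by the stationary state covariance \(\Sigma_K\),
\[
K \;\leftarrow\; K - \eta\,\nabla J(K)\,\Sigma_K^{-1},
\]
while the Gauss-Newton variant additionally applies \((R + B^\top P_K B)^{-1}\) on the left, with \(P_K\) the value matrix associated with the closed-loop policy. How the chance constraint enters these updates (e.g., through a penalty or Lagrangian term) is specific to the paper.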