Policy Gradient-based Reinforcement Learning for LQG Control with Chance Constraints

Bibliographic Details
Published in: European Control Conference (Piscataway, N.J. Online), pp. 364 - 371
Main Authors: Naha, Arunava; Dey, Subhrakanti
Format: Conference Proceeding
Language: English
Published: EUCA, 24.06.2025
ISSN: 2996-8895
DOI: 10.23919/ECC65951.2025.11186950

Summary: In this paper, we investigate a model-free optimal control design that minimizes an infinite-horizon average expected quadratic cost of states and control actions subject to a probabilistic risk or chance constraint, using input-output data. In particular, we consider linear time-invariant systems and design an optimal controller within the class of linear state feedback controls. Two policy gradient (PG) based algorithms, natural policy gradient (NPG) and Gauss-Newton policy gradient (GNPG), are developed and compared with deep deterministic policy gradient (DDPG), the optimal risk-neutral linear-quadratic regulator (LQR), chance-constrained LQR, and a scenario-based model predictive control (MPC) scheme. The convergence properties and accuracy of all the algorithms are compared numerically. We also establish analytical convergence properties of the NPG algorithm under the known-model scenario; convergence analysis for the unknown-model scenario is part of our ongoing work.
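
For orientation, the problem class described in the summary can be sketched with a generic chance-constrained average-cost LQG formulation; the constraint parameters c, ε, δ below are illustrative assumptions, and the paper's exact constraint form and notation may differ.

\[
\min_{K}\; J(K) \;=\; \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}\!\left[\sum_{t=0}^{T-1}\bigl(x_t^\top Q x_t + u_t^\top R u_t\bigr)\right]
\]
\[
\text{s.t.}\quad x_{t+1} = A x_t + B u_t + w_t,\qquad u_t = -K x_t,\qquad \Pr\bigl(|c^\top x_t| > \varepsilon\bigr) \le \delta .
\]

Under these assumptions, a standard natural policy gradient step preconditions the policy gradient by the stationary state covariance \(\Sigma_K\),
\[
K \;\leftarrow\; K - \eta\,\nabla J(K)\,\Sigma_K^{-1},
\]
while the Gauss-Newton variant additionally applies \((R + B^\top P_K B)^{-1}\) on the left, with \(P_K\) the value matrix associated with the closed-loop policy. How the chance constraint enters these updates (e.g., through a penalty or Lagrangian term) is specific to the paper.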