Policy Gradient-based Reinforcement Learning for LQG Control with Chance Constraints
| Published in | European Control Conference (Piscataway, N.J. Online), pp. 364-371 |
|---|---|
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Published | EUCA, 24.06.2025 |
| ISSN | 2996-8895 |
| DOI | 10.23919/ECC65951.2025.11186950 |
| Summary: | In this paper, we investigate a model-free optimal control design that minimizes an infinite-horizon average expected quadratic cost of states and control actions, subject to a probabilistic risk or chance constraint, using input-output data. In particular, we consider linear time-invariant systems and design an optimal controller within the class of linear state-feedback controls. Two policy gradient (PG)-based algorithms, natural policy gradient (NPG) and Gauss-Newton policy gradient (GNPG), are developed and compared with deep deterministic policy gradient (DDPG), the optimal risk-neutral linear-quadratic regulator (LQR), chance-constrained LQR, and scenario-based model predictive control (MPC). The convergence properties and accuracy of all algorithms are compared numerically. We also establish analytical convergence properties of the NPG algorithm in the known-model setting, while convergence analysis for the unknown-model setting is part of our ongoing work. |
|---|---|
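For reference, the problem class described in the summary can be written as a generic chance-constrained, average-cost LQR. The formulation below is a standard sketch consistent with the abstract; the exact constraint set \(\mathcal{X}_{\text{safe}}\), risk level \(\delta\), and noise model are assumptions here, not taken from the paper.

```latex
\min_{K}\ \limsup_{T\to\infty}\ \frac{1}{T}\,
\mathbb{E}\!\left[\sum_{t=0}^{T-1} x_t^{\top} Q\, x_t + u_t^{\top} R\, u_t\right]
\quad \text{s.t.} \quad
x_{t+1} = A x_t + B u_t + w_t,\quad u_t = -K x_t,\quad
\Pr\bigl(x_t \in \mathcal{X}_{\text{safe}}\bigr) \ge 1-\delta \ \ \forall t.
```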
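The NPG and GNPG updates named in the summary have well-known forms in the policy-gradient LQR literature for the unconstrained, known-model case. The sketch below implements those standard iterations; the function name `pg_lqr`, the toy system matrices, and the step sizes are illustrative assumptions, and the paper's model-free, chance-constrained variants are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def pg_lqr(A, B, Q, R, K0, eta=1e-2, iters=2000, method="npg"):
    """Known-model policy gradient iterations for the unconstrained
    discrete-time LQR cost C(K), under the policy u_t = -K x_t.
    Standard updates only; no chance constraint is enforced."""
    K = K0.copy()
    for _ in range(iters):
        Acl = A - B @ K  # closed-loop dynamics under u = -K x
        # Cost-to-go matrix P_K solves P = Q + K'RK + (A-BK)' P (A-BK)
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Gradient direction: grad C(K) = 2 * E_K * Sigma_K
        E = (R + B.T @ P @ B) @ K - B.T @ P @ A
        if method == "npg":
            # Natural PG: Sigma_K cancels, giving K <- K - 2*eta*E_K
            K = K - 2 * eta * E
        else:
            # Gauss-Newton PG: preconditioned by (R + B'PB)^{-1};
            # with eta = 1/2 this is exactly one policy-iteration step
            K = K - 2 * eta * np.linalg.solve(R + B.T @ P @ B, E)
    return K

# Toy usage on a stable 2-state system (K0 = 0 is stabilizing here)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K_npg = pg_lqr(A, B, Q, R, np.zeros((1, 2)), eta=1e-2, iters=3000)
K_gn = pg_lqr(A, B, Q, R, np.zeros((1, 2)), eta=0.5, iters=50, method="gn")
```

With `eta = 0.5` the Gauss-Newton step reduces to exact policy iteration, which is why GNPG-style methods typically need far fewer iterations than plain NPG on the same problem.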