On the Design of Safe Continual RL Methods for Control of Nonlinear Systems

Bibliographic Details
Published in	European Control Conference (Piscataway, N.J. Online) pp. 892 - 897
Main Authors	Coursey, Austin; Quinones-Grueiro, Marcos; Biswas, Gautam
Format	Conference Proceeding
Language	English
Published	EUCA 24.06.2025
Subjects	Control systems; Costs; Europe; Limbs; Mission critical systems; Nonlinear systems; Optimization; Reinforcement learning; Safety; Service robots
ISSN	2996-8895
DOI	10.23919/ECC65951.2025.11187149

Summary:	Reinforcement learning (RL) algorithms have been successfully applied to control tasks associated with unmanned aerial vehicles and robotics. In recent years, safe RL has been proposed to allow the safe execution of RL algorithms in industrial and mission-critical systems that operate in closed loops. However, if the system operating conditions change, such as when an unknown fault occurs, typical safe RL algorithms cannot adapt while retaining past knowledge. Continual RL algorithms have been proposed to address this issue. However, the impact of continual adaptation on the system's safety is an understudied problem. In this paper, we study the intersection of safe and continual RL. First, we empirically demonstrate that a popular continual RL algorithm, elastic weight consolidation, does not satisfy safety constraints in nonlinear systems subject to varying operating conditions. Specifically, we study the MuJoCo HalfCheetah and Ant environments with velocity constraints and sudden joint loss non-stationarity. Then, we show that an agent trained using constrained policy optimization, a safe RL algorithm, experiences catastrophic forgetting in continual learning settings. With this in mind, we explore a simple reward-shaping method to ensure that elastic weight consolidation prioritizes remembering both safety and task performance for safety-constrained, nonlinear, and non-stationary systems.
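The summary names two ingredients, elastic weight consolidation (EWC) and reward shaping, but this record gives no formulas for either. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes a binary per-step cost for exceeding a speed limit `v_max`, subtracts that cost from the task reward with a fixed weight `beta`, and computes the standard diagonal-Fisher EWC penalty (λ/2)·Σᵢ Fᵢ(θᵢ − θ*ᵢ)². The functions `velocity_cost`, `shaped_reward`, `diagonal_fisher`, and `ewc_penalty`, along with the parameters `v_max`, `beta`, and `lam`, are hypothetical names chosen for illustration.

```python
import numpy as np


def velocity_cost(velocity: float, v_max: float) -> float:
    """Hypothetical per-step constraint cost: 1.0 if |velocity| exceeds v_max, else 0.0.

    Assumption: the constraint is on speed magnitude; the paper's exact
    velocity-constraint definition is not given in this record.
    """
    return 1.0 if abs(velocity) > v_max else 0.0


def shaped_reward(reward: float, cost: float, beta: float) -> float:
    """Assumed shaping: fold the safety cost into the reward as a penalty.

    Because the shaped reward already encodes constraint violations, the
    parameters that EWC later protects reflect both task and safety behavior.
    """
    return reward - beta * cost


def diagonal_fisher(grads: np.ndarray) -> np.ndarray:
    """Diagonal Fisher estimate: mean of squared per-sample gradients of the
    log-likelihood, shape (num_samples, num_params) -> (num_params,)."""
    return np.mean(grads ** 2, axis=0)


def ewc_penalty(theta: np.ndarray, theta_star: np.ndarray,
                fisher: np.ndarray, lam: float) -> float:
    """EWC quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    Added to the loss on a new operating condition (e.g. after a joint loss)
    to keep parameters that mattered for the old condition near theta_star.
    """
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))


# Usage with toy numbers (all values are illustrative, not from the paper).
r = shaped_reward(2.0, velocity_cost(3.5, v_max=3.0), beta=1.5)  # violation -> 0.5
theta_star = np.array([1.0, -2.0, 0.5])               # params after the old task
grads = np.array([[0.1, 1.0, 0.0], [0.3, -1.0, 0.0]])  # per-sample gradients
F = diagonal_fisher(grads)
theta = np.array([1.5, -2.0, 3.0])                     # params drifting on new task
print(r, ewc_penalty(theta, theta_star, F, lam=10.0))
```

In this toy example the third parameter moves the most, yet it contributes nothing to the penalty because its Fisher estimate is zero. This is the intended EWC behavior: only parameters that were informative for the earlier condition are anchored.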