Safe Exploration Algorithms for Reinforcement Learning Controllers

Bibliographic Details
Published in	IEEE Transactions on Neural Networks and Learning Systems Vol. 29; no. 4; pp. 1069-1081
Main Authors	Mannucci, Tommaso; van Kampen, Erik-Jan; de Visser, Cornelis; Chu, Qiping
Format	Journal Article
Language	English
Published	United States IEEE 01.04.2018 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adaptation models; Adaptive controllers; Aerodynamics; Aircraft; Aircraft control; Algorithms; Altitude control; Backups; Complex systems; Computer simulation; Exploration; Formulations; Heuristic algorithms; Learning; Learning (artificial intelligence); Machine learning; Measurement; Model-free control; Reinforcement; Reinforcement learning (RL); Risk perception; Safe exploration; Safety
ISSN	2162-237X; 2162-2388
DOI	10.1109/TNNLS.2017.2654539

Summary:	Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment with limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or otherwise unacceptable behavior. Compared with existing methods, the main contribution of this paper is a new approach that requires neither global safety functions nor specific formulations of the dynamics or of the environment. Instead, it relies on interval estimation of the agent's dynamics during the exploration phase, and assumes that the agent has a limited capability to perceive incoming fatal states. Two algorithms based on this approach are presented. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by identifying temporary safety functions, called backups. SHERPA is demonstrated on a simulated, simplified quadrotor task, in which dangerous states are avoided. The second algorithm, named OptiSHERPA, uses safety metrics to safely handle more dynamically complex systems for which SHERPA is not sufficient. An application of OptiSHERPA is simulated on an aircraft altitude control task.
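The summary names three ingredients: interval estimates of the dynamics, a limited ability to perceive fatal states, and backups that lead back to safety. The sketch below is a toy illustration of how such ingredients might combine in a one-dimensional setting. It is not the SHERPA or OptiSHERPA algorithm from the paper. The model x' = x + k*u with an uncertain gain k, the one-step backup test, and every name in the code (Interval, predict, perceived_fatal, has_backup, safe_actions) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] on the real line (hypothetical helper)."""
    lo: float
    hi: float

    def overlaps(self, other: "Interval") -> bool:
        return self.lo <= other.hi and other.lo <= self.hi

    def within(self, other: "Interval") -> bool:
        return other.lo <= self.lo and self.hi <= other.hi


def predict(state: Interval, action: float, gain: Interval) -> Interval:
    """Bound x' = x + k*u over every x in `state` and every k in `gain`.

    Assumed toy dynamics: the agent only knows that k lies in `gain`.
    """
    effects = (gain.lo * action, gain.hi * action)
    return Interval(state.lo + min(effects), state.hi + max(effects))


def perceived_fatal(state: Interval, fatal: Interval,
                    sensing_range: float) -> Optional[Interval]:
    """Return the part of the fatal region the agent can sense, if any."""
    visible = Interval(state.lo - sensing_range, state.hi + sensing_range)
    if not visible.overlaps(fatal):
        return None
    return Interval(max(visible.lo, fatal.lo), min(visible.hi, fatal.hi))


def has_backup(state: Interval, actions: List[float], gain: Interval,
               safe_set: Interval) -> bool:
    """One-step backup: some action is guaranteed to return to `safe_set`."""
    return any(predict(state, u, gain).within(safe_set) for u in actions)


def safe_actions(x: float, actions: List[float], gain: Interval,
                 fatal: Interval, sensing_range: float,
                 safe_set: Interval) -> List[float]:
    """Keep actions whose predicted interval avoids perceived fatal states
    and either stays in the known-safe set or admits a one-step backup."""
    current = Interval(x, x)
    danger = perceived_fatal(current, fatal, sensing_range)
    allowed = []
    for u in actions:
        nxt = predict(current, u, gain)
        if danger is not None and nxt.overlaps(danger):
            continue  # some possible outcome lands in a sensed fatal state
        if nxt.within(safe_set) or has_backup(nxt, actions, gain, safe_set):
            allowed.append(u)
    return allowed


if __name__ == "__main__":
    result = safe_actions(
        x=0.0,
        actions=[-0.5, 0.0, 0.5, 1.0],
        gain=Interval(0.5, 1.5),       # uncertain dynamics
        fatal=Interval(1.2, 10.0),     # wall the agent must not reach
        sensing_range=2.0,             # limited risk perception
        safe_set=Interval(-1.0, 0.5),  # states already known to be safe
    )
    print(result)  # [-0.5, 0.0, 0.5]
```

In this toy run the action 1.0 is rejected because its predicted interval [0.5, 1.5] reaches the sensed fatal region, while 0.5 is accepted even though it may leave the known-safe set, because the action -0.5 is guaranteed to bring the agent back. The sketch also assumes that the sensing range exceeds the largest possible step, so that no unperceived fatal state can be reached in one move.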