Safe Exploration Algorithms for Reinforcement Learning Controllers

Bibliographic Details
Published in IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, no. 4, pp. 1069-1081
Main Authors Mannucci, Tommaso; van Kampen, Erik-Jan; de Visser, Cornelis; Chu, Qiping
Format Journal Article
Language English
Published United States IEEE 01.04.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Online Access Get full text
ISSN 2162-237X
EISSN 2162-2388
DOI 10.1109/TNNLS.2017.2654539


Abstract Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.
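The mechanism the abstract describes, interval estimation of the agent's dynamics combined with temporary backup policies that guarantee a safe retreat, can be sketched in miniature. The following is an illustrative reconstruction only, not the paper's SHERPA implementation: it assumes a simplified 1-D agent with bounded disturbance, and all names (`interval_step`, `has_backup`, `safe_explore`) and constants are hypothetical.

```python
# Hypothetical 1-D agent: position x, velocity v; action a is an acceleration.
# True dynamics (unknown to the agent): v' = v + a*dt + w, with |w| <= W_MAX.
DT, W_MAX = 0.1, 0.02
X_SAFE = (0.0, 10.0)  # leaving this position interval counts as "fatal"

def interval_step(x_lo, x_hi, v_lo, v_hi, a):
    """Worst-case interval propagation of the dynamics under bounded noise."""
    nv_lo = v_lo + a * DT - W_MAX
    nv_hi = v_hi + a * DT + W_MAX
    nx_lo = x_lo + nv_lo * DT
    nx_hi = x_hi + nv_hi * DT
    return nx_lo, nx_hi, nv_lo, nv_hi

def has_backup(x, v, actions=(-1.0, 0.0, 1.0), horizon=20):
    """True if some constant 'backup' action keeps the worst-case interval
    trajectory inside X_SAFE over the horizon -- a crude stand-in for the
    paper's temporary safety functions (backups)."""
    for a in actions:
        x_lo = x_hi = x
        v_lo = v_hi = v
        ok = True
        for _ in range(horizon):
            x_lo, x_hi, v_lo, v_hi = interval_step(x_lo, x_hi, v_lo, v_hi, a)
            if x_lo < X_SAFE[0] or x_hi > X_SAFE[1]:
                ok = False
                break
        if ok:
            return True
    return False

def safe_explore(x, v, candidate):
    """Accept an exploratory action only if a backup exists from every
    worst-case state it can lead to; otherwise fall back to no action."""
    x_lo, x_hi, v_lo, v_hi = interval_step(x, x, v, v, candidate)
    # Check backups from the interval corners (extreme reachable states).
    if all(has_backup(xc, vc) for xc in (x_lo, x_hi) for vc in (v_lo, v_hi)):
        return candidate
    return 0.0
```

Under these toy numbers, an exploratory action near the middle of the safe region is accepted, while one taken near the boundary at high speed is rejected because no backup can stop the worst-case trajectory in time.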
Authors
– Tommaso Mannucci (ORCID 0000-0003-1994-2965), t.mannucci@tudelft.nl
– Erik-Jan van Kampen (ORCID 0000-0002-5593-4471), e.vankampen@tudelft.nl
– Cornelis de Visser, c.c.devisser@tudelft.nl
– Qiping Chu, q.p.chu@tudelft.nl
All authors: Control and Simulation Division, Faculty of Aerospace Engineering, Delft University of Technology, Delft, The Netherlands
BackLink https://www.ncbi.nlm.nih.gov/pubmed/28182560 (View this record in MEDLINE/PubMed)
CODEN ITNNAL
ContentType Journal Article
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Discipline Computer Science
EISSN 2162-2388
EndPage 1081
ExternalDocumentID 28182560 (PubMed); 7842559 (IEEE)
Genre orig-research; Journal Article
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PageCount 13
PublicationDate 2018-04-01
PublicationPlace United States; Piscataway
PublicationTitle IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev TNNLS
PublicationTitleAlternate IEEE Trans Neural Netw Learn Syst
PublicationYear 2018
Publisher IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SubjectTerms Adaptation models
Adaptive controllers
Aerodynamics
Aircraft
Aircraft control
Algorithms
Altitude control
Backups
Complex systems
Computer simulation
Exploration
Formulations
Heuristic algorithms
Learning
Learning (artificial intelligence)
Machine learning
Measurement
model-free control
Reinforcement
reinforcement learning (RL)
Risk perception
safe exploration
Safety
URI https://ieeexplore.ieee.org/document/7842559
https://www.ncbi.nlm.nih.gov/pubmed/28182560
https://www.proquest.com/docview/2015121797
https://www.proquest.com/docview/1867543607
Volume 29