Safe Exploration Algorithms for Reinforcement Learning Controllers

Bibliographic Details
Published in IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, no. 4, pp. 1069-1081
Main Authors Mannucci, Tommaso; van Kampen, Erik-Jan; de Visser, Cornelis; Chu, Qiping
Format Journal Article
Language English
Published United States IEEE 01.04.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Online Access Get full text
ISSN 2162-237X
EISSN 2162-2388
DOI 10.1109/TNNLS.2017.2654539


Abstract Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.
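The mechanism the abstract describes, interval estimation of the agent's dynamics combined with temporary backup policies that guarantee a safe retreat, can be sketched in miniature. The following is an illustrative reconstruction only, not the paper's SHERPA implementation: it assumes a simplified 1-D agent with bounded disturbance, and all names (`interval_step`, `has_backup`, `safe_explore`) and constants are hypothetical.

```python
# Hypothetical 1-D agent: position x, velocity v; action a is an acceleration.
# True dynamics (unknown to the agent): v' = v + a*dt + w, with |w| <= W_MAX.
DT, W_MAX = 0.1, 0.02
X_SAFE = (0.0, 10.0)  # leaving this position interval counts as "fatal"

def interval_step(x_lo, x_hi, v_lo, v_hi, a):
    """Worst-case interval propagation of the dynamics under bounded noise."""
    nv_lo = v_lo + a * DT - W_MAX
    nv_hi = v_hi + a * DT + W_MAX
    nx_lo = x_lo + nv_lo * DT
    nx_hi = x_hi + nv_hi * DT
    return nx_lo, nx_hi, nv_lo, nv_hi

def has_backup(x, v, actions=(-1.0, 0.0, 1.0), horizon=20):
    """True if some constant 'backup' action keeps the worst-case interval
    trajectory inside X_SAFE over the horizon -- a crude stand-in for the
    paper's temporary safety functions (backups)."""
    for a in actions:
        x_lo = x_hi = x
        v_lo = v_hi = v
        ok = True
        for _ in range(horizon):
            x_lo, x_hi, v_lo, v_hi = interval_step(x_lo, x_hi, v_lo, v_hi, a)
            if x_lo < X_SAFE[0] or x_hi > X_SAFE[1]:
                ok = False
                break
        if ok:
            return True
    return False

def safe_explore(x, v, candidate):
    """Accept an exploratory action only if a backup exists from every
    worst-case state it can lead to; otherwise fall back to no action."""
    x_lo, x_hi, v_lo, v_hi = interval_step(x, x, v, v, candidate)
    # Check backups from the interval corners (extreme reachable states).
    if all(has_backup(xc, vc) for xc in (x_lo, x_hi) for vc in (v_lo, v_hi)):
        return candidate
    return 0.0
```

Under these toy numbers, an exploratory action near the middle of the safe region is accepted, while one taken near the boundary at high speed is rejected because no backup can stop the worst-case trajectory in time.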
Authors
– Tommaso Mannucci (ORCID 0000-0003-1994-2965), t.mannucci@tudelft.nl
– Erik-Jan van Kampen (ORCID 0000-0002-5593-4471), e.vankampen@tudelft.nl
– Cornelis de Visser, c.c.devisser@tudelft.nl
– Qiping Chu, q.p.chu@tudelft.nl
All authors: Control and Simulation Division, Faculty of Aerospace Engineering, Delft University of Technology, Delft, The Netherlands
BackLink https://www.ncbi.nlm.nih.gov/pubmed/28182560 (View this record in MEDLINE/PubMed)
CODEN ITNNAL
ContentType Journal Article
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Discipline Computer Science
EISSN 2162-2388
EndPage 1081
ExternalDocumentID 28182560 (PubMed); 7842559 (IEEE)
Genre orig-research; Journal Article
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PageCount 13
PublicationDate 2018-04-01
PublicationPlace United States; Piscataway
PublicationTitle IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev TNNLS
PublicationTitleAlternate IEEE Trans Neural Netw Learn Syst
PublicationYear 2018
Publisher IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SubjectTerms Adaptation models
Adaptive controllers
Aerodynamics
Aircraft
Aircraft control
Algorithms
Altitude control
Backups
Complex systems
Computer simulation
Exploration
Formulations
Heuristic algorithms
Learning
Learning (artificial intelligence)
Machine learning
Measurement
model-free control
Reinforcement
reinforcement learning (RL)
Risk perception
safe exploration
Safety
URI https://ieeexplore.ieee.org/document/7842559
https://www.ncbi.nlm.nih.gov/pubmed/28182560
https://www.proquest.com/docview/2015121797
https://www.proquest.com/docview/1867543607
Volume 29