Safe Exploration Algorithms for Reinforcement Learning Controllers
| Published in | IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 4, pp. 1069–1081 | 
|---|---|
| Main Authors | Tommaso Mannucci; Erik-Jan van Kampen; Cornelis de Visser; Qiping Chu | 
| Format | Journal Article | 
| Language | English | 
| Published | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.04.2018 | 
| Subjects | Adaptation models; Adaptive controllers; Aerodynamics; Aircraft; Aircraft control; Algorithms; Altitude control; Backups; Complex systems; Computer simulation; Exploration; Formulations; Heuristic algorithms; Learning; Learning (artificial intelligence); Machine learning; Measurement; Model-free control; Reinforcement; Reinforcement learning (RL); Risk perception; Safe exploration; Safety | 
| ISSN | 2162-237X (print), 2162-2388 (electronic) | 
| DOI | 10.1109/TNNLS.2017.2654539 | 
| Abstract | Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task. | 
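The abstract describes the approach only at a high level: an interval estimate of the agent's dynamics bounds where the next state can land, and an exploratory action is accepted only if a previously found "backup" (a temporary safety function) can still be executed safely from the predicted states. The sketch below illustrates that mechanism on a toy scalar system; it is not the paper's implementation, and every name, bound, and dynamics model in it is an assumption made purely for illustration.

```python
# Minimal sketch of the backup idea behind SHERPA, NOT the paper's implementation.
# Assumed toy model: scalar dynamics x' = a*x + B*u with unknown gain a known only
# as an interval [A_LOW, A_HIGH]; the safe region is [SAFE_LOW, SAFE_HIGH].
import numpy as np

SAFE_LOW, SAFE_HIGH = -1.0, 1.0   # assumed safe state bounds
A_LOW, A_HIGH = 0.8, 1.0          # assumed interval estimate of the unknown gain
B = 0.1                           # assumed known input gain

def interval_step(x_lo, x_hi, u):
    """Propagate a state interval one step through x' = a*x + B*u, a in [A_LOW, A_HIGH]."""
    corners = [a * x + B * u for a in (A_LOW, A_HIGH) for x in (x_lo, x_hi)]
    return min(corners), max(corners)

def is_safe(x_lo, x_hi):
    return SAFE_LOW <= x_lo and x_hi <= SAFE_HIGH

def backup_keeps_safe(x_lo, x_hi, backup):
    """Check that applying the backup action sequence keeps the reachable interval safe."""
    for u in backup:
        x_lo, x_hi = interval_step(x_lo, x_hi, u)
        if not is_safe(x_lo, x_hi):
            return False
    return True

def choose_action(x, candidates, backup):
    """Accept an exploratory action only if its predicted interval stays safe and the
    backup can still be executed safely afterwards; otherwise fall back to the backup."""
    for u in candidates:
        x_lo, x_hi = interval_step(x, x, u)
        if is_safe(x_lo, x_hi) and backup_keeps_safe(x_lo, x_hi, backup):
            return u
    return backup[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, backup, violated = 0.2, [-0.5, -0.5, -0.5], False
    for _ in range(20):
        u = choose_action(x, rng.uniform(-1.0, 1.0, size=5).tolist(), backup)
        x = rng.uniform(A_LOW, A_HIGH) * x + B * u   # "true" dynamics, consistent with the interval
        violated = violated or not (SAFE_LOW <= x <= SAFE_HIGH)
    print(f"final state {x:+.3f}, safety violated: {violated}")
```

Under these assumed bounds the loop keeps the toy state inside the safe interval; the actual SHERPA and OptiSHERPA algorithms operate on multidimensional dynamics with limited risk perception and safety metrics, which this sketch does not model.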
    
| Author details | Tommaso Mannucci (t.mannucci@tudelft.nl, ORCID 0000-0003-1994-2965); Erik-Jan van Kampen (e.vankampen@tudelft.nl, ORCID 0000-0002-5593-4471); Cornelis de Visser (c.c.devisser@tudelft.nl); Qiping Chu (q.p.chu@tudelft.nl). All authors: Control and Simulation Division, Faculty of Aerospace Engineering, Delft University of Technology, Delft, The Netherlands. | 
    
| CODEN | ITNNAL | 
    
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 | 
    
| Discipline | Computer Science | 
    
| Genre | orig-research Journal Article  | 
    
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html | 
    
| ORCID | 0000-0002-5593-4471 0000-0003-1994-2965  | 
    
| PMID | 28182560 | 
    
| PageCount | 13 | 
    
| PublicationTitleAbbrev | TNNLS | 
    
| PublicationTitleAlternate | IEEE Trans Neural Netw Learn Syst | 
    
| URI | https://ieeexplore.ieee.org/document/7842559 https://www.ncbi.nlm.nih.gov/pubmed/28182560 https://www.proquest.com/docview/2015121797 https://www.proquest.com/docview/1867543607  | 
    