Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks

Bibliographic Details
Published in: Neural Networks, Vol. 129, pp. 149–162
Main Authors: Han, Dongqi; Doya, Kenji; Tani, Jun
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.09.2020
Subjects: Multiple timescale; Recurrent neural network; Partially observable Markov decision process; Compositionality; Reinforcement learning
Online Access: https://www.sciencedirect.com/science/article/pii/S0893608020302070
ISSN: 0893-6080
EISSN: 1879-2782
DOI: 10.1016/j.neunet.2020.06.002

Abstract: Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures or on understanding the underlying neural mechanisms behind their performance gains. In this paper, we propose a novel multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enables faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals than when learning from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics.
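The two architectural ideas named in the abstract, multiple-timescale units and stochastic internal dynamics, can be sketched in a few lines. The NumPy fragment below is a minimal illustration of the generic mechanism only: leaky-integrator units whose time constant tau differs between a fast level and a slow level, with Gaussian noise of learned scale injected into the potentials. All sizes, weight names, and the noise parameterization are invented for this example; they are not the authors' exact architecture, training objective, or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a two-level network: a fast level close to the
# observations and a slow level that can carry abstract, sub-goal-like context.
OBS_DIM, FAST_DIM, SLOW_DIM = 8, 32, 16
TAU_FAST, TAU_SLOW = 2.0, 16.0  # time constants; larger tau means slower dynamics

def init_w(n_out, n_in):
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))

W = {
    "fast_obs":  init_w(FAST_DIM, OBS_DIM),   # observation -> fast units
    "fast_fast": init_w(FAST_DIM, FAST_DIM),  # fast-level recurrence
    "fast_slow": init_w(FAST_DIM, SLOW_DIM),  # top-down input from the slow level
    "slow_slow": init_w(SLOW_DIM, SLOW_DIM),  # slow-level recurrence
    "slow_fast": init_w(SLOW_DIM, FAST_DIM),  # bottom-up input from the fast level
    "log_sig_fast": np.full(FAST_DIM, -2.0),  # learned per-unit noise scales
    "log_sig_slow": np.full(SLOW_DIM, -2.0),
}

def step(u_fast, u_slow, obs):
    """One step of a two-level, multiple-timescale, stochastic RNN.

    Each level is a leaky integrator: its potential u moves toward its
    recurrent drive at rate 1/tau, so the slow level integrates over long
    horizons while the fast level tracks the observation stream. Gaussian
    noise with a learned scale makes the internal dynamics stochastic.
    """
    h_fast, h_slow = np.tanh(u_fast), np.tanh(u_slow)

    z_fast = W["fast_obs"] @ obs + W["fast_fast"] @ h_fast + W["fast_slow"] @ h_slow
    z_slow = W["slow_slow"] @ h_slow + W["slow_fast"] @ h_fast

    # Reparameterization-style noise injection: drive = mean + sigma * eps.
    z_fast = z_fast + np.exp(W["log_sig_fast"]) * rng.standard_normal(FAST_DIM)
    z_slow = z_slow + np.exp(W["log_sig_slow"]) * rng.standard_normal(SLOW_DIM)

    u_fast = (1.0 - 1.0 / TAU_FAST) * u_fast + z_fast / TAU_FAST
    u_slow = (1.0 - 1.0 / TAU_SLOW) * u_slow + z_slow / TAU_SLOW
    return u_fast, u_slow

# Roll the network forward on a dummy observation sequence.
u_f, u_s = np.zeros(FAST_DIM), np.zeros(SLOW_DIM)
for _ in range(100):
    u_f, u_s = step(u_f, u_s, rng.standard_normal(OBS_DIM))
print("fast:", np.round(u_f[:4], 3), "slow:", np.round(u_s[:4], 3))
```

Because TAU_SLOW is much larger than TAU_FAST, the slow potentials drift smoothly while the fast ones react to every observation; that separation of timescales, combined with the injected noise, is what the paper reports as the substrate for self-organized sub-goals and an action hierarchy.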
Authors:
– Han, Dongqi (ORCID: 0000-0002-6872-7121), Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
– Doya, Kenji, Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
– Tani, Jun (jun.tani@oist.jp), Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
Copyright: © 2020 The Authors. Published by Elsevier Ltd. All rights reserved.
License: This is an open access article under the CC BY-NC-ND license.