Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks

Bibliographic Details
Published in: Neural Networks, Vol. 129, pp. 149–162
Main Authors: Han, Dongqi; Doya, Kenji; Tani, Jun
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.09.2020
Subjects: Multiple timescale; Recurrent neural network; Partially observable Markov decision process; Compositionality; Reinforcement learning
Online Access: https://www.sciencedirect.com/science/article/pii/S0893608020302070
ISSN: 0893-6080
EISSN: 1879-2782
DOI: 10.1016/j.neunet.2020.06.002

Abstract: Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown distinct advantages, e.g., solving memory-dependent tasks and meta-learning. However, little effort has been spent on improving RNN architectures or on understanding the underlying neural mechanisms behind their performance gains. In this paper, we propose a novel multiple-timescale, stochastic RNN for RL. Empirical results show that the network can autonomously learn to abstract sub-goals and can self-develop an action hierarchy using internal dynamics in a challenging continuous control task. Furthermore, we show that the self-developed compositionality of the network enables faster re-learning when adapting to a new task that is a re-composition of previously learned sub-goals than when learning from scratch. We also found that improved performance can be achieved when neural activities are subject to stochastic rather than deterministic dynamics.
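The two architectural ideas named in the abstract, multiple-timescale units and stochastic internal dynamics, can be sketched in a few lines. The NumPy fragment below is a minimal illustration of the generic mechanism only: leaky-integrator units whose time constant tau differs between a fast level and a slow level, with Gaussian noise of learned scale injected into the potentials. All sizes, weight names, and the noise parameterization are invented for this example; they are not the authors' exact architecture, training objective, or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a two-level network: a fast level close to the
# observations and a slow level that can carry abstract, sub-goal-like context.
OBS_DIM, FAST_DIM, SLOW_DIM = 8, 32, 16
TAU_FAST, TAU_SLOW = 2.0, 16.0  # time constants; larger tau means slower dynamics

def init_w(n_out, n_in):
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))

W = {
    "fast_obs":  init_w(FAST_DIM, OBS_DIM),   # observation -> fast units
    "fast_fast": init_w(FAST_DIM, FAST_DIM),  # fast-level recurrence
    "fast_slow": init_w(FAST_DIM, SLOW_DIM),  # top-down input from the slow level
    "slow_slow": init_w(SLOW_DIM, SLOW_DIM),  # slow-level recurrence
    "slow_fast": init_w(SLOW_DIM, FAST_DIM),  # bottom-up input from the fast level
    "log_sig_fast": np.full(FAST_DIM, -2.0),  # learned per-unit noise scales
    "log_sig_slow": np.full(SLOW_DIM, -2.0),
}

def step(u_fast, u_slow, obs):
    """One step of a two-level, multiple-timescale, stochastic RNN.

    Each level is a leaky integrator: its potential u moves toward its
    recurrent drive at rate 1/tau, so the slow level integrates over long
    horizons while the fast level tracks the observation stream. Gaussian
    noise with a learned scale makes the internal dynamics stochastic.
    """
    h_fast, h_slow = np.tanh(u_fast), np.tanh(u_slow)

    z_fast = W["fast_obs"] @ obs + W["fast_fast"] @ h_fast + W["fast_slow"] @ h_slow
    z_slow = W["slow_slow"] @ h_slow + W["slow_fast"] @ h_fast

    # Reparameterization-style noise injection: drive = mean + sigma * eps.
    z_fast = z_fast + np.exp(W["log_sig_fast"]) * rng.standard_normal(FAST_DIM)
    z_slow = z_slow + np.exp(W["log_sig_slow"]) * rng.standard_normal(SLOW_DIM)

    u_fast = (1.0 - 1.0 / TAU_FAST) * u_fast + z_fast / TAU_FAST
    u_slow = (1.0 - 1.0 / TAU_SLOW) * u_slow + z_slow / TAU_SLOW
    return u_fast, u_slow

# Roll the network forward on a dummy observation sequence.
u_f, u_s = np.zeros(FAST_DIM), np.zeros(SLOW_DIM)
for _ in range(100):
    u_f, u_s = step(u_f, u_s, rng.standard_normal(OBS_DIM))
print("fast:", np.round(u_f[:4], 3), "slow:", np.round(u_s[:4], 3))
```

Because TAU_SLOW is much larger than TAU_FAST, the slow potentials drift smoothly while the fast ones react to every observation; that separation of timescales, combined with the injected noise, is what the paper reports as the substrate for self-organized sub-goals and an action hierarchy.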
Authors:
– Han, Dongqi (ORCID: 0000-0002-6872-7121), Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
– Doya, Kenji, Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
– Tani, Jun (jun.tani@oist.jp), Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
Copyright: © 2020 The Authors. Published by Elsevier Ltd. All rights reserved.
License: This is an open access article under the CC BY-NC-ND license.