Risk-Sensitive Reinforcement Learning

Bibliographic Details
Published in: Neural Computation, Vol. 26, No. 7, pp. 1298–1328
Main Authors: Shen, Yun; Tobia, Michael J.; Sommer, Tobias; Obermayer, Klaus
Format: Journal Article
Language: English
Published: MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.07.2014
ISSN: 0899-7667
EISSN: 1530-888X
DOI: 10.1162/NECO_a_00600


Abstract: We derive a family of risk-sensitive reinforcement learning methods for agents who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subjects' responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition, we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
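The core algorithmic idea described in the abstract, applying a utility function to the TD error before the Q-learning update, can be sketched in a few lines. The piecewise-linear utility with loss-scaling parameter `k`, the toy two-armed bandit, and all parameter values below are illustrative assumptions for this sketch, not the paper's actual utility functions or the sequential investment task used in the experiments.

```python
import numpy as np

def utility(td_error, k=0.5):
    """Piecewise-linear utility applied to the TD error.
    k in (-1, 1): k > 0 down-weights positive surprises and
    up-weights negative ones (risk aversion); k = 0 recovers
    standard Q-learning. This particular form is a hypothetical
    choice for illustration."""
    return (1 - k) * td_error if td_error >= 0 else (1 + k) * td_error

def risk_sensitive_q_learning(env_step, n_states, n_actions, episodes=500,
                              alpha=0.1, gamma=0.95, eps=0.1, k=0.5, rng=None):
    """Tabular epsilon-greedy Q-learning with a utility-transformed TD error.

    env_step(s, a, rng) -> (next_state, reward, done); episodes start in state 0.
    """
    rng = rng or np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, done = env_step(s, a, rng)
            target = r if done else r + gamma * np.max(Q[s_next])
            # Standard Q-learning would add alpha * (target - Q[s, a]);
            # here the TD error is passed through the utility first.
            Q[s, a] += alpha * utility(target - Q[s, a], k)
            s = s_next
    return Q

def bandit_step(s, a, rng):
    """Toy one-state bandit: arm 0 pays a certain 0.2; arm 1 pays
    +2.0 or -1.5 with equal probability (mean 0.25)."""
    if a == 0:
        return 0, 0.2, True
    return 0, (2.0 if rng.random() < 0.5 else -1.5), True
```

With `k = 0.5` the learned value of the risky arm settles well below its expected reward (losses are amplified in the update), so the agent prefers the certain arm even though the risky arm has the higher mean; with `k = 0` the update reduces to ordinary Q-learning on expected value.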
Authors:
1. Yun Shen, Technical University, 10587 Berlin, Germany (yun@ni.tu-berlin.de)
2. Michael J. Tobia, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany (m.tobia@uke.de)
3. Tobias Sommer, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany (t.sommer@uke.de)
4. Klaus Obermayer, Technical University, 10587 Berlin, Germany, and Bernstein Center for Computational Neuroscience Berlin, 10115 Berlin, Germany (oby@ni.tu-berlin.de)
PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24708369
CODEN: NEUCEB
Copyright: MIT Press Journals, July 2014
Discipline: Computer Science
Genre: Letter
Peer Reviewed: Yes
PMID: 24708369
Page Count: 31
References
B1: doi:10.1111/1467-9965.00068
B2: Bertsekas, D. (1996). Neuro-dynamic programming.
B3: doi:10.1287/moor.27.2.294.324
B5: doi:10.1007/s00186-009-0285-6
B6: doi:10.1007/s007800200072
B7: doi:10.1515/9783110212075
B8: Ghosh, J. (2006). An introduction to Bayesian analysis.
B9: doi:10.1017/CBO9780511840203
B10: doi:10.1093/cercor/bhn098
B11: Glimcher, P. (2008). Neuroeconomics: Decision making and the brain.
B12: Gollier, C. (2004). The economics of risk and time.
B13: doi:10.1016/B978-1-55860-335-6.50021-0
B14: doi:10.1016/S0167-6911(96)00051-5
B15: doi:10.1287/mnsc.18.7.356
B16: Kahneman & Tversky (1979). doi:10.2307/1914185
B17: doi:10.1023/A:1017940631555
B18: doi:10.1371/journal.pcbi.1000857
B19: doi:10.1523/JNEUROSCI.5498-10.2012
B20: doi:10.1016/j.conb.2004.10.016
B21: doi:10.1523/JNEUROSCI.4286-07.2008
B22: doi:10.1002/9780470316887
B23: doi:10.1007/s10107-010-0393-3
B24: Savage, L. (1972). The foundations of statistics.
B25: doi:10.1016/S0896-6273(02)00967-4
B26: doi:10.1126/science.275.5306.1593
B27: doi:10.1137/120899005
B28: Sutton, R. (1998). Reinforcement learning.
B29: doi:10.1016/j.neuroimage.2011.06.087
B30: doi:10.1016/j.neuroimage.2013.11.051
B31: doi:10.1007/BF00122574
B32: doi:10.1073/pnas.0900102106
Subject Terms:
Algorithms
Behavior
Brain - physiology
Brain Mapping
Decision making
Decision Making - physiology
Decision making models
Human behavior
Humans
Learning
Letters
Magnetic Resonance Imaging
Markov analysis
Markov Chains
Mathematical analysis
Mathematical models
Models, Psychological
Neuropsychology
Nonlinear Dynamics
Oxygen - blood
Probability
Reinforcement
Reinforcement (Psychology)
Risk
Signal transduction
Tasks
Transition probabilities
Utilities
Utility functions
URI:
https://direct.mit.edu/neco/article/doi/10.1162/NECO_a_00600
https://www.ncbi.nlm.nih.gov/pubmed/24708369
https://www.proquest.com/docview/1535386891
https://www.proquest.com/docview/1531957587
https://www.proquest.com/docview/1660051602