A distributional code for value in dopamine-based reinforcement learning
| Published in | Nature (London) Vol. 577; no. 7792; pp. 671 - 675 |
|---|---|
| Main Authors | , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | London: Nature Publishing Group UK, 30.01.2020 |
| ISSN | 0028-0836, 1476-4687 |
| DOI | 10.1038/s41586-019-1924-6 |
| Summary: | Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain 1–3. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning 4–6. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning. Analyses of single-cell recordings from mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution. |
| Bibliography: | ObjectType-Article-1, SourceType-Scholarly Journals-1, ObjectType-Feature-2. Author contributions (equal contributions): W.D. conceived the project. W.D., Z.K. and M.B. contributed ideas for experiments and analysis. W.D. and Z.K. performed simulation experiments and analysis. N.U. and C.S. provided neuronal data for analysis. W.D., Z.K. and M.B. managed the project. M.B., N.U., R.M. and D.H. advised on the project. M.B., W.D. and Z.K. wrote the paper. W.D., Z.K., M.B., N.U., C.S., D.H. and R.M. provided revisions to the paper. |
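The distributional idea described in the summary can be sketched in a few lines of code. The paper's core proposal is that a population of value predictors, each updating with asymmetric learning rates for positive versus negative prediction errors, converges to different points (expectiles) of the reward distribution rather than a single mean. The sketch below is illustrative only: the variable names, the two-point reward distribution, and the specific parameter values are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small population of value predictors, each with its own asymmetry tau.
# High-tau cells scale positive prediction errors more strongly (optimistic);
# low-tau cells scale negative errors more strongly (pessimistic).
taus = np.linspace(0.1, 0.9, 7)
values = np.zeros_like(taus)
alpha = 0.02  # base learning rate (illustrative choice)

for _ in range(50_000):
    reward = rng.choice([1.0, 5.0])          # toy stochastic reward, p = 0.5 each
    delta = reward - values                  # per-cell prediction errors
    scale = np.where(delta > 0, taus, 1.0 - taus)
    values += alpha * scale * delta          # asymmetrically scaled update

# Each cell settles near the tau-expectile of the reward distribution;
# for this two-point reward it is 1 + 4 * tau, so the population spans
# pessimistic (~1.4) to optimistic (~4.6) predictions around the mean of 3.
```

Because each cell's fixed point differs, the population as a whole tiles the reward distribution instead of collapsing to its mean, which is the kind of signature the paper looks for in dopamine recordings.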