A distributional code for value in dopamine-based reinforcement learning


Bibliographic Details
Published in Nature (London), Vol. 577, No. 7792, pp. 671–675
Main Authors Dabney, Will; Kurth-Nelson, Zeb; Uchida, Naoshige; Starkweather, Clara Kwon; Hassabis, Demis; Munos, Rémi; Botvinick, Matthew
Format Journal Article
Language English
Published London: Nature Publishing Group UK, 30.01.2020
ISSN 0028-0836, 1476-4687
DOI 10.1038/s41586-019-1924-6

More Information
Summary: Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain1–3. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning4–6. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning. Analyses of single-cell recordings from mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution.
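To make the contrast drawn in the summary concrete, the sketch below compares a classical temporal-difference learner, which converges to the mean of a stochastic reward, with a population of predictors that scale positive and negative prediction errors asymmetrically and therefore settle at different expectiles of the reward distribution, in the spirit of the distributional account described above. This is a minimal Python sketch under simplifying assumptions: the two-outcome reward, the asymmetry parameters taus, and the learning rate base_lr are illustrative choices, not quantities taken from the paper or its data.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stochastic reward: equal chance of a small or large outcome.
def sample_reward():
    return rng.choice([0.1, 1.0])

n_trials = 20_000

# Classical TD-style learning: a single scalar prediction driven by a
# symmetric prediction error, converging to the expected (mean) reward.
v = 0.0
alpha = 0.02
for _ in range(n_trials):
    delta = sample_reward() - v      # reward prediction error
    v += alpha * delta               # symmetric update -> expectation

# Distributional variant: a population of predictors, each scaling positive
# and negative prediction errors differently (asymmetry tau). The population
# converges to a range of expectiles rather than a single mean.
taus = np.linspace(0.1, 0.9, 9)      # per-unit asymmetry, tau = a+/(a+ + a-)
values = np.zeros_like(taus)
base_lr = 0.02
for _ in range(n_trials):
    r = sample_reward()
    delta = r - values                           # per-unit prediction errors
    lr = np.where(delta > 0, taus, 1.0 - taus) * base_lr
    values += lr * delta

print(f"classical TD estimate (mean): {v:.3f}")
print("distributional estimates (expectiles):", np.round(values, 3))

In this toy setting, units with large tau are pulled more strongly by better-than-expected outcomes and settle above the mean, while units with small tau settle below it; together the population implicitly encodes the shape of the reward distribution rather than only its expectation, which is the kind of population-level signature the paper looks for in dopamine recordings.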
Author Contributions
W.D. conceived the project. W.D., Z.K., and M.B. contributed ideas for experiments and analysis. W.D. and Z.K. performed simulation experiments and analysis. N.U. and C.S. provided neuronal data for analysis. W.D., Z.K., and M.B. managed the project. M.B., N.U., R.M., and D.H. advised on the project. M.B., W.D., and Z.K. wrote the paper. W.D., Z.K., M.B., N.U., C.S., D.H., and R.M. provided revisions to the paper.
Equal contributions