Unified reinforcement Q-learning for mean field game and control problems

Bibliographic Details
Published in: Mathematics of Control, Signals, and Systems, Vol. 34, No. 2, pp. 217–271
Main Authors: Angiuli, Andrea; Fouque, Jean-Pierre; Laurière, Mathieu
Format: Journal Article
Language: English
Published: London: Springer London (Springer Nature B.V.), 01.06.2022
ISSN: 0932-4194, 1435-568X
DOI: 10.1007/s00498-021-00310-1

Summary: We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters. The algorithm is in discrete time and space, where the agent not only provides an action to the environment but also a distribution of the state in order to take into account the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population’s distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space, and compared with classical (non-asymptotic or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case that are used as benchmarks for the results of our algorithm.
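
To make the abstract's description concrete, the following is a minimal sketch (not the authors' reference implementation) of a two-timescale mean-field Q-learning loop in the spirit described above: a tabular Q-function and an estimated state distribution are updated with two separate learning rates, and only the ratio of those rates is changed to target the MFG versus the MFC solution. The environment object `env` and its `step(state, action, mu)` interface are hypothetical placeholders, and the assignment of "slow distribution update = MFG, fast distribution update = MFC" is an assumption based on a common reading of the two-timescale idea, not a claim verified against the paper's theorems.

```python
import numpy as np

def unified_mf_q_learning(env, n_states, n_actions, n_iter=100_000,
                          rho_Q=0.1, rho_mu=0.01, gamma=0.95, eps=0.1,
                          seed=0):
    """Sketch of two-timescale mean-field Q-learning (assumptions noted above).

    env.step(state, action, mu) -> (next_state, reward) is a hypothetical
    interface: dynamics and rewards may depend on the current estimate `mu`
    of the population distribution. Tuning rho_mu relative to rho_Q selects
    which solution the iterates track (MFG vs. MFC), per the abstract.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))        # tabular state-action values
    mu = np.full(n_states, 1.0 / n_states)     # model-free distribution estimate
    state = rng.integers(n_states)

    for _ in range(n_iter):
        # epsilon-greedy action from the current Q-table
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))

        next_state, reward = env.step(state, action, mu)

        # distribution update on its own timescale: push the estimate toward
        # the empirical occupation of the visited state (learning rate rho_mu)
        indicator = np.zeros(n_states)
        indicator[next_state] = 1.0
        mu += rho_mu * (indicator - mu)

        # standard Q-learning update on the other timescale (learning rate rho_Q)
        target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += rho_Q * (target - Q[state, action])

        state = next_state

    return Q, mu
```

As a usage note, one would call this twice on the same environment, e.g. with `rho_Q=0.1, rho_mu=0.001` and then with `rho_Q=0.001, rho_mu=0.1`, and compare the resulting policies and distributions against the explicit LQ benchmarks mentioned in the abstract.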