Algebraic optimization of sequential decision problems

Bibliographic Details
Published in	Journal of symbolic computation Vol. 121; p. 102241
Main Authors	Dressler, Mareike; Garrote-López, Marina; Montúfar, Guido; Müller, Johannes; Rose, Kemal
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.03.2024
Subjects	Algebraic degree; Partially observable Markov decision process; Polynomial optimization; State aggregation; State-action frequencies; 90C40; 62R01; 90C23
ISSN	0747-7171
DOI	10.1016/j.jsc.2023.102241

Summary:	We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective subject to quadratic constraints. We characterize the feasible set of this problem as the intersection of a product of affine varieties of rank one matrices and a polytope. Based on this description, we obtain bounds on the number of critical points of the optimization problem. Finally, we conduct experiments in which we solve the KKT equations or the Lagrange equations over different boundary components of the feasible set, and we compare the results to the theoretical bounds and to other constrained optimization methods.
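
The formulation named in the summary (a linear objective over state-action frequencies, with rank-one constraints under state aggregation) can be illustrated numerically. The following is a minimal sketch and not the authors' code: it assumes the standard discounted state-action frequency construction, and the random instance, the variable names, and the choice of four states, two actions, and two observations are all hypothetical.

```python
import numpy as np

# Hypothetical toy instance (not from the article): 4 states, 2 actions,
# 2 observations, discount factor 0.9.
rng = np.random.default_rng(0)
n_states, n_actions, n_obs, gamma = 4, 2, 2, 0.9
obs_of_state = np.array([0, 0, 1, 1])  # deterministic observations (state aggregation)

# Random transition kernel P[s, a, s'], uniform initial distribution, random reward.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
mu = np.full(n_states, 1.0 / n_states)
r = rng.random((n_states, n_actions))

# Random observation-based policy pi[o, a] and the induced state policy tau[s, a].
pi = rng.random((n_obs, n_actions))
pi /= pi.sum(axis=1, keepdims=True)
tau = pi[obs_of_state]

# Discounted state distribution: rho = (1 - gamma) * mu + gamma * P_tau^T rho.
P_tau = np.einsum("sa,sat->st", tau, P)
rho = np.linalg.solve(np.eye(n_states) - gamma * P_tau.T, (1 - gamma) * mu)

# State-action frequencies eta[s, a] = rho[s] * tau[s, a].
eta = rho[:, None] * tau

# Linear objective: the expected discounted reward is <r, eta>.
reward = float((eta * r).sum())

# Polytope (linear) constraints: discounted flow conservation.
inflow = np.einsum("sa,sat->t", eta, P)
flow_residual = eta.sum(axis=1) - ((1 - gamma) * mu + gamma * inflow)

# Quadratic constraints: within each observation block the rows of eta are
# proportional, so every 2x2 minor of the block vanishes (rank one).
minors = []
for o in range(n_obs):
    block = eta[obs_of_state == o]
    minors.append(block[0, 0] * block[1, 1] - block[0, 1] * block[1, 0])

print("reward:", reward)
print("max flow residual:", np.abs(flow_residual).max())
print("block minors:", minors)
```

In this sketch the policy enters only through eta, so the reward is linear in eta, while the vanishing minors are the quadratic constraints that encode dependence on the observation alone.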