Algebraic optimization of sequential decision problems

We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective s...

Bibliographic Details
Published in Journal of symbolic computation Vol. 121; p. 102241
Main Authors Dressler, Mareike; Garrote-López, Marina; Montúfar, Guido; Müller, Johannes; Rose, Kemal
Format Journal Article
Language English
Published Elsevier Ltd 01.03.2024
ISSN 0747-7171
DOI 10.1016/j.jsc.2023.102241

Abstract We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective subject to quadratic constraints. We characterize the feasible set of this problem as the intersection of a product of affine varieties of rank one matrices and a polytope. Based on this description, we obtain bounds on the number of critical points of the optimization problem. Finally, we conduct experiments in which we solve the KKT equations or the Lagrange equations over different boundary components of the feasible set, and we compare the result to the theoretical bounds and to other constrained optimization methods.
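The abstract's rank-one description can be illustrated with a small sketch. It assumes the state-action frequency parametrization suggested by the record's keywords, under which the frequencies of all states sharing one observation factor as a state weight times a common action distribution. Every name and number below (`rho`, `pi`, `eta`, `reward`) is hypothetical and chosen for illustration; none of it is taken from the article.

```python
# Illustrative sketch only: all names and numbers are hypothetical,
# not taken from the article.
from itertools import combinations


def outer(rho, pi):
    """Block of state-action frequencies eta[s][a] = rho[s] * pi[a]."""
    return [[r * p for p in pi] for r in rho]


def max_abs_minor(block):
    """Largest absolute 2x2 minor; it is zero iff the block has rank at most one."""
    worst = 0.0
    for i, j in combinations(range(len(block)), 2):
        for a, b in combinations(range(len(block[0])), 2):
            minor = block[i][a] * block[j][b] - block[i][b] * block[j][a]
            worst = max(worst, abs(minor))
    return worst


def linear_reward(block, reward):
    """Linear objective: entrywise product of frequencies and rewards, summed."""
    return sum(e * r
               for row_e, row_r in zip(block, reward)
               for e, r in zip(row_e, row_r))


rho = [0.2, 0.3]    # hypothetical weights of two states with the same observation
pi = [0.25, 0.75]   # hypothetical action distribution shared by both states
eta = outer(rho, pi)
reward = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical reward per (state, action)

print(max_abs_minor(eta))          # quadratic constraints: all 2x2 minors vanish
print(linear_reward(eta, reward))  # objective is linear in eta
```

In this reading, the vanishing 2x2 minors are the quadratic constraints, and the objective stays linear because it is evaluated directly on the frequencies rather than on the policy.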
ArticleNumber 102241
Author Müller, Johannes
Rose, Kemal
Dressler, Mareike
Garrote-López, Marina
Montúfar, Guido
Author_xml – sequence: 1
  givenname: Mareike
  surname: Dressler
  fullname: Dressler, Mareike
  organization: School of Mathematics and Statistics, University of New South Wales, Sydney, 2052, NSW, Australia
– sequence: 2
  givenname: Marina
  orcidid: 0000-0002-0673-9450
  surname: Garrote-López
  fullname: Garrote-López, Marina
  email: marina.garrote@mis.mpg.de
  organization: Max Planck Institute for Mathematics in the Sciences, Leipzig, 04103, SN, Germany
– sequence: 3
  givenname: Guido
  surname: Montúfar
  fullname: Montúfar, Guido
  organization: Departments of Mathematics and Statistics, University of California, Los Angeles, 90095, CA, USA
– sequence: 4
  givenname: Johannes
  surname: Müller
  fullname: Müller, Johannes
  organization: Max Planck Institute for Mathematics in the Sciences, Leipzig, 04103, SN, Germany
– sequence: 5
  givenname: Kemal
  surname: Rose
  fullname: Rose, Kemal
  organization: Max Planck Institute for Mathematics in the Sciences, Leipzig, 04103, SN, Germany
Cites_doi 10.1057/palgrave.jors.2600425
10.1137/070685051
10.1007/s10107-004-0559-y
10.1137/080716670
10.1137/141000671
10.1137/S1052623400366802
10.1287/inte.18.5.55
10.1007/s10107-012-0589-9
10.1126/science.153.3731.34
10.1007/s10107-013-0680-x
10.1145/2382559.2382563
10.1080/10556780802699201
ContentType Journal Article
Copyright 2023 Elsevier Ltd
Copyright_xml – notice: 2023 Elsevier Ltd
DBID AAYXX
CITATION
DOI 10.1016/j.jsc.2023.102241
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
ExternalDocumentID 10_1016_j_jsc_2023_102241
S074771712300055X
ISSN 0747-7171
IngestDate Wed Oct 01 02:18:03 EDT 2025
Tue Dec 03 03:44:40 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords 90C40
Partially observable Markov decision process
Polynomial optimization
62R01
90C23
State-action frequencies
Algebraic degree
State aggregation
Language English
LinkModel DirectLink
ORCID 0000-0002-0673-9450
ParticipantIDs crossref_primary_10_1016_j_jsc_2023_102241
elsevier_sciencedirect_doi_10_1016_j_jsc_2023_102241
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate March-April 2024
2024-03-00
PublicationDateYYYYMMDD 2024-03-01
PublicationDate_xml – month: 03
  year: 2024
  text: March-April 2024
PublicationDecade 2020
PublicationTitle Journal of symbolic computation
PublicationYear 2024
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
SSID ssj0009435
Score 2.354461
SourceID crossref
elsevier
SourceType Index Database
Publisher
StartPage 102241
SubjectTerms Algebraic degree
Partially observable Markov decision process
Polynomial optimization
State aggregation
State-action frequencies
Title Algebraic optimization of sequential decision problems
URI https://dx.doi.org/10.1016/j.jsc.2023.102241
Volume 121