Algebraic optimization of sequential decision problems
Published in | Journal of Symbolic Computation, Vol. 121, p. 102241 |
---|---|
Main Authors | Dressler, Mareike; Garrote-López, Marina; Montúfar, Guido; Müller, Johannes; Rose, Kemal |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 2024-03-01 |
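Subjects | Algebraic degree; Partially observable Markov decision process; Polynomial optimization; State aggregation; State-action frequencies |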
ISSN | 0747-7171 |
DOI | 10.1016/j.jsc.2023.102241 |
Abstract | We study the optimization of the expected long-term reward in finite partially observable Markov decision processes over the set of stationary stochastic policies. In the case of deterministic observations, also known as state aggregation, the problem is equivalent to optimizing a linear objective subject to quadratic constraints. We characterize the feasible set of this problem as the intersection of a product of affine varieties of rank one matrices and a polytope. Based on this description, we obtain bounds on the number of critical points of the optimization problem. Finally, we conduct experiments in which we solve the KKT equations or the Lagrange equations over different boundary components of the feasible set, and we compare the result to the theoretical bounds and to other constrained optimization methods. |
---|---|
ArticleNumber | 102241 |
Author | Dressler, Mareike (School of Mathematics and Statistics, University of New South Wales, Sydney, 2052, NSW, Australia); Garrote-López, Marina (Max Planck Institute for Mathematics in the Sciences, Leipzig, 04103, SN, Germany; ORCID 0000-0002-0673-9450; marina.garrote@mis.mpg.de); Montúfar, Guido (Departments of Mathematics and Statistics, University of California, Los Angeles, 90095, CA, USA); Müller, Johannes (Max Planck Institute for Mathematics in the Sciences, Leipzig, 04103, SN, Germany); Rose, Kemal (Max Planck Institute for Mathematics in the Sciences, Leipzig, 04103, SN, Germany) |
Copyright | 2023 Elsevier Ltd |
Discipline | Computer Science |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Partially observable Markov decision process; Polynomial optimization; State-action frequencies; Algebraic degree; State aggregation; MSC: 90C40, 62R01, 90C23 |
PublicationDate | March-April 2024 |
URI | https://dx.doi.org/10.1016/j.jsc.2023.102241 |
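
The abstract makes a structural claim: under state aggregation, the feasible set is a polytope intersected with a product of rank-one varieties. This claim can be checked numerically on a toy instance. The sketch below is not the authors' implementation. It assumes a discounted-reward setting with state-action frequencies η(s,a) = ρ(s)·π(a|f(s)), where ρ = (1-γ)μ + γ·P_π^⊤ρ. We assume this is the formulation the abstract refers to. The helper names (`state_action_frequencies`, `in_polytope`, `max_minor`) and the random instance are hypothetical.

Under this parametrization, the rows of η belonging to one observation o form the outer product of ρ (restricted to those states) with π(·|o). Every 2×2 minor of that block therefore vanishes. These minors are the quadratic constraints. Nonnegativity and the flow equations are the linear constraints that cut out the polytope. The reward Σ r(s,a)·η(s,a) is linear in η.

```python
import itertools

import numpy as np


def state_action_frequencies(P, mu, gamma, policy_obs, obs_of_state):
    """Discounted state-action frequencies eta(s, a) of an observation-based policy.

    P[s, a, t]       : probability of moving from state s to state t under action a
    mu[s]            : initial state distribution
    policy_obs[o, a] : pi(a | o), a stationary stochastic policy on observations
    obs_of_state[s]  : deterministic observation f(s) (state aggregation)
    """
    num_states = P.shape[0]
    pi_s = policy_obs[obs_of_state]            # pi(a | f(s)), shape (S, A)
    P_pi = np.einsum("sa,sat->st", pi_s, P)    # state-to-state transitions under pi
    # Solve rho = (1 - gamma) * mu + gamma * P_pi^T rho; rho then sums to 1.
    rho = np.linalg.solve(np.eye(num_states) - gamma * P_pi.T, (1 - gamma) * mu)
    return rho[:, None] * pi_s                 # eta(s, a) = rho(s) * pi(a | f(s))


def in_polytope(eta, P, mu, gamma, tol=1e-9):
    """Linear constraints: eta >= 0 and the discounted flow equations."""
    inflow = (1 - gamma) * mu + gamma * np.einsum("sa,sat->t", eta, P)
    nonnegative = np.all(eta >= -tol)
    flow_ok = np.allclose(eta.sum(axis=1), inflow, atol=tol)
    return bool(nonnegative and flow_ok)


def max_minor(eta, obs_of_state):
    """Largest |2x2 minor| inside each observation block (the quadratic constraints)."""
    worst = 0.0
    for o in np.unique(obs_of_state):
        block = eta[obs_of_state == o]         # rows: states s with f(s) = o
        for i, j in itertools.combinations(range(block.shape[0]), 2):
            for k, m in itertools.combinations(range(block.shape[1]), 2):
                minor = block[i, k] * block[j, m] - block[i, m] * block[j, k]
                worst = max(worst, abs(minor))
    return worst


# Hypothetical toy instance: 4 states, 2 actions, states {0, 1} and {2, 3} aggregated.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))     # shape (S, A, S), each P[s, a] sums to 1
mu = np.full(S, 1 / S)
r = rng.normal(size=(S, A))
obs_of_state = np.array([0, 0, 1, 1])
policy_obs = rng.dirichlet(np.ones(A), size=2) # pi(a | o), shape (2, A)

eta = state_action_frequencies(P, mu, gamma, policy_obs, obs_of_state)
value = float(np.sum(r * eta))                 # linear objective = (1 - gamma) * discounted return
print(in_polytope(eta, P, mu, gamma))          # True
print(max_minor(eta, obs_of_state))            # zero up to floating-point rounding

# A state-dependent policy stays in the polytope but generically breaks the rank-one blocks.
policy_state = rng.dirichlet(np.ones(A), size=S)
eta_state = state_action_frequencies(P, mu, gamma, policy_state, np.arange(S))
print(in_polytope(eta_state, P, mu, gamma), max_minor(eta_state, obs_of_state))
```

The `eta_state` case is the contrast. A policy that sees the full state still satisfies the linear constraints, but its aggregated blocks generally have nonzero minors. This illustrates how the quadratic constraints cut down the polytope. The sketch only checks feasibility and evaluates the linear objective. It does not solve the KKT or Lagrange equations discussed in the abstract.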