The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

Bibliographic Details
Published in	Mathematics of Operations Research Vol. 36, no. 4, pp. 593-603
Main Author	Ye, Yinyu
Format	Journal Article
Language	English
Published	Linthicum: INFORMS (Institute for Operations Research and the Management Sciences), 01.11.2011
Subjects	Algorithms; Arithmetic; Discount rates; Dynamic programming; Linear programming; Markov analysis; Markov decision problem; Markov processes; Mathematical vectors; Mathematics; Methods; Optimal policy; policy-iteration method; Polynomials; Simplex method; strongly polynomial time; Studies; United States
ISSN	0364-765X 1526-5471
DOI	10.1287/moor.1110.0516

Summary:	We prove that the classic policy-iteration method [Howard, R. A. 1960. Dynamic Programming and Markov Processes. MIT, Cambridge] and the original simplex method with the most-negative-reduced-cost pivoting rule of Dantzig are strongly polynomial-time algorithms for solving the Markov decision problem (MDP) with a fixed discount rate. Furthermore, the computational complexity of the policy-iteration and simplex methods is superior to that of the only known strongly polynomial-time interior-point algorithm [Ye, Y. 2005. A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30 (3) 733-749] for solving this problem. The result is surprising because the simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming problem [Klee, V., G. J. Minty. 1972. How good is the simplex method? Technical report. O. Shisha, ed. Inequalities III. Academic Press, New York], the simplex method with the smallest index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates [Melekopoglou, M., A. Condon. 1994. On the complexity of the policy improvement algorithm for Markov decision processes. INFORMS J. Comput. 6 (2) 188-192], and the policy-iteration method was recently shown to be exponential for solving undiscounted MDPs under the average cost criterion. We also extend the result to solving MDPs with transient substochastic transition matrices whose spectral radii are uniformly below one.
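
The two methods named in the summary can be illustrated on a toy cost-minimizing discounted MDP. The following is a minimal sketch, not code from the article: the function names (`solve_linear`, `evaluate`, `improve`), the data layout (`P[s][a]` as a row of transition probabilities, `c[s][a]` as an immediate cost), the tolerance, and the two-state example are all assumptions made for illustration. With `rule="howard"`, every state whose best reduced cost is negative switches action at once (policy iteration); with `rule="dantzig"`, only the single state-action pair with the most negative reduced cost enters (one simplex pivot).

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def evaluate(P, c, gamma, policy):
    """Value of a fixed policy: solve (I - gamma * P_pi) v = c_pi."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][policy[i]][j]
          for j in range(n)] for i in range(n)]
    b = [c[i][policy[i]] for i in range(n)]
    return solve_linear(A, b)


def improve(P, c, gamma, rule="howard", tol=1e-9, max_iter=10_000):
    """Return (policy, values, number of improvement steps taken)."""
    n = len(P)
    policy = [0] * n  # assumed starting policy: action 0 in every state
    for step in range(max_iter):
        v = evaluate(P, c, gamma, policy)
        best = []  # per state: (most negative reduced cost, its action)
        for s in range(n):
            rc = [c[s][a] + gamma * sum(p * vj for p, vj in zip(P[s][a], v)) - v[s]
                  for a in range(len(P[s]))]
            a_min = min(range(len(rc)), key=rc.__getitem__)
            best.append((rc[a_min], a_min))
        if min(r for r, _ in best) >= -tol:
            return policy, v, step  # no improving action: policy is optimal
        if rule == "howard":
            # Policy iteration: switch every state that can improve.
            for s, (r, a) in enumerate(best):
                if r < -tol:
                    policy[s] = a
        else:
            # Dantzig rule: switch only the most negative reduced cost.
            s = min(range(n), key=lambda i: best[i][0])
            policy[s] = best[s][1]
    raise RuntimeError("no convergence within max_iter")


# Hypothetical two-state example with discount rate 0.9.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: stay, or move to state 1
     [[0.0, 1.0], [0.0, 1.0]]]   # state 1: both actions stay in state 1
c = [[1.0, 0.5],
     [2.0, 0.0]]
policy_h, values_h, steps_h = improve(P, c, 0.9, rule="howard")
policy_d, values_d, steps_d = improve(P, c, 0.9, rule="dantzig")
```

On an instance this small the two rules follow the same path; they differ only when several states can improve in the same step. The dense elimination solver is a convenience for keeping the sketch dependency-free and says nothing about the operation counts analyzed in the article.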