Finite-time error bounds for Greedy-GQ
| Published in | Machine Learning, Vol. 113, No. 9, pp. 5981-6018 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.09.2024 (Springer Nature B.V.) |
| ISSN | 0885-6125; 1573-0565 |
| DOI | 10.1007/s10994-024-06542-x |
Cover
| Summary: | Greedy-GQ with linear function approximation, originally proposed in Maei et al. (in: Proceedings of the International Conference on Machine Learning (ICML), 2010), is a value-based off-policy algorithm for optimal control in reinforcement learning, and it has a non-linear two-timescale structure with a non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as $\mathcal{O}(1/\sqrt{T})$ under the i.i.d. setting and $\mathcal{O}(\log T/\sqrt{T})$ under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using the nested-loop approach, and show that its sample complexity is $\mathcal{O}(\log(1/\epsilon)\,\epsilon^{-2})$, which matches that of the vanilla Greedy-GQ. Our finite-time error bounds match those of the stochastic gradient descent algorithm for general smooth non-convex optimization problems, despite the additional challenge of the two-timescale updates. Our finite-sample analysis provides theoretical guidance on choosing step sizes for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our techniques provide a general approach for the finite-sample analysis of non-convex two-timescale value-based reinforcement learning algorithms. |
|---|---|
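The "non-linear two-timescale structure" in the summary refers to the coupled updates of Greedy-GQ from Maei et al. (2010): a slow update of the value weights and a fast update of an auxiliary weight vector. Below is a minimal sketch of one such update with linear function approximation; the variable names (`theta`, `omega`, `phi_next_all`) and the finite-action interface are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def greedy_gq_step(theta, omega, phi_t, phi_next_all, r, gamma, alpha, beta):
    """One two-timescale Greedy-GQ update with linear function approximation.

    theta        -- slow-timescale value weights, shape (d,)
    omega        -- fast-timescale auxiliary weights, shape (d,)
    phi_t        -- feature vector phi(s_t, a_t), shape (d,)
    phi_next_all -- features of all actions at s_{t+1}, shape (num_actions, d)
    r            -- observed reward r_t
    gamma        -- discount factor in [0, 1)
    alpha, beta  -- step sizes with alpha << beta (the two timescales)
    """
    # Greedy action at the next state under the current theta.
    greedy_a = int(np.argmax(phi_next_all @ theta))
    phi_next = phi_next_all[greedy_a]

    # TD error with respect to the greedy target policy.
    delta = r + gamma * phi_next @ theta - phi_t @ theta

    # Slow update of theta; the gradient-correction term uses omega.
    theta = theta + alpha * (delta * phi_t - gamma * (omega @ phi_t) * phi_next)

    # Fast update of omega toward the least-squares fit of delta onto phi_t.
    omega = omega + beta * (delta - phi_t @ omega) * phi_t

    return theta, omega
```

The separation alpha << beta is what makes the analysis two-timescale: omega is driven quickly toward the solution of an inner least-squares problem while theta drifts slowly, and a finite-time analysis must bound the error of both updates jointly rather than assuming the fast variable has converged.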
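As a rough consistency check (constants ignored, and assuming $\epsilon$ denotes the target accuracy for the squared gradient norm, a standard convention not stated in the abstract), the Markovian rate and the stated sample complexity agree up to logarithmic factors:

$$
\frac{\log T}{\sqrt{T}} \le \epsilon
\quad\text{holds for}\quad
T = \Theta\!\big(\epsilon^{-2}\log^2(1/\epsilon)\big),
$$

since then $\sqrt{T} = \Theta(\epsilon^{-1}\log(1/\epsilon))$ while $\log T = \Theta(\log(1/\epsilon))$. This is consistent with the $\mathcal{O}(\log(1/\epsilon)\,\epsilon^{-2})$ sample complexity of the nested-loop variant, and with the $\epsilon^{-2}$ dependence known for stochastic gradient descent on smooth non-convex objectives.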