Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
| Published in | Mathematics of Operations Research, Vol. 44, no. 2, pp. 377–399 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Linthicum: INFORMS (Institute for Operations Research and the Management Sciences), 01.05.2019 |
| Subjects | |
| ISSN | 0364-765X, 1526-5471 |
| DOI | 10.1287/moor.2017.0928 |
Summary: We revisit lower bounds on the regret of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret holds only in a final phase. The proof techniques go to the heart of the information-theoretic arguments used and involve no unnecessary complications.
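The summary's two regret phases can be sketched numerically. The sketch below, a minimal illustration and not the paper's actual bounds, computes the Bernoulli Kullback–Leibler divergence and a Lai–Robbins-style asymptotic rate, then compares the resulting logarithmic bound with a crude linear-in-T scaling for short horizons; the arm means `mus`, the early-phase constant `0.25`, and all helper names are assumptions for illustration only:

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence KL(Ber(p) || Ber(q)), clipped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lai_robbins_rate(mus):
    """Per-log(T) regret rate: sum over suboptimal arms of gap / KL(mu_a, mu*)."""
    mu_star = max(mus)
    return sum((mu_star - mu) / kl_bernoulli(mu, mu_star)
               for mu in mus if mu < mu_star)

# Two Bernoulli arms with a small gap (illustrative values).
mus = [0.5, 0.45]
rate = lai_robbins_rate(mus)
gap = max(mus) - min(mus)

for T in (10, 100, 10_000):
    log_bound = rate * math.log(T)          # final-phase logarithmic shape
    linear_scale = 0.25 * gap * T           # crude initial-phase linear shape
    print(f"T={T:>6}  log-shape={log_bound:8.2f}  linear-shape={linear_scale:8.2f}")
```

For small horizons the linear term dominates (too few samples to separate close arms, so regret accrues almost linearly), while for large T the logarithmic shape takes over, matching the two-phase picture described in the summary.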