Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Bibliographic Details
Published in: Mathematics of Operations Research, Vol. 44, No. 2, pp. 377–399
Main Authors: Garivier, Aurélien; Ménard, Pierre; Stoltz, Gilles
Format: Journal Article
Language: English
Published: Linthicum: Institute for Operations Research and the Management Sciences (INFORMS), 1 May 2019
ISSN: 0364-765X, 1526-5471
DOI: 10.1287/moor.2017.0928

Summary: We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.
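The two-phase shape described in the summary (near-linear regret early, logarithmic growth late) can be observed empirically. The sketch below is not from the paper: it runs the standard UCB1 strategy on a hypothetical two-armed Bernoulli bandit (means 0.6 and 0.4 chosen for illustration) and compares the average per-round pseudo-regret in the early rounds against the late rounds.

```python
import math
import random

def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return cumulative pseudo-regret per round."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # cumulative reward of each arm
    best = max(means)
    regret, cum = [], 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: play each arm once
        else:
            # pick the arm maximizing empirical mean + exploration bonus
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        cum += best - means[arm]   # pseudo-regret increment (always >= 0)
        regret.append(cum)
    return regret

r = ucb1_regret([0.6, 0.4], horizon=20000)
early_rate = r[199] / 200                 # avg per-round regret, first 200 rounds
late_rate = (r[-1] - r[9999]) / 10000     # avg per-round regret, rounds 10001-20000
```

In the initial phase the exploration bonus dominates the gap between the arms, so the suboptimal arm is pulled about as often as the optimal one and `early_rate` is close to half the gap; in the final phase pulls of the suboptimal arm thin out logarithmically and `late_rate` is far smaller.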