Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Bibliographic Details
Published in: Mathematics of Operations Research, Vol. 44, no. 2, pp. 377-399
Main Authors: Garivier, Aurélien; Ménard, Pierre; Stoltz, Gilles
Format: Journal Article
Language: English
Published: Linthicum, INFORMS (Institute for Operations Research and the Management Sciences), 01.05.2019
ISSN: 0364-765X (print), 1526-5471 (electronic)
DOI: 10.1287/moor.2017.0928

Abstract: We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.
Funding: This work was partially supported by the CIMI (Centre International de Mathématiques et d'Informatique) Excellence program while Gilles Stoltz visited Toulouse in November 2015. The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under project SPADRO [Grant ANR-13-BS01-0005] and under project ALICIA [Grant ANR-13-CORD-0020]. Gilles Stoltz would like to thank Investissements d'Avenir [Grant ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047] for financial support.
Keywords: multiarmed bandits; cumulative regret; information-theoretic proof techniques; nonasymptotic lower bounds
Audience: Academic
HAL record: https://hal.science/hal-01276324
Copyright: © 2018 INFORMS
Copyright 2019 Institute for Operations Research and the Management Sciences
Copyright Institute for Operations Research and the Management Sciences May 2019
License: Distributed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0)
Discipline: Engineering; Computer Science; Business; Mathematics; Statistics
Genre: Research Articles
Open access: yes
Peer reviewed: yes
ORCID: 0000-0003-1240-1007; 0000-0002-4906-9573
Page count: 23
Subject terms: Boundary value problems; Computer Science; cumulative regret; Entropy; Information theory; information-theoretic proof techniques; Lower bounds; Machine Learning; Mathematical models; Mathematical research; Mathematics; multiarmed bandits; nonasymptotic lower bounds; Operations research; Probability distribution; Randomized algorithms; Statistics; Uncertainty (Information theory)
Links: https://www.jstor.org/stable/27287841; https://www.proquest.com/docview/2253246633; https://hal.science/hal-01276324