Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret...

Bibliographic Details
Published in	Mathematics of operations research Vol. 44; no. 2; pp. 377 - 399
Main Authors	Garivier, Aurélien; Ménard, Pierre; Stoltz, Gilles
Format	Journal Article
Language	English
Published	Linthicum: INFORMS (Institute for Operations Research and the Management Sciences), 01.05.2019
Subjects	Boundary value problems; Computer Science; cumulative regret; Entropy; Information theory; information-theoretic proof techniques; Lower bounds; Machine Learning; Mathematical models; Mathematical research; Mathematics; multiarmed bandits; nonasymptotic lower bounds; Operations research; Probability distribution; Randomized algorithms; Statistics; Uncertainty (Information theory)
ISSN	0364-765X (print); 1526-5471 (electronic)
DOI	10.1287/moor.2017.0928

Abstract	We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.
AbstractList	We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications. Funding: This work was partially supported by the CIMI (Centre International de Mathématiques et d'Informatique) Excellence program while Gilles Stoltz visited Toulouse in November 2015. The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under project SPADRO [Grant ANR-13-BS01-0005] and under project ALICIA [Grant ANR-13-CORD-0020]. Gilles Stoltz would like to thank Investissements d'Avenir [Grant ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047] for financial support. Keywords: multiarmed bandits; cumulative regret; information-theoretic proof techniques; nonasymptotic lower bounds
Audience	Academic
BackLink	https://hal.science/hal-01276324 (View record in HAL)
ContentType	Journal Article
Copyright	© 2018 INFORMS. Copyright 2019 Institute for Operations Research and the Management Sciences. Copyright Institute for Operations Research and the Management Sciences May 2019. Distributed under a Creative Commons Attribution 4.0 International License.
Discipline	Engineering Computer Science Business Mathematics Statistics
Genre	Research Articles
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID	0000-0003-1240-1007 0000-0002-4906-9573
OpenAccessLink	https://hal.science/hal-01276324
PageCount	23
URI	https://www.jstor.org/stable/27287841 https://www.proquest.com/docview/2253246633 https://hal.science/hal-01276324