Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Published in | Mathematics of Operations Research, Vol. 44, no. 2, pp. 377–399 |
---|---|
Main Authors | Garivier, Aurélien; Ménard, Pierre; Stoltz, Gilles |
Format | Journal Article |
Language | English |
Published | Linthicum; INFORMS; 01.05.2019; Institute for Operations Research and the Management Sciences |
ISSN | 0364-765X 1526-5471 |
DOI | 10.1287/moor.2017.0928 |
Abstract | We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications. |
---|---|
AbstractList | We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications. Funding: This work was partially supported by the CIMI (Centre International de Mathématiques et d'Informatique) Excellence program while Gilles Stoltz visited Toulouse in November 2015. The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under project SPADRO [Grant ANR-13-BS01-0005] and under project ALICIA [Grant ANR-13-CORD-0020]. Gilles Stoltz would like to thank Investissements d'Avenir [Grant ANR-11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047] for financial support. Keywords: multiarmed bandits; cumulative regret; information-theoretic proof techniques; nonasymptotic lower bounds. We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications. |
Audience | Academic |
Author | Stoltz, Gilles; Ménard, Pierre; Garivier, Aurélien |
Author_xml | – sequence: 1 givenname: Aurélien surname: Garivier fullname: Garivier, Aurélien – sequence: 2 givenname: Pierre surname: Ménard fullname: Ménard, Pierre – sequence: 3 givenname: Gilles surname: Stoltz fullname: Stoltz, Gilles |
BackLink | https://hal.science/hal-01276324 (View record in HAL) |
ContentType | Journal Article |
Copyright | Copyright: © 2018 INFORMS; COPYRIGHT 2019 Institute for Operations Research and the Management Sciences; Copyright Institute for Operations Research and the Management Sciences May 2019; Distributed under a Creative Commons Attribution 4.0 International License |
Copyright_xml | – notice: Copyright: © 2018 INFORMS – notice: COPYRIGHT 2019 Institute for Operations Research and the Management Sciences – notice: Copyright Institute for Operations Research and the Management Sciences May 2019 – notice: Distributed under a Creative Commons Attribution 4.0 International License |
DOI | 10.1287/moor.2017.0928 |
DatabaseName | CrossRef; Business: Insights (Gale); Gale In Context: Science; ProQuest Computer Science Collection; Hyper Article en Ligne (HAL); Hyper Article en Ligne (HAL) (Open Access) |
DatabaseTitle | CrossRef ProQuest Computer Science Collection |
DatabaseTitleList | ProQuest Computer Science Collection CrossRef |
Discipline | Engineering Computer Science Business Mathematics Statistics |
EISSN | 1526-5471 |
EndPage | 399 |
Genre | Research Articles |
ISSN | 0364-765X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 2 |
Keywords | multi-armed bandits; non-asymptotic lower bounds; cumulative regret; information-theoretic proof techniques |
Language | English |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0003-1240-1007 0000-0002-4906-9573 |
OpenAccessLink | https://hal.science/hal-01276324 |
PageCount | 23 |
PublicationCentury | 2000 |
PublicationDate | 2019-05-01 |
PublicationDateYYYYMMDD | 2019-05-01 |
PublicationDate_xml | – month: 05 year: 2019 text: 2019-05-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | Linthicum |
PublicationPlace_xml | – name: Linthicum |
PublicationTitle | Mathematics of operations research |
PublicationYear | 2019 |
Publisher | INFORMS Institute for Operations Research and the Management Sciences |
Publisher_xml | – name: INFORMS – name: Institute for Operations Research and the Management Sciences |
SourceID | hal proquest gale crossref jstor informs |
SourceType | Open Access Repository Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 377 |
SubjectTerms | Boundary value problems; Computer Science; cumulative regret; Entropy; Information theory; information-theoretic proof techniques; Lower bounds; Machine Learning; Mathematical models; Mathematical research; Mathematics; multiarmed bandits; nonasymptotic lower bounds; Operations research; Probability distribution; Randomized algorithms; Statistics; Uncertainty (Information theory) |
Title | Explore First, Exploit Next: The True Shape of Regret in Bandit Problems |
URI | https://www.jstor.org/stable/27287841 https://www.proquest.com/docview/2253246633 https://hal.science/hal-01276324 |
Volume | 44 |