A Linear Response Bandit Problem

Bibliographic Details
Published in	Stochastic systems Vol. 3; no. 1; pp. 230 - 261
Main Authors	Goldenshluger, Alexander; Zeevi, Assaf
Format	Journal Article
Language	English
Published	Institute for Operations Research and the Management Sciences (INFORMS) 01.06.2013
Subjects	bandit problems; estimation; minimax; rate-optimal policy; regret; sequential allocation
ISSN	1946-5238
DOI	10.1287/11-SSY032

Abstract	We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable "forced sampling" strategy. It is shown that the regret grows logarithmically in the time horizon n and that no policy can achieve a slower growth rate over all feasible problem instances. In this setting of linear response bandits, the identity of the sub-optimal action changes with the values of the covariate vector, and the optimal policy is subject to sampling from the inferior population at a rate that grows like $\sqrt{n}$.
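
The combination described in the abstract (a greedy choice based on least squares estimates, plus occasional forced sampling of each arm) can be illustrated with a minimal sketch. This is not the authors' exact policy: the forced-sampling schedule, the covariate distribution, the noise level, and the names `forced_arm` and `run_policy` are all assumptions made for illustration only.

```python
import numpy as np

def forced_arm(t, warmup):
    """Hypothetical forced-sampling schedule (an assumption, not from the paper):
    alternate arms during warm-up, then force arm 0 when t is a power of two
    and arm 1 on the following step."""
    if t < warmup:
        return t % 2
    if t & (t - 1) == 0:          # t is a power of two
        return 0
    if (t - 1) & (t - 2) == 0:    # t - 1 is a power of two
        return 1
    return None                   # no forced sample: act greedily

def run_policy(betas, n, noise_sd=0.5, seed=0):
    """Simulate n rounds and return the cumulative regret.
    betas has shape (2, d): one unknown parameter vector per arm."""
    rng = np.random.default_rng(seed)
    d = betas.shape[1]
    warmup = 4 * d
    X = [[], []]                  # covariates observed under each arm
    Y = [[], []]                  # responses observed under each arm
    est = np.zeros((2, d))        # least squares estimates per arm
    regret = 0.0
    for t in range(n):
        # Covariate vector: intercept plus uniform features (assumed design).
        x = np.append(1.0, rng.uniform(-1.0, 1.0, d - 1))
        means = betas @ x         # true expected rewards of both arms
        arm = forced_arm(t, warmup)
        if arm is None:
            arm = int(np.argmax(est @ x))   # myopic (greedy) action
        y = means[arm] + noise_sd * rng.standard_normal()
        X[arm].append(x)
        Y[arm].append(y)
        # Refit ordinary least squares for the sampled arm.
        est[arm] = np.linalg.lstsq(np.array(X[arm]), np.array(Y[arm]), rcond=None)[0]
        regret += means.max() - means[arm]
    return regret

if __name__ == "__main__":
    betas = np.array([[0.0, 1.0], [0.0, -1.0]])  # which arm is better depends on x
    print(run_policy(betas, n=500))
```

In this sketch the better arm depends on the sign of the second covariate, which mirrors the abstract's remark that the identity of the sub-optimal action changes with the covariate vector.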
Author Affiliations	Goldenshluger, Alexander: Department of Statistics, University of Haifa, Haifa 31905, Israel; Zeevi, Assaf: Graduate School of Business, Columbia University, New York, NY 10027, USA
Open Access	Yes
Peer Reviewed	Yes
License	CC BY
Open Access Link	https://doaj.org/article/73ba924ef8394d2e81ad39914059263a