A Linear Response Bandit Problem

Bibliographic Details
Published in Stochastic Systems, Vol. 3, no. 1, pp. 230-261
Main Authors Goldenshluger, Alexander; Zeevi, Assaf
Format Journal Article
Language English
Published Institute for Operations Research and the Management Sciences (INFORMS) 01.06.2013
ISSN 1946-5238
DOI 10.1287/11-SSY032

Abstract We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable "forced sampling" strategy. It is shown that the regret grows logarithmically in the time horizon n and no policy can achieve a slower growth rate over all feasible problem instances. In this setting of linear response bandits, the identity of the sub-optimal action changes with the values of the covariate vector, and the optimal policy is subject to sampling from the inferior population at a rate that grows like $\sqrt{n}$.
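The abstract describes policies that pair a myopic (greedy) choice based on least squares estimates with forced sampling. The sketch below is only an illustration of that general idea under simplifying assumptions, not the authors' policy: the covariate is a scalar x drawn uniformly from [-1, 1] with an intercept, each arm's mean response is b0 + b1 * x, and the forced-sampling schedule (arm 0 at powers of two, arm 1 one step later) is a hypothetical choice made so that forced pulls occur only about log n times. The names `LineFit`, `forced_arm`, and `simulate` are invented for this example.

```python
import random


class LineFit:
    """Running least squares fit of y ~ b0 + b1 * x for one arm."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coef(self):
        det = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or abs(det) < 1e-12:
            return 0.0, 0.0  # too little data: fall back to a zero estimate
        b1 = (self.n * self.sxy - self.sx * self.sy) / det
        b0 = (self.sy - b1 * self.sx) / self.n
        return b0, b1

    def predict(self, x):
        b0, b1 = self.coef()
        return b0 + b1 * x


def forced_arm(t):
    """Hypothetical schedule: arm 0 when t is a power of two, arm 1 one step later."""
    if t >= 1 and (t & (t - 1)) == 0:
        return 0
    if t >= 2 and ((t - 1) & (t - 2)) == 0:
        return 1
    return None  # no forced pull at this time; act greedily


def simulate(horizon, beta, noise_sd=0.1, seed=0):
    """Run the sketch policy; beta[i] = (b0, b1) are the true parameters of arm i."""
    rng = random.Random(seed)
    fits = [LineFit(), LineFit()]
    pulls = [0, 0]
    regret = 0.0
    for t in range(1, horizon + 1):
        x = rng.uniform(-1.0, 1.0)
        arm = forced_arm(t)
        if arm is None:
            # Myopic step: pick the arm with the larger estimated mean response.
            arm = 0 if fits[0].predict(x) >= fits[1].predict(x) else 1
        means = [b0 + b1 * x for b0, b1 in beta]
        y = means[arm] + rng.gauss(0.0, noise_sd)
        fits[arm].add(x, y)
        pulls[arm] += 1
        regret += max(means) - means[arm]  # expected loss versus the best arm at x
    return regret, pulls


if __name__ == "__main__":
    # Arm 0 is better when x > 0, arm 1 when x < 0, so the inferior arm depends on x.
    total_regret, counts = simulate(2000, ((0.0, 1.0), (0.0, -1.0)))
    print(total_regret, counts)
```

In this toy instance the better arm switches with the sign of x, which mirrors the abstract's remark that the identity of the sub-optimal action changes with the covariate. The sketch makes no claim to reproduce the regret rates stated in the abstract.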
Author Affiliations
Goldenshluger, Alexander: Department of Statistics, University of Haifa, Haifa 31905, Israel
Zeevi, Assaf: Graduate School of Business, Columbia University, New York, NY 10027, USA
Discipline Engineering
EISSN 1946-5238
Open Access Yes
Peer Reviewed Yes
License CC BY
Open Access Link https://doaj.org/article/73ba924ef8394d2e81ad39914059263a
Page Count 32
Subjects bandit problems; estimation; minimax; rate-optimal policy; regret; sequential allocation