A Linear Response Bandit Problem
We consider a two–armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study...
| Published in | Stochastic systems Vol. 3; no. 1; pp. 230 - 261 |
|---|---|
| Main Authors | Goldenshluger, Alexander; Zeevi, Assaf |
| Format | Journal Article |
| Language | English |
| Published | Institute for Operations Research and the Management Sciences (INFORMS), 01.06.2013 |
| ISSN | 1946-5238 |
| DOI | 10.1287/11-SSY032 |
| Abstract | We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable "forced sampling" strategy. It is shown that the regret grows logarithmically in the time horizon $n$ and no policy can achieve a slower growth rate over all feasible problem instances. In this setting of linear response bandits, the identity of the sub-optimal action changes with the values of the covariate vector, and the optimal policy is subject to sampling from the inferior population at a rate that grows like $\sqrt{n}$. |
|---|---|
| Author | Goldenshluger, Alexander (Department of Statistics, University of Haifa, Haifa 31905, Israel); Zeevi, Assaf (Graduate School of Business, Columbia University, New York, NY 10027, USA) |
| Discipline | Engineering |
| EISSN | 1946-5238 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| License | cc-by |
| OpenAccessLink | https://doaj.org/article/73ba924ef8394d2e81ad39914059263a |
| PageCount | 32 |
| SubjectTerms | bandit problems; estimation; minimax; rate-optimal policy; regret; sequential allocation |
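
The policy structure described in the abstract (myopic action on least squares estimates, plus forced sampling) can be illustrated with a small simulation. This is only a rough sketch under stated assumptions, not the authors' exact policy: the forced-sampling schedule in `forced_arm`, the block length `q`, the true parameters `beta`, the covariate distribution, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def forced_arm(t, q=5):
    """Hypothetical forced-sampling schedule (an assumption, not taken from the article).

    Blocks start at rounds 2q * (2**k - 1) for k = 0, 1, 2, ...
    The first q rounds of a block force arm 0, the next q rounds force arm 1.
    Returns the forced arm, or None if round t is not a forced round.
    """
    k = 0
    while 2 * q * (2 ** k - 1) <= t:
        offset = t - 2 * q * (2 ** k - 1)
        if offset < q:
            return 0
        if offset < 2 * q:
            return 1
        k += 1
    return None


def ols(X, y, ridge=1e-8):
    """Least squares estimate; a tiny ridge term keeps the system solvable early on."""
    X = np.asarray(X)
    y = np.asarray(y)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)


def simulate(n=2000, q=5, noise=0.1, seed=0):
    """Run the sketch policy for n rounds; return (cumulative regret, estimates, truth)."""
    rng = np.random.default_rng(seed)
    # Hypothetical true parameters (intercept, slope) for the two arms.
    # Arm 0 is better when the covariate exceeds 0.25, arm 1 otherwise.
    beta = np.array([[0.0, 1.0], [0.5, -1.0]])
    data = [([], []), ([], [])]       # per-arm covariates and responses
    est = np.zeros_like(beta)         # per-arm least squares estimates
    regret = 0.0
    for t in range(n):
        x = np.array([1.0, rng.uniform(-1.0, 1.0)])  # covariate with intercept
        arm = forced_arm(t, q)
        if arm is None:
            # Myopic (greedy) action based on the current estimates.
            arm = int(np.argmax(est @ x))
        means = beta @ x
        reward = means[arm] + noise * rng.normal()
        data[arm][0].append(x)
        data[arm][1].append(reward)
        est[arm] = ols(data[arm][0], data[arm][1])
        regret += means.max() - means[arm]
    return regret, est, beta


if __name__ == "__main__":
    regret, est, beta = simulate()
    print("cumulative regret:", regret)
    print("estimates:", est)
```

In this sketch the number of forced rounds up to horizon n grows only logarithmically in n, so most rounds are played greedily; how closely this mirrors the schedule analysed in the article is not something the record above confirms.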