MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT
| Published in | Probability in the Engineering and Informational Sciences, Vol. 29, No. 1, pp. 51-76 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | New York, USA: Cambridge University Press, 01.01.2015 |
| ISSN | 0269-9648, 1469-8951 |
| DOI | 10.1017/S0269964814000217 | 
| Summary: | Generally, the multi-armed bandit problem has been studied in the following setting: at each time step over an infinite horizon, a controller chooses to activate a single process, or bandit, out of a finite collection of independent processes (statistical experiments, populations, etc.) for a single period, receiving a reward that is a function of the activated process and, in doing so, advancing the chosen process. Classically, rewards are discounted by a constant factor β∈(0, 1) per round. In this paper, we present a solution to the problem, for reward processes that may be non-Markovian and have uncountable state spaces, under a framework in which, first, the discount factors may be non-uniform and vary over time, and second, the periods of activation of each bandit may not be fixed or uniform, being subject instead to a possibly stochastic duration of activation before a change to a different bandit is allowed. The solution is based on generalized restart-in-state indices, and it utilizes a view of the problem not as "decisions over state space" but rather as "decisions over time". |
|---|---|
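The summary builds on restart-in-state indices. For reference, the sketch below computes them only for the classical special case that the paper generalizes: a single bandit with a finite Markov state space, a constant discount factor β, and one-period activations. It uses the standard restart-in-state characterization, in which the Gittins index of state i equals (1 - β) times the optimal value, at state i, of an auxiliary problem that may, in any state, either continue from that state or restart the process in state i. The function name `restart_in_state_index` and its parameters are illustrative assumptions, not taken from the paper. The code does not model the paper's time-varying discount factors or stochastic commitment periods.

```python
import numpy as np


def restart_in_state_index(P, r, beta, tol=1e-10, max_iter=100_000):
    """Gittins indices of a finite-state Markov bandit via the restart-in-state problem.

    Hypothetical helper for the classical case only (constant beta, unit activations).
    P: (n, n) transition matrix of the bandit's state process.
    r: (n,) one-period reward in each state.
    beta: constant discount factor in (0, 1).
    Returns an (n,) array whose i-th entry is (1 - beta) * V_i(i), where V_i is the
    optimal value of the problem that may, in any state, continue or restart in state i.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    indices = np.empty(n)
    for i in range(n):
        V = np.zeros(n)
        # Value iteration; the Bellman operator is a beta-contraction, so this converges.
        for _ in range(max_iter):
            continue_val = r + beta * P @ V       # keep running from the current state
            restart_val = r[i] + beta * P[i] @ V  # jump back to state i and run from there
            V_new = np.maximum(continue_val, restart_val)
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        indices[i] = (1 - beta) * V[i]
    return indices
```

For example, with `P = [[0, 1], [0, 1]]`, `r = [0, 1]` and β = 0.9, state 0 pays nothing but moves to a state that pays 1 forever, so its index comes out at approximately 0.9, while the index of state 1 is 1. When the state never changes (`P` the identity), each index reduces to the immediate reward `r[i]`.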