2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS) (2018)
Oct 7, 2018 to Oct 9, 2018
We introduce a general model of bandit problems in which the expected payout of an arm is an increasing concave function of the time since it was last played. We first develop a PTAS for the underlying optimization problem of determining a reward-maximizing sequence of arm pulls. We then show how to use this PTAS in a learning setting to obtain sublinear regret.
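As a rough illustration of the reward model described in the abstract, the sketch below scores a fixed sequence of arm pulls when each arm's expected payout depends only on the time since it was last played. It is not the paper's PTAS or learning algorithm. The function names, the two example payoff curves, the convention that every arm counts as last played at time 0, and the myopic greedy baseline are all assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: maps "rounds since last play" to an expected payout.
PayoffFn = Callable[[int], float]


def total_expected_reward(schedule: Sequence[int], payoffs: Sequence[PayoffFn]) -> float:
    """Sum the expected payouts of a pull sequence under recharging payoffs."""
    # Assumption: every arm is treated as last played at time 0.
    last_played = [0] * len(payoffs)
    total = 0.0
    for t, arm in enumerate(schedule, start=1):
        delay = t - last_played[arm]      # time since this arm was last played
        total += payoffs[arm](delay)      # payout grows with the delay
        last_played[arm] = t
    return total


def greedy_schedule(horizon: int, payoffs: Sequence[PayoffFn]) -> List[int]:
    """Myopic baseline: each round, pull the arm with the largest current payout.

    This is only a comparison point, not the approximation scheme from the paper.
    """
    last_played = [0] * len(payoffs)
    schedule = []
    for t in range(1, horizon + 1):
        arm = max(range(len(payoffs)), key=lambda i: payoffs[i](t - last_played[i]))
        schedule.append(arm)
        last_played[arm] = t
    return schedule


# Two illustrative non-decreasing concave payoff curves (assumed, not from the paper).
example_payoffs = [
    lambda d: math.sqrt(d),         # keeps recharging, with diminishing returns
    lambda d: 0.8 * min(d, 3),      # recharges linearly, then saturates after 3 rounds
]

if __name__ == "__main__":
    plan = greedy_schedule(4, example_payoffs)
    print(plan, total_expected_reward(plan, example_payoffs))
```

Because the payoff curves are concave, waiting longer on an arm yields diminishing gains, which is why a good sequence tends to interleave arms rather than pull one arm repeatedly.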
R. Kleinberg and N. Immorlica, "Recharging Bandits," 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), Paris, France, 2018, pp. 309-319.