41st Annual Symposium on Foundations of Computer Science
Using upper confidence bounds for online learning
Redondo Beach, California
November 12-14, 2000
ISBN: 0-7695-0850-2
P. Auer, Inst. for Theor. Comput. Sci., Graz Univ. of Technol., Austria
We show how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation/exploration trade-off. Our technique for designing and analyzing algorithms for such situations is very general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We consider two models with such an exploitation/exploration trade-off. For the adversarial bandit problem our new algorithm suffers only Õ(T^{1/2}) regret over T trials, which improves significantly over the previously best Õ(T^{2/3}) regret. We also extend our results for the adversarial bandit problem to shifting bandits. The second model we consider is associative reinforcement learning with linear value functions. For this model our technique improves the regret from Õ(T^{3/4}) to Õ(T^{1/2}).
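To illustrate the general idea behind upper confidence bounds in a bandit setting, here is a minimal sketch in the style of the UCB1 rule for stochastic bandits: pull the arm maximizing the empirical mean plus a confidence radius, so that under-explored arms are tried until their uncertainty shrinks. This is a generic illustration of the technique, not the paper's algorithm (the paper treats the adversarial setting); the function `pull` and the rewards in [0, 1] are assumptions for the example.

```python
import math
import random

def ucb_bandit(pull, n_arms, horizon):
    """Generic upper-confidence-bound bandit loop (illustrative sketch,
    not the paper's adversarial-bandit algorithm).
    `pull(arm)` must return a reward in [0, 1]."""
    counts = [0] * n_arms      # number of times each arm was pulled
    sums = [0.0] * n_arms      # total reward collected from each arm
    # Initialize by playing each arm once.
    for arm in range(n_arms):
        sums[arm] += pull(arm)
        counts[arm] = 1
    for t in range(n_arms, horizon):
        # Upper confidence bound: empirical mean plus a radius that
        # grows with log(t) and shrinks with the arm's pull count.
        ucb = [sums[a] / counts[a]
               + math.sqrt(2 * math.log(t + 1) / counts[a])
               for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: ucb[a])
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts
```

The confidence radius makes the trade-off explicit: an arm is chosen either because its mean looks good (exploitation) or because it has been pulled rarely and its bound is still wide (exploration).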
Index Terms:
learning (artificial intelligence); statistical analysis; uncertainty handling; random processes; upper confidence bounds; online learning; statistics; exploitation decision; exploration decision; uncertain information; random process; adversarial bandit problem; associative reinforcement learning; linear value functions
Citation:
P. Auer, "Using upper confidence bounds for online learning," Proc. 41st Annual Symposium on Foundations of Computer Science (FOCS), p. 270, 2000.