17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05)
Learning Optimal Values from Random Walk
Hong Kong, China
November 14-16, 2005
ISBN: 0-7695-2488-5
K. P. Lam, Chinese University of Hong Kong
In this paper we extend the random walk example of Sutton and Barto to a multistage dynamic programming optimization setting with discounted reward. Using Bellman equations on presumed action, we derive the optimal values for a general transition probability ρ and discount rate γ; the original random walk is recovered as a special case. Temporal difference methods with eligibility traces, TD(λ), are effective in predicting the optimal values for different ρ and γ, but their performance is found to depend critically on the choice of truncated return in the formulation when γ is less than 1.
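For orientation, the following is a minimal sketch of the kind of experiment the abstract describes, not the paper's own formulation. It assumes the standard five-state Sutton and Barto random walk with a reward of 1 on exiting to the right and 0 otherwise, generalized so that the walk steps right with the transition probability (`p` in the code) and future reward is discounted by the discount rate (`gamma`). The function names `true_values` and `td_lambda`, the step size, and the use of accumulating traces are illustrative assumptions; the paper's optimization over actions and its truncated-return variants are not reproduced here.

```python
import random


def true_values(n=5, p=0.5, gamma=1.0, sweeps=2000):
    """Solve the Bellman equations of the assumed walk by Gauss-Seidel sweeps."""
    v = [0.0] * (n + 2)  # indices 0 and n+1 are terminal states with value 0
    for _ in range(sweeps):
        for s in range(1, n + 1):
            r_right = 1.0 if s == n else 0.0  # reward only on exiting right
            v[s] = p * (r_right + gamma * v[s + 1]) + (1 - p) * gamma * v[s - 1]
    return v[1:n + 1]


def td_lambda(n=5, p=0.5, gamma=1.0, lam=0.8, alpha=0.01,
              episodes=20000, seed=0):
    """Estimate the same values online with TD(lambda), accumulating traces."""
    rng = random.Random(seed)
    v = [0.0] * (n + 2)
    for _ in range(episodes):
        e = [0.0] * (n + 2)   # eligibility traces, reset at each episode
        s = (n + 1) // 2      # start in the middle state
        while 0 < s < n + 1:
            s2 = s + 1 if rng.random() < p else s - 1
            r = 1.0 if s2 == n + 1 else 0.0
            delta = r + gamma * v[s2] - v[s]  # one-step TD error
            e[s] += 1.0
            for i in range(1, n + 1):
                v[i] += alpha * delta * e[i]
                e[i] *= gamma * lam           # decay traces
            s = s2
    return v[1:n + 1]


if __name__ == "__main__":
    for gamma in (1.0, 0.9):
        print(gamma, true_values(gamma=gamma), td_lambda(gamma=gamma))
```

With `p = 0.5` and `gamma = 1.0` the Bellman solution reduces to the familiar values 1/6, 2/6, ..., 5/6 of the original example, which gives a reference point for comparing TD(λ) estimates at other settings.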
Citation:
K. P. Lam, "Learning Optimal Values from Random Walk," 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05), pp. 334-339, 2005