17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05)
Learning Optimal Values from Random Walk
Hong Kong, China
November 14-November 16
ISBN: 0-7695-2488-5
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICTAI.2005.81
In this paper we extend the random walk example of Sutton and Barto to a multistage dynamic programming optimization setting with discounted reward. Using Bellman equations on presumed action, the optimal values are derived for general transition probability ρ and discount rate γ, and include the original random walk as a special case. Temporal difference methods with eligibility traces, TD(λ), are effective in predicting the optimal values for different ρ and γ, but their performance is found to depend critically on the choice of truncated return in the formulation when γ is less than 1.
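As a rough illustration of the setting described in the abstract, the sketch below compares Bellman-equation values with TD(λ) estimates on the familiar five-state random walk of Sutton and Barto. It is not the paper's formulation: it assumes a fixed stepping rule (move right with probability `p_right`, standing in for ρ), a reward of 1 only on reaching the right terminal state, and accumulating eligibility traces. The function names, step size, and episode count are all hypothetical choices made for this example.

```python
import random

N_STATES = 5  # non-terminal states 1..5; states 0 and 6 are terminal


def bellman_values(p_right, gamma, sweeps=10_000, tol=1e-12):
    """Solve the Bellman equations by repeated in-place sweeps.

    Assumed model: reward 1 on entering the right terminal state, 0 otherwise.
    """
    v = [0.0] * (N_STATES + 2)  # terminal values stay at 0
    for _ in range(sweeps):
        delta = 0.0
        for s in range(1, N_STATES + 1):
            reward_right = 1.0 if s + 1 == N_STATES + 1 else 0.0
            new = (p_right * (reward_right + gamma * v[s + 1])
                   + (1.0 - p_right) * gamma * v[s - 1])
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v[1:-1]


def td_lambda(p_right, gamma, lam, alpha=0.05, episodes=5000, seed=0):
    """Online TD(lambda) with accumulating traces on the same walk."""
    rng = random.Random(seed)
    v = [0.0] * (N_STATES + 2)
    for _ in range(episodes):
        trace = [0.0] * (N_STATES + 2)
        s = (N_STATES + 1) // 2  # every episode starts in the centre state
        while 0 < s < N_STATES + 1:
            s_next = s + 1 if rng.random() < p_right else s - 1
            r = 1.0 if s_next == N_STATES + 1 else 0.0
            td_error = r + gamma * v[s_next] - v[s]
            trace[s] += 1.0
            for i in range(1, N_STATES + 1):
                v[i] += alpha * td_error * trace[i]
                trace[i] *= gamma * lam  # decay every trace after the update
            s = s_next
    return v[1:-1]


if __name__ == "__main__":
    for gamma in (1.0, 0.9):
        exact = bellman_values(0.5, gamma)
        learned = td_lambda(0.5, gamma, lam=0.5)
        print(f"gamma={gamma}")
        print("  Bellman:", [round(x, 3) for x in exact])
        print("  TD(0.5):", [round(x, 3) for x in learned])
```

With `p_right = 0.5` and `gamma = 1.0` the Bellman values reduce to the original random walk's 1/6, 2/6, ..., 5/6, which matches the abstract's remark that the original example is a special case. The sketch says nothing about the truncated-return choice the paper investigates for γ < 1.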
Citation:
K. P. Lam, "Learning Optimal Values from Random Walk," 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'05), pp. 334-339, 2005.