2015 Brazilian Conference on Intelligent Systems (BRACIS) (2015)
Nov. 4, 2015 to Nov. 7, 2015
Sampling and computation budgets are two of the key factors that determine the performance of a reinforcement learning algorithm. In essence, any reinforcement learning agent must sample the environment and perform some computation over the samples to decide its best action. Although fundamental, the trade-off between sampling and computation is still not well understood. In this paper, we explore this trade-off from an actor-critic perspective. First, we propose a new RL algorithm, Dyna-MLAC, which uses model-based actor-critic updates (MLAC) within the Dyna framework. Then, we show numerically that Dyna-MLAC converges faster than pre-existing solutions, and that it enables an efficient trade-off between the number of samples and the computation time.
Approximation algorithms, Learning (artificial intelligence), Complexity theory, Standards, Computational modeling, Linear regression, Electronic mail
B. Costa, W. Caarls and D. S. Menasche, "Dyna-MLAC: Trading Computational and Sample Complexities in Actor-Critic Reinforcement Learning," 2015 Brazilian Conference on Intelligent Systems (BRACIS), Natal, Brazil, 2016, pp. 37-42.
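The Dyna idea referenced in the abstract — interleave direct actor-critic updates on real transitions with extra "planning" updates on transitions simulated from a learned model — can be sketched as follows. This is not the authors' Dyna-MLAC implementation (MLAC uses local linear regression models and a model-based policy gradient); it is a minimal stand-in on an assumed toy 1D task, with an ordinary least-squares transition model, a linear TD(0) critic, and a Gaussian-policy likelihood-ratio actor. All names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy task: deterministic dynamics s' = clip(s + 0.1a),
# reward -s'^2, so the optimal policy drives the state to the origin.
def env_step(s, a):
    s_next = float(np.clip(s + 0.1 * a, -1.0, 1.0))
    return s_next, -s_next ** 2

def phi(s):
    """Polynomial features for the linear critic."""
    return np.array([1.0, s, s * s])

def dyna_actor_critic(episodes=40, horizon=25, planning_steps=5,
                      alpha_v=0.05, alpha_p=0.01, gamma=0.95, sigma=0.3):
    w = np.zeros(3)     # critic weights: V(s) = w . phi(s)
    theta = 0.0         # actor gain: mean action = theta * s
    buffer = []         # visited (s, a) pairs, replayed during planning
    X, Y = [], []       # transition data for the least-squares model

    def update(s, a, r, s_next):
        """One actor-critic update, used for both real and simulated data."""
        nonlocal w, theta
        delta = r + gamma * w @ phi(s_next) - w @ phi(s)      # TD error
        w = w + alpha_v * delta * phi(s)                      # critic step
        # Likelihood-ratio gradient for a Gaussian policy N(theta*s, sigma^2).
        theta = theta + alpha_p * delta * (a - theta * s) / sigma ** 2 * s

    m = np.zeros(2)     # learned transition model: s' ~= m @ [s, a]
    for _ in range(episodes):
        s = rng.uniform(-0.5, 0.5)
        for _ in range(horizon):
            a = float(np.clip(theta * s + sigma * rng.standard_normal(), -1, 1))
            s_next, r = env_step(s, a)
            update(s, a, r, s_next)          # direct RL on the real sample
            buffer.append((s, a))
            X.append([s, a]); Y.append(s_next)
            s = s_next
        # Refit the transition model, then run Dyna planning updates
        # on simulated experience only (reward function assumed known here).
        m, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
        for _ in range(planning_steps * horizon):
            ps, pa = buffer[rng.integers(len(buffer))]
            ps_next = float(np.clip(m @ [ps, pa], -1.0, 1.0))
            update(ps, pa, -ps_next ** 2, ps_next)
    return w, theta, m

w, theta, m = dyna_actor_critic()
```

The `planning_steps` knob is what trades sample complexity for computational complexity: more simulated updates per real transition extract more value from each environment sample at the cost of extra computation, which is the trade-off the paper studies.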