2015 Brazilian Conference on Intelligent Systems (BRACIS) (2015)
Nov. 4, 2015 to Nov. 7, 2015
Sampling and computation budgets are two of the key elements that determine the performance of a reinforcement learning algorithm. In essence, any reinforcement learning agent must sample the environment and perform some computation over the samples to decide on its best action. Although fundamental, the trade-off between sampling and computation is still not well understood. In this paper, we explore this trade-off from an actor-critic perspective. First, we propose a new RL algorithm, Dyna-MLAC, which uses model-based actor-critic updates (MLAC) within the Dyna framework. Then, we show numerically that the convergence time of Dyna-MLAC is smaller than that of pre-existing solutions, and that Dyna-MLAC allows the number of samples and the computation time to be traded off efficiently.
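To illustrate the trade-off the abstract describes, the following is a minimal, generic sketch of a Dyna-style loop, not the authors' actual Dyna-MLAC implementation: each real environment sample is followed by a configurable number of model-based planning updates, so one can spend more computation per sample instead of drawing more samples. All function names and the tabular deterministic model are illustrative assumptions.

```python
import random

def dyna_loop(env_step, real_steps, planning_steps, update):
    """Generic Dyna loop (illustrative, not the paper's Dyna-MLAC).

    env_step(s, a) -> (reward, next_state): one real (costly) sample.
    update(s, a, r, s2): the learning update (e.g., an actor-critic step).
    planning_steps controls how much computation is spent per real sample.
    """
    model = {}  # learned model: (state, action) -> (reward, next_state)
    s = 0
    for _ in range(real_steps):
        a = random.choice([0, 1])        # placeholder exploration policy
        r, s2 = env_step(s, a)           # one real environment sample
        update(s, a, r, s2)              # learning update from real data
        model[(s, a)] = (r, s2)          # record transition in the model
        for _ in range(planning_steps):  # extra computation, no new samples
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            update(ps, pa, pr, ps2)      # same update rule on simulated data
        s = s2
    return model
```

With `planning_steps = 0` the loop degenerates to a purely model-free learner; increasing it trades computation time for sample efficiency, which is the axis the paper studies.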
Approximation algorithms, Learning (artificial intelligence), Complexity theory, Standards, Computational modeling, Linear regression, Electronic mail, MLAC, reinforcement learning, actor-critic, Dyna
"Dyna-MLAC: Trading Computational and Sample Complexities in Actor-Critic Reinforcement Learning", 2015 Brazilian Conference on Intelligent Systems (BRACIS), pp. 37-42, 2015, doi:10.1109/BRACIS.2015.62