Los Angeles, CA
March 31, 2009 to April 2, 2009
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CSIE.2009.587
Reinforcement learning is a model-free approach to learning what to do, that is, how to map situations to actions, so as to maximize a numerical reward signal. It offers a workable method for systems operating in complex environments where an accurate model is very difficult to build. Many practical problems, however, require maximizing reward while keeping cost (expense) low. In a coal mine, for example, production is closely tied to safety: output can be increased only within the limits that the safety situation allows. Building on the Markov decision process (MDP) and reinforcement learning, the paper introduces the constrained Markov decision process (constrained MDP) into reinforcement learning. It extends the Q-learning algorithm with a cost factor, yielding a new Q-learning algorithm based on the constrained MDP. Finally, in line with the constraint between production and safety in a coal mine, the paper simulates the action control of a coal shearer at the working face. The simulation results verify the validity of the method.
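The abstract does not state the modified update rule, so the following is only a minimal sketch of how a cost factor could enter tabular Q-learning. It assumes a fixed penalty weight `lam` applied to the cost (a Lagrangian-style penalty, which is one common way to handle a constrained MDP and not necessarily the paper's method). The toy environment, with three hazard levels as states and three shearer speeds as actions, and all names in it (`step`, `train`, `lam`, `greedy_policy`) are hypothetical and chosen purely for illustration.

```python
import random


def q_update(q, s, a, reward, cost, s_next, alpha=0.1, gamma=0.9, lam=1.0):
    """One tabular Q-learning step on the penalized signal reward - lam * cost.

    Assumption: the cost enters as a fixed-weight penalty; the paper's
    actual rule is not given in the abstract.
    """
    target = (reward - lam * cost) + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def step(state, action):
    """Hypothetical toy model: state = hazard level 0..2, action = speed 0..2."""
    reward = float(action)          # faster cutting yields more production
    cost = float(action * state)    # cutting fast under high hazard is costly
    if action == 2:                 # full speed raises the hazard level
        next_state = min(state + 1, 2)
    elif action == 0:               # stopping lets the hazard level relax
        next_state = max(state - 1, 0)
    else:                           # moderate speed keeps it unchanged
        next_state = state
    return reward, cost, next_state


def train(episodes=2000, horizon=20, lam=1.0, epsilon=0.1, seed=0):
    """Epsilon-greedy Q-learning on the toy model; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * 3 for _ in range(3)]
    for _ in range(episodes):
        s = rng.randrange(3)
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(3)
            else:
                a = max(range(3), key=lambda x: q[s][x])
            r, c, s_next = step(s, a)
            q_update(q, s, a, r, c, s_next, lam=lam)
            s = s_next
    return q


def greedy_policy(q):
    """Greedy action for each state of a Q-table."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    for lam in (0.0, 5.0):
        print("lam =", lam, "greedy policy:", greedy_policy(train(lam=lam)))
```

In this sketch, raising `lam` makes the learner trade production for safety: with a large penalty it prefers to stop cutting at the highest hazard level. A fuller treatment would adapt `lam` so that the expected cost meets a stated budget, which is omitted here.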
constraint MDP, reinforcement learning, Q-learning, cost, coal shearer
Zhao Xiao-hu, Zhao Ke-ke, Wang Qing-qing, Ma Fang-qing, "Research and Application of Reinforcement Learning Based on Constraint MDP in Coal Mine", 2009 WRI World Congress on Computer Science and Information Engineering (CSIE 2009), pp. 687-691, doi:10.1109/CSIE.2009.587