2011 IEEE Third International Conference on Cloud Computing Technology and Science (2011)
Nov. 29, 2011 to Dec. 1, 2011
Recently, MapReduce has been used to parallelize machine learning algorithms. Obtaining the best performance from these algorithms requires tuning their parameters. However, tuning is time-consuming because it requires executing a MapReduce program multiple times with different parameters. These executions can be assigned to a cluster in various ways, and the total execution time varies with the assignment. To achieve the shortest execution time, we propose a method for optimizing the assignment of MapReduce jobs to a cluster, assuming a runtime targeted at machine learning. We developed an execution cost model that predicts the total execution time of the jobs, and we obtained the optimal assignment by minimizing this cost model. To evaluate the proposed method, we implemented an experimental MapReduce runtime based on the Message Passing Interface and executed logistic regression in four cases. The results showed that the proposed method correctly predicts the optimal job assignment. We also confirmed that the optimal assignment reduced execution time by up to 77% compared with the worst assignment.
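The idea of choosing an assignment by minimizing a predicted cost can be illustrated with a minimal sketch. It is not the cost model from the paper: the function `job_time`, its coefficients (`work`, `comm`, `startup`), and the restriction to equal-sized node groups are all hypothetical assumptions chosen only to show the shape of the optimization.

```python
import math


def job_time(nodes, work=120.0, comm=2.0, startup=1.0):
    """Hypothetical time of one job on `nodes` nodes (assumed form, not from the paper):
    parallelizable work, a log-shaped communication term, and a fixed startup cost."""
    return work / nodes + comm * math.log2(nodes) + startup


def total_time(num_jobs, cluster_nodes, groups):
    """Predicted total time when the cluster is split into `groups` equal node groups,
    each running one job at a time, so jobs complete in ceil(num_jobs / groups) rounds."""
    nodes_per_job = cluster_nodes // groups
    rounds = math.ceil(num_jobs / groups)
    return rounds * job_time(nodes_per_job)


def best_assignment(num_jobs, cluster_nodes):
    """Return the number of concurrent groups that minimizes the predicted total time.
    Only even splits of the cluster with at most one group per job are considered."""
    candidates = [
        g for g in range(1, cluster_nodes + 1)
        if cluster_nodes % g == 0 and g <= num_jobs
    ]
    return min(candidates, key=lambda g: total_time(num_jobs, cluster_nodes, g))


if __name__ == "__main__":
    jobs, nodes = 8, 16
    for g in (1, 2, 4, 8):
        print(g, "groups ->", total_time(jobs, nodes, g))
    print("best number of groups:", best_assignment(jobs, nodes))
```

With these assumed coefficients, running many jobs side by side on small node groups is predicted to be faster than running them one after another on the whole cluster; different coefficients could reverse that ordering, which is why a cost model is needed to pick the assignment.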
MapReduce, Machine Learning, Job Scheduling
T. Araki, H. Tamano and S. Nakadai, "Optimizing Multiple Machine Learning Jobs on MapReduce," 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CLOUDCOM), Athens, Greece, 2011, pp. 59-66.