2016 IEEE International Conference on Cluster Computing (CLUSTER) (2016)
Sept. 12, 2016 to Sept. 16, 2016
We use supervised machine learning algorithms (i.e., Decision Trees, Random Forest, and K-nearest Neighbors) to predict performance characteristics such as runtime and IO traffic of batch jobs on high-end clusters, using only user job scripts as input. We show that decision trees outperform the other algorithms and accurately predict the runtime of 73% of jobs within an error tolerance of 10 minutes, which is a 51% improvement over the user-requested runtime.
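The accuracy figure above can be read as the fraction of jobs whose predicted runtime falls within a fixed tolerance of the actual runtime. The following is a minimal sketch of that metric under this assumed reading, not the authors' evaluation code. The function name `fraction_within_tolerance` and all runtime values are hypothetical and chosen only for illustration.

```python
def fraction_within_tolerance(predicted, actual, tolerance=10.0):
    """Return the fraction of jobs whose predicted runtime (minutes)
    is within `tolerance` minutes of the actual runtime."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one job is required")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)


# Hypothetical runtimes in minutes (illustrative values, not data from the paper).
actual_runtime = [12.0, 45.0, 240.0, 8.0, 90.0]
user_requested = [60.0, 60.0, 480.0, 30.0, 120.0]   # what users asked for
model_predicted = [15.0, 50.0, 200.0, 9.0, 95.0]    # what a model might predict

requested_accuracy = fraction_within_tolerance(user_requested, actual_runtime)
model_accuracy = fraction_within_tolerance(model_predicted, actual_runtime)

print(f"user request within 10 min: {requested_accuracy:.0%}")
print(f"model within 10 min: {model_accuracy:.0%}")
```

Scoring the user-requested runtime with the same function gives a baseline against which a model's predictions can be compared.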
Decision trees, Runtime, Training, Training data, Measurement, Machine learning algorithms, Monitoring
R. McKenna, S. Herbein, A. Moody, T. Gamblin and M. Taufer, "Machine Learning Predictions of Runtime and IO Traffic on High-End Clusters," 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan, 2016, pp. 255-258.