2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
September 11–15, 2016
Zhen Jia, Institute of Computing Technology, Chinese Academy of Sciences, China
Chao Xue, IBM Research-China
Guancheng Chen, IBM Research-China
Jianfeng Zhan, Institute of Computing Technology, Chinese Academy of Sciences, China
Lixin Zhang, Institute of Computing Technology, Chinese Academy of Sciences, China
Yonghua Lin, IBM Research-China
Peter Hofstee, IBM Research-Austin, United States
Much research has been devoted to tuning big data analytics in modern data centers, since at this scale even a small percentage of performance improvement translates immediately into large cost savings. Simultaneous multithreading (SMT) has received great interest from the data center community because of its potential to boost the performance of big data analytics by increasing processor resource utilization. For example, emerging processor architectures such as POWER8 support up to 8-way multithreading. However, because different big data workloads have disparate architectural characteristics, identifying the most efficient SMT configuration for the best performance is challenging, given both complex application behavior and complex processor architectures. In this paper, we focus specifically on auto-tuning the SMT configuration for Spark-based big data workloads on POWER8, although our methodology could be generalized and extended to other software stacks and architectures. We propose a prediction-based dynamic SMT threading (PBDST) framework that adjusts the thread count in the SMT cores of POWER8 processors using machine learning algorithms. Its innovation lies in adopting online SMT configuration predictions, derived from microarchitecture-level profiling, to regulate thread counts so as to achieve nearly optimal performance. Moreover, it is implemented in the Spark software stack and is transparent to user applications. After evaluating a large set of machine learning algorithms, we choose the most efficient ones to perform online predictions. The experimental results demonstrate that our approach achieves up to 56.3% performance improvement and an average performance gain of 16.2% in comparison with the default configuration on our system, SMT8 (the maximum SMT configuration).
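To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of prediction-based SMT mode selection: a classifier maps microarchitecture-level profile features of a running workload to the SMT thread count predicted to perform best. The feature set (IPC and cache-miss rate), the training samples, and the simple 1-nearest-neighbor classifier are all illustrative assumptions; the paper evaluates a range of machine learning algorithms against real profiling data.

```python
# Illustrative sketch of prediction-based SMT threading (PBDST-style).
# All features, training data, and the 1-NN classifier are hypothetical.
import math

SMT_MODES = [1, 2, 4, 8]  # thread counts per core supported by POWER8

# Hypothetical offline training samples: (IPC, cache-miss rate) -> best SMT mode
TRAINING = [
    ((1.8, 0.02), 8),   # compute-bound, high IPC: more threads help
    ((1.5, 0.05), 8),
    ((0.9, 0.15), 4),   # moderate resource contention
    ((0.7, 0.20), 2),
    ((0.4, 0.35), 1),   # memory-bound: extra threads only add contention
]

def predict_smt_mode(profile):
    """Return the SMT mode of the nearest training sample (1-nearest-neighbor)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    nearest = min(TRAINING, key=lambda sample: dist(sample[0], profile))
    return nearest[1]

# At runtime, the framework would profile a running Spark stage and
# reconfigure the core's thread count to the predicted mode.
print(predict_smt_mode((1.7, 0.03)))  # compute-bound profile -> 8
print(predict_smt_mode((0.5, 0.30)))  # memory-bound profile  -> 1
```

In the actual framework, the prediction happens online inside the Spark software stack, so applications see the thread-count changes transparently.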
Spark, Big data, Hardware, Training, Machine learning algorithms, Instruction sets
Z. Jia et al., "Auto-tuning Spark big data workloads on POWER8: Prediction-based dynamic SMT threading," 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), Haifa, Israel, 2016, pp. 387-400.