Issue No.01 - Jan. (2013 vol.24)
pp: 144-157
Dong Li , Oak Ridge National Laboratory, Oak Ridge and Virginia Tech, Blacksburg
Bronis R. de Supinski , Lawrence Livermore National Lab, Livermore
Martin Schulz , Lawrence Livermore National Lab, Livermore
Dimitrios S. Nikolopoulos , Queen's University of Belfast and FORTH-ICS, Heraklion
Kirk W. Cameron , Virginia Tech, Blacksburg
Many scientific applications are programmed using hybrid programming models that use both message passing and shared memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation. The solution space, and thus the challenge, grows substantially when optimizing hybrid models because the number of possible resource configurations increases exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74 percent on average and up to 13.8 percent) with some performance gain (up to 7.5 percent) or negligible performance loss.
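As a rough illustration of the idea of choosing a concurrency and frequency configuration from predicted time and power, the following minimal sketch enumerates (threads, frequency) pairs and picks the one with the lowest predicted energy among those that stay within a slowdown bound. It is not the paper's method: the functions `predict_time`, `predict_power`, and `best_config`, the toy Amdahl-style time model, the cubic power model, every constant, and the slowdown bound are all hypothetical stand-ins for the statistical predictors described above.

```python
from itertools import product


def predict_time(threads, freq_ghz, work=100.0, serial_frac=0.1):
    """Toy time model in seconds (assumption): Amdahl-style scaling with
    thread count, inversely proportional to clock frequency."""
    parallel = (1.0 - serial_frac) / threads
    return work * (serial_frac + parallel) / freq_ghz


def predict_power(threads, freq_ghz, static_w=40.0, coeff=5.0):
    """Toy power model in watts (assumption): static power plus a
    per-thread dynamic term that grows with the cube of frequency."""
    return static_w + coeff * threads * freq_ghz ** 3


def best_config(thread_counts, freqs_ghz, max_slowdown=1.25):
    """Return the (threads, freq) pair with the lowest predicted energy
    among pairs whose predicted time is within max_slowdown times the
    fastest predicted time."""
    configs = list(product(thread_counts, freqs_ghz))
    fastest = min(predict_time(t, f) for t, f in configs)
    feasible = [
        (t, f) for t, f in configs
        if predict_time(t, f) <= max_slowdown * fastest
    ]
    # Energy = power * time for each feasible configuration.
    return min(feasible, key=lambda c: predict_power(*c) * predict_time(*c))


if __name__ == "__main__":
    choice = best_config([1, 2, 4, 8], [1.6, 2.0, 2.4])
    print("selected (threads, GHz):", choice)
```

An exhaustive search is only practical for a single node with a handful of settings. With many MPI tasks each choosing its own thread count and frequency, the configuration count grows exponentially, which is the difficulty the abstract points to.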
Discrete cosine transforms, Concurrent computing, Programming, Computational modeling, Time frequency analysis, Dynamic programming, Multicore processing, dynamic voltage and frequency scaling, Power management, hybrid parallel programming models, dynamic concurrency throttling
Dong Li, Bronis R. de Supinski, Martin Schulz, Dimitrios S. Nikolopoulos, Kirk W. Cameron, "Strategies for Energy-Efficient Resource Management of Hybrid Programming Models", IEEE Transactions on Parallel & Distributed Systems, vol.24, no. 1, pp. 144-157, Jan. 2013, doi:10.1109/TPDS.2012.95