An Adaptive Algorithm Selection Framework for Reduction Parallelization
October 2006 (vol. 17 no. 10)
pp. 1084-1096

Abstract—Irregular and dynamic memory reference patterns can cause performance variations for low-level algorithms in general and for parallel algorithms in particular. In this paper, we present an adaptive algorithm selection framework that can collect and interpret the characteristics of a particular instance of a parallel reduction and select the best-performing algorithm from an existing library. The framework consists of the following components: 1) an offline, systematic process for characterizing the input sensitivity of parallel reduction algorithms and a method for building corresponding predictive performance models, 2) an online input characterization and algorithm selection module, and 3) a small library of parallel reduction algorithms, which represent the algorithmic choices made available at runtime. We also present one possible integration of this framework into a restructuring compiler. We validate our design experimentally and show that our framework 1) selects the most appropriate algorithm in 85 percent of the cases studied, 2) overall, delivers 98 percent of the optimal performance, 3) adaptively selects the best algorithms for dynamic phases of a running program (resulting in performance improvements otherwise not possible), and 4) adapts to the underlying machine architectures (evaluated on IBM Regatta and HP V-Class systems).
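To make the three components concrete, the following is a minimal, self-contained sketch (in C++ with std::thread) of how a runtime selector for an irregular reduction of the form bins[idx[i]] += w[i] might be organized: a tiny library of reduction variants, an online characterization of the access pattern, and a decision function that picks a variant. The two variants, the touched-bin-density metric, and the 0.5 cutoff are illustrative assumptions for this sketch; they are not the paper's reduction library, its input characteristics, or its trained performance model.

```cpp
// Minimal sketch of the adaptive-selection idea for an irregular reduction
// of the form  bins[idx[i]] += w[i].  The two variants, the "touched-bin
// density" characteristic, and the 0.5 threshold are illustrative
// assumptions; they are not the paper's reduction library or its
// offline-trained performance model.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <unordered_set>
#include <vector>

using Index = std::size_t;

// Variant 1: replicate the reduction array per thread and merge afterwards.
// Attractive when the array is small or densely touched by every thread.
void reduce_replicated(const std::vector<Index>& idx,
                       const std::vector<double>& w,
                       std::vector<double>& bins, unsigned nthreads) {
  std::vector<std::vector<double>> priv(
      nthreads, std::vector<double>(bins.size(), 0.0));
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < nthreads; ++t)
    pool.emplace_back([&, t] {               // cyclic iteration partition
      for (std::size_t i = t; i < idx.size(); i += nthreads)
        priv[t][idx[i]] += w[i];
    });
  for (auto& th : pool) th.join();
  for (unsigned t = 0; t < nthreads; ++t)    // sequential merge of replicas
    for (std::size_t b = 0; b < bins.size(); ++b) bins[b] += priv[t][b];
}

// Variant 2: owner-computes.  Bins are partitioned among threads; each
// thread scans all iterations but accumulates only the bins it owns.
// Avoids the O(threads x bins) replication cost for large, sparsely
// touched arrays, at the price of redundant scans of the index array.
void reduce_owner(const std::vector<Index>& idx,
                  const std::vector<double>& w,
                  std::vector<double>& bins, unsigned nthreads) {
  const std::size_t chunk = (bins.size() + nthreads - 1) / nthreads;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < nthreads; ++t)
    pool.emplace_back([&, t] {
      const Index lo = t * chunk;
      const Index hi = std::min(bins.size(), lo + chunk);
      for (std::size_t i = 0; i < idx.size(); ++i)
        if (idx[i] >= lo && idx[i] < hi) bins[idx[i]] += w[i];
    });
  for (auto& th : pool) th.join();
}

// Online input characterization: what fraction of the bins is touched?
double touched_density(const std::vector<Index>& idx, std::size_t nbins) {
  std::unordered_set<Index> touched(idx.begin(), idx.end());
  return nbins ? static_cast<double>(touched.size()) / nbins : 0.0;
}

// Selector standing in for the offline-built predictive performance model.
void reduce_adaptive(const std::vector<Index>& idx,
                     const std::vector<double>& w,
                     std::vector<double>& bins, unsigned nthreads) {
  if (touched_density(idx, bins.size()) > 0.5)   // hypothetical cutoff
    reduce_replicated(idx, w, bins, nthreads);
  else
    reduce_owner(idx, w, bins, nthreads);
}

int main() {
  const std::size_t nbins = 8, n = 1 << 16;
  std::vector<Index> idx(n);
  for (std::size_t i = 0; i < n; ++i) idx[i] = i % nbins;  // dense pattern
  std::vector<double> w(n, 1.0), bins(nbins, 0.0);
  const unsigned nt = std::max(1u, std::thread::hardware_concurrency());
  reduce_adaptive(idx, w, bins, nt);
  for (double b : bins) std::printf("%g ", b);  // expect 8192 in each bin
  std::printf("\n");
}
```

In the framework described above, the decision step is not a fixed threshold but a predictive performance model built offline from measured runs across a range of input characteristics and machines; the sketch only marks where such a model would plug in at run time.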


Index Terms:
Runtime parallelization, adaptive optimization, reduction parallelization, compiler optimization.
Citation:
Hao Yu, Lawrence Rauchwerger, "An Adaptive Algorithm Selection Framework for Reduction Parallelization," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 10, pp. 1084-1096, Oct. 2006, doi:10.1109/TPDS.2006.131