Distribution Assignment Placement: Effective Optimization of Redistribution Costs
June 2002 (vol. 13 no. 6)
pp. 628-647

Data locality and workload balance are key factors in getting high performance out of data-parallel programs on multiprocessor architectures. Data-parallel languages such as High-Performance Fortran (HPF) therefore let a programmer both specify data distributions and change them dynamically in order to maintain these properties. Redistributions, however, can be quite expensive and can significantly degrade a program's performance, so they must be reduced to a minimum. In this article, we present a novel, aggressive approach for avoiding unnecessary remappings, which works by eliminating partially dead and partially redundant distribution changes. The approach extends and combines two algorithms for these optimizations, each of which achieves optimal results on its own. In contrast to the sequential setting, the data-parallel setting leads naturally to a family of algorithms of varying power and efficiency, allowing solutions customized to specific requirements. The power and flexibility of the new approach are demonstrated by various examples, which range from typical HPF fragments to real-world programs. Performance measurements underline its importance and show its effectiveness on different hardware platforms and in different settings.
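
To make the notion of a partially dead distribution change concrete, the following is a minimal sketch in Python, not the article's algorithm. It assumes a toy program with one array, two branches, and a hypothetical starting layout of CYCLIC. The "optimized" paths are written by hand to show the intended effect of sinking a redistribution into the only branch that needs it; no data flow analysis is performed, and the names run, compare, original, and optimized are illustrative only.

```python
def run(path, initial):
    """Walk one execution path; return (real remappings, layouts seen by uses)."""
    current, remaps, seen = initial, 0, []
    for op, arg in path:
        if op == "redistribute":
            if arg != current:  # only an actual change of layout moves data
                remaps += 1
            current = arg
        elif op == "use":
            seen.append((arg, current))  # statement label and layout it observes
    return remaps, seen


# Original toy program: the redistribution to BLOCK sits before the branch.
# On the "else" path it is overwritten before any use, so it is dead there
# (dead on some paths only, hence "partially dead").
original = {
    "then": [("redistribute", "BLOCK"), ("use", "s1")],
    "else": [("redistribute", "BLOCK"), ("redistribute", "CYCLIC"), ("use", "s2")],
}

# Hand-written result of sinking the redistribution into the "then" branch.
optimized = {
    "then": [("redistribute", "BLOCK"), ("use", "s1")],
    "else": [("redistribute", "CYCLIC"), ("use", "s2")],
}


def compare(initial="CYCLIC"):
    """Return {branch: (remappings before, remappings after)} for both paths."""
    report = {}
    for branch in original:
        before, seen_before = run(original[branch], initial)
        after, seen_after = run(optimized[branch], initial)
        # Every use must observe the same layout before and after the change.
        assert seen_before == seen_after
        report[branch] = (before, after)
    return report


if __name__ == "__main__":
    print(compare())
```

Under these assumptions the path through the "then" branch costs one remapping in both versions, while the path through the "else" branch drops from two remappings to none, because the array already has the layout that branch asks for.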

Index Terms:
Data-parallel languages, High-Performance Fortran (HPF), dynamic data redistribution, data flow analysis, optimization, partially dead and partially redundant assignment elimination.
Citation:
Jens Knoop, Eduard Mehofer, "Distribution Assignment Placement: Effective Optimization of Redistribution Costs," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 6, pp. 628-647, June 2002, doi:10.1109/TPDS.2002.1011416