On Supernode Transformation with Minimized Total Running Time
May 1998 (vol. 9 no. 5)
pp. 417-428

Abstract: With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths of a supernode transformation (also known as tiling). We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, for sufficiently large supernodes and a sufficiently large number of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order-n polynomial whose real positive roots include the optimal supernode size. For two special cases, 1) two-dimensional algorithm problems and 2) n-dimensional algorithm problems, where the communication cost is dominated by the startup penalty and, therefore, can be approximated by a constant, we give a closed-form expression for the optimal supernode size, which is independent of the supernode relative side lengths and cutting hyperplanes. For the case where the algorithm iteration index space and the supernodes are hyperrectangular, we give closed-form expressions for the optimal supernode relative side lengths. Our experiment shows that the closed-form expressions closely match the experimental data.
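In the general case, the abstract states that the optimal supernode size is among the real positive roots of an order-n polynomial. The sketch below illustrates only that final selection step, under loudly stated assumptions. The polynomial coefficients and the running-time function `toy_running_time` are hypothetical placeholders, not the paper's model. The helper names `poly_eval`, `positive_real_roots`, and `optimal_supernode_size` are likewise illustrative. Roots are located by a sign-change scan plus bisection, a swapped-in numerical technique chosen so that the block needs only the Python standard library.

```python
from typing import Callable, List, Sequence


def poly_eval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial given highest-degree coefficient first (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def positive_real_roots(coeffs: Sequence[float], s_max: float,
                        steps: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Find real roots in (0, s_max] by scanning for sign changes, then bisecting.

    Roots of even multiplicity (where the sign does not change) are missed;
    this is a simplification for illustration.
    """
    roots: List[float] = []
    h = s_max / steps
    lo, f_lo = h, poly_eval(coeffs, h)  # start just above zero
    for i in range(2, steps + 1):
        hi = i * h
        f_hi = poly_eval(coeffs, hi)
        if f_lo == 0.0:
            roots.append(lo)  # grid point lands exactly on a root
        elif f_lo * f_hi < 0.0:
            a, b, fa = lo, hi, f_lo
            while b - a > tol:
                m = 0.5 * (a + b)
                fm = poly_eval(coeffs, m)
                if fa * fm <= 0.0:
                    b = m  # root lies in [a, m]
                else:
                    a, fa = m, fm  # root lies in (m, b]
            roots.append(0.5 * (a + b))
        lo, f_lo = hi, f_hi
    if f_lo == 0.0:
        roots.append(lo)  # root exactly at s_max
    return roots


def optimal_supernode_size(coeffs: Sequence[float],
                           running_time: Callable[[float], float],
                           s_max: float) -> float:
    """Pick the positive real root that minimizes the supplied running-time model."""
    candidates = positive_real_roots(coeffs, s_max)
    if not candidates:
        raise ValueError("no positive real root found in (0, s_max]")
    return min(candidates, key=running_time)


# Hypothetical toy model (NOT the paper's): T(s) = 2*s + 50/s + 3.
# Setting dT/ds = 2 - 50/s**2 to zero and multiplying by s**2 gives 2*s**2 - 50 = 0.
def toy_running_time(s: float) -> float:
    return 2.0 * s + 50.0 / s + 3.0


best = optimal_supernode_size([2.0, 0.0, -50.0], toy_running_time, s_max=100.0)
print(round(best, 6))  # 5.0
```

To apply the sketch, substitute the polynomial derived from the paper's running-time analysis and the matching running-time function. The scan-and-bisect search exists only to keep the example dependency-free. Any reliable polynomial root solver would serve equally well, and unlike this scan it would also catch roots of even multiplicity.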

Index Terms:
Supernode partitioning, tiling, parallelizing compilers, distributed memory multicomputer, minimizing running time.
Edin Hodzic, Weijia Shang, "On Supernode Transformation with Minimized Total Running Time," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 5, pp. 417-428, May 1998, doi:10.1109/71.679213