On Time Optimal Supernode Shape
December 2002 (vol. 13 no. 12)
pp. 1220-1233
Edin Hodzic, Weijia Shang, "On Time Optimal Supernode Shape," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 12, pp. 1220-1233, December 2002.

Abstract—With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses the selection of an optimal supernode shape of a supernode transformation (also known as tiling). We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For supernode transformations on algorithms with perfectly nested loops and uniform dependencies, we prove the optimality of a constant linear schedule vector and give a necessary and sufficient condition for optimal relative side lengths. We also prove that the total running time is minimized by a cutting hyperplane direction matrix from a particular subset of all valid directions and we discuss the cases where this subset is unique. The results are derived in continuous space and should be considered approximate. Our model does not include cache effects and assumes an unbounded number of available processors, the communication cost approximated by a constant, uniform dependences, and loop bounds known at compile time. A comprehensive example is discussed with an application of the results to the Jacobi algorithm.
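
As a rough illustration of the parameters named in the abstract, the sketch below uses a deliberately simplified model that is not taken from the paper. It assumes rectangular supernodes (cutting hyperplanes parallel to the axes), dependence vectors with non-negative components no larger than the tile sides, the wavefront schedule vector (1, ..., 1) on the tile grid, and a constant communication cost per step. The names `tile_of`, `schedule_length`, `total_time`, `t_comp`, and `t_comm` are hypothetical and chosen only for this example.

```python
from math import ceil


def tile_of(point, sides):
    """Index of the rectangular supernode (tile) that contains an iteration point."""
    return tuple(p // s for p, s in zip(point, sides))


def schedule_length(extents, sides):
    """Number of steps of the wavefront schedule (1, ..., 1) over the tile grid.

    Assumption: every tile-level dependence has components in {0, 1}, so tile t
    can run at step sum(t). Steps then range from 0 to sum(T_i - 1).
    """
    tiles_per_dim = [ceil(n / s) for n, s in zip(extents, sides)]
    return sum(tiles_per_dim) - len(tiles_per_dim) + 1


def total_time(extents, sides, t_comp=1.0, t_comm=50.0):
    """Toy running-time estimate: steps * (tile volume * t_comp + t_comm).

    Assumptions: unbounded processors, one tile per processor per step, and a
    constant communication cost t_comm per step (both values are made up).
    """
    volume = 1
    for s in sides:
        volume *= s
    return schedule_length(extents, sides) * (volume * t_comp + t_comm)


if __name__ == "__main__":
    extents = (1200, 300)  # hypothetical loop bounds
    # Three tile shapes with the same volume (400 points) but different side ratios.
    for sides in [(20, 20), (40, 10), (80, 5)]:
        print(sides, schedule_length(extents, sides), total_time(extents, sides))
```

In this toy model, with the supernode volume fixed at 400 points on a 1200 x 300 iteration space, sides (40, 10) give 59 steps against 74 for (20, 20), so the relative side lengths alone change the estimated running time.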

Index Terms:
Supernode transformation, tiling, algorithm partitioning, parallelizing compilers, minimizing running time, distributed memory multicomputer.
Citation:
Edin Hodzic, Weijia Shang, "On Time Optimal Supernode Shape," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 12, pp. 1220-1233, Dec. 2002, doi:10.1109/TPDS.2002.1158261