
Edin Hodzic, Weijia Shang, "On Time Optimal Supernode Shape," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 12, pp. 1220-1233, December 2002.
DOI: 10.1109/TPDS.2002.1158261
Keywords: supernode transformation, tiling, algorithm partitioning, parallelizing compilers, minimizing running time, distributed memory multicomputer.
Abstract—With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses the selection of an optimal supernode shape for a supernode transformation (also known as tiling). We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For supernode transformations on algorithms with perfectly nested loops and uniform dependencies, we prove the optimality of a constant linear schedule vector and give a necessary and sufficient condition for optimal relative side lengths. We also prove that the total running time is minimized by a cutting hyperplane direction matrix from a particular subset of all valid directions, and we discuss the cases where this subset is unique. The results are derived in continuous space and should be considered approximate. Our model does not include cache effects, and it assumes an unbounded number of available processors, a communication cost approximated by a constant, uniform dependencies, and loop bounds known at compile time. A comprehensive example is discussed with an application of the results to the Jacobi algorithm.
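To make the terms "cutting hyperplane directions" and "supernode" concrete, the following is a minimal sketch that assumes the formulation commonly used in the tiling literature: a matrix H whose rows are normals to the cutting hyperplanes (scaled by the inverse side length), a supernode index given by the componentwise floor of H·i, and a tiling taken to be valid when every entry of H·d is nonnegative for each dependence vector d. The names `is_valid_tiling`, `supernode_index`, `DEPS`, `RECT`, and `SKEWED` are hypothetical, and the three dependence vectors are an assumed Jacobi-like stencil in (time, space) coordinates, not values taken from the paper.

```python
from fractions import Fraction
from math import floor

def is_valid_tiling(H, deps):
    """Return True if every entry of H*d is nonnegative for each dependence d.

    Assumption: this is the usual validity condition that keeps
    dependences between supernodes acyclic.
    """
    return all(
        sum(h_k * d_k for h_k, d_k in zip(row, dep)) >= 0
        for row in H
        for dep in deps
    )

def supernode_index(H, point):
    """Map an iteration point to its supernode: componentwise floor(H * point)."""
    return tuple(
        floor(sum(h_k * x_k for h_k, x_k in zip(row, point)))
        for row in H
    )

# Assumed uniform dependences of a Jacobi-like stencil, as (time, space) vectors.
DEPS = [(1, -1), (1, 0), (1, 1)]

SIDE = 4  # illustrative scale factor for the supernode side lengths

# Axis-aligned cuts: the second row gives a negative product with (1, -1).
RECT = [(Fraction(1, SIDE), 0), (0, Fraction(1, SIDE))]

# Cuts whose normals are (1, 1) and (1, -1): all products are nonnegative.
SKEWED = [
    (Fraction(1, SIDE), Fraction(1, SIDE)),
    (Fraction(1, SIDE), Fraction(-1, SIDE)),
]

if __name__ == "__main__":
    print(is_valid_tiling(RECT, DEPS))      # False
    print(is_valid_tiling(SKEWED, DEPS))    # True
    print(supernode_index(SKEWED, (5, 2)))  # (1, 0)
```

Exact rational arithmetic is used so that points lying on a cutting hyperplane are assigned to a supernode without floating-point rounding. The sketch only checks validity and assigns points to supernodes; it does not attempt the running-time optimization over size, relative side lengths, and directions that the abstract describes.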