Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles
May 2002 (vol. 13 no. 5)
pp. 460-470

In this paper, an efficient algorithm to implement loop partitioning is introduced and evaluated. We start from the results of Agarwal et al. [1], whose aim is to minimize the number of data elements accessed throughout the computation of a tile; this number is called the cumulative footprint of the tile. We improve these results in several directions. First, we derive a new formulation of the cumulative footprint, allowing for an analytical solution of the optimization problem stated in [1]. Second, we deal with arbitrary parallelepiped-shaped tiles, as opposed to the rectangular tiles in [1]. We design an efficient heuristic to determine the optimal tile shape in this general setting, and we show its usefulness using both examples from [1] and a large collection of randomly generated data.
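
As an illustration of the quantity being minimized, the following minimal sketch counts the cumulative footprint of a two-dimensional tile by brute-force enumeration. It is not the paper's algorithm or formulation: the function names `tile_points` and `cumulative_footprint` are hypothetical, and it assumes a single array accessed through references of the form A[i + di, j + dj], that is, an identity access matrix with constant offsets. The tile is the half-open parallelepiped spanned by two integer edge vectors u and v.

```python
from itertools import product


def tile_points(u, v):
    """Integer points of the half-open tile {a*u + b*v : 0 <= a, b < 1}."""
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("edge vectors must be linearly independent")
    sign = 1 if det > 0 else -1
    # Bounding box of the four corners of the parallelepiped.
    xs = [0, u[0], v[0], u[0] + v[0]]
    ys = [0, u[1], v[1], u[1] + v[1]]
    points = []
    for x, y in product(range(min(xs), max(xs) + 1),
                        range(min(ys), max(ys) + 1)):
        # Cramer's rule in integers: a = a_num / |det|, b = b_num / |det|.
        a_num = sign * (x * v[1] - y * v[0])
        b_num = sign * (u[0] * y - u[1] * x)
        if 0 <= a_num < abs(det) and 0 <= b_num < abs(det):
            points.append((x, y))
    return points


def cumulative_footprint(u, v, offsets):
    """Number of distinct elements A[i + di, j + dj] touched by one tile.

    Assumption of this sketch: every reference uses the identity access
    matrix, so references differ only by their constant offset.
    """
    touched = set()
    for i, j in tile_points(u, v):
        for di, dj in offsets:
            touched.add((i + di, j + dj))
    return len(touched)


# Two references, A[i, j] and A[i + 1, j + 1], and two tiles of 16 iterations.
refs = [(0, 0), (1, 1)]
rectangular = cumulative_footprint((4, 0), (0, 4), refs)
skewed = cumulative_footprint((4, 4), (4, 0), refs)
print(rectangular, skewed)  # prints "23 20"
```

In this toy case both tiles contain 16 iterations, but the tile with an edge along the reuse direction (1, 1) touches fewer distinct elements than the rectangular one, which is the kind of gain that motivates considering non-rectangular tile shapes.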

[1] A. Agarwal, D. Kranz, and V. Natarajan, “Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors,” IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 9, pp. 943-962, Sept. 1995.
[2] P. Boulet, A. Darte, T. Risset, and Y. Robert, “(Pen)-Ultimate Tiling,” Integration, VLSI J., vol. 17, pp. 33-51, 1994.
[3] J. Brenner and L. Cummings, “The Hadamard Maximum Determinant Problem,” Am. Math. Monthly, vol. 79, pp. 626-630, 1972.
[4] P.-Y. Calland and T. Risset, “Precise Tiling for Uniform Loop Nests,” Application Specific Array Processors, P. Cappello, C. Mongenet, G.-R. Perrin, P. Quinton, and Y. Robert, eds., pp. 330-337, July 1995.
[5] Y-S. Chen, S-D. Wang, and C-M. Wang, “Tiling Nested Loops into Maximal Rectangular Blocks,” J. Parallel and Distributed Computing, vol. 35, no. 2, pp. 123-132, 1996.
[6] M. Cierniak and W. Li, “Unifying Data and Control Transformations for Distributed Shared Memory Machines,” Proc. SIGPLAN Conf. Programming Language Design and Implementation, June 1995.
[7] K. Högstedt, L. Carter, and J. Ferrante, “Determining the Idle Time of a Tiling,” Proc. Symp. Principles of Programming Languages, Jan. 1997.
[8] F. Irigoin and R. Triolet, “Supernode Partitioning,” Proc. 15th ACM Symp. Principles of Programming Languages, pp. 319-329, Jan. 1988.
[9] M. Kandemir, A. Choudhary, P. Banerjee, J. Ramanujam, and N. Shenoy, “Minimizing Data and Synchronization Costs in One-Way Communication,” IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 12, pp. 1232-1251, Dec. 2000.
[10] M. Kandemir, A. Choudhary, J. Ramanujam, and M. Kandaswamy, “A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations,” IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 7, pp. 648-668, July 2000.
[11] M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, “A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts,” IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 2, pp. 115-135, Feb. 1999.
[12] M. Kandemir and J. Ramanujam, “Data Relation Vectors: A New Abstraction for Data Optimizations,” IEEE Trans. Computers, vol. 50, no. 8, pp. 798-810, Aug. 2001.
[13] M. Kandemir, J. Ramanujam, and A. Choudhary, “Improving Cache Locality by a Combination of Loop and Data Transformations,” IEEE Trans. Computers, vol. 48, no. 2, pp. 159-167, Feb. 1999. A preliminary version appears in Proc. 11th ACM Int'l Conf. Supercomputing (ICS '97), pp. 269-276, July 1997.
[14] R. Schreiber and J.J. Dongarra, “Automatic Blocking of Nested Loops,” Technical Report 90.38, RIACS, Aug. 1990.
[15] M. Wolf and M. Lam, “A Loop Transformation Theory and an Algorithm to Maximize Parallelism,” IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 4, Oct. 1991.

Index Terms:
Compilation technique, hierarchical memory systems, loop partitioning, tiling, cache, data locality, footprint, out-of-core algorithms.
Citation:
Fabrice Rastello, Yves Robert, "Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 5, pp. 460-470, May 2002, doi:10.1109/TPDS.2002.1003856