Fabrice Rastello, Yves Robert, "Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 5, pp. 460-470, May 2002. IEEE Computer Society, Los Alamitos, CA, USA. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2002.1003856
Keywords: compilation technique, hierarchical memory systems, loop partitioning, tiling, cache, data locality, footprint, out-of-core algorithms.
In this paper, an efficient algorithm to implement
[1] A. Agarwal, D. Kranz, and V. Natarajan, “Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors,” IEEE Trans. Parallel and Distributed Systems, vol. 6, no. 9, pp. 943-962, Sept. 1995.
[2] P. Boulet, A. Darte, T. Risset, and Y. Robert, “(Pen)-Ultimate Tiling,” Integration, VLSI J., vol. 17, pp. 33-51, 1994.
[3] J. Brenner and L. Cummings, “The Hadamard Maximum Determinant Problem,” Am. Math. Monthly, vol. 79, pp. 626-630, 1972.
[4] P.-Y. Calland and T. Risset, “Precise Tiling for Uniform Loop Nests,” Application Specific Array Processors, P. Cappello, C. Mongenet, G.-R. Perrin, P. Quinton, and Y. Robert, eds., pp. 330-337, July 1995.
[5] Y-S. Chen, S-D. Wang, and C-M. Wang, “Tiling Nested Loops into Maximal Rectangular Blocks,” J. Parallel and Distributed Computing, vol. 35, no. 2, pp. 123-132, 1996.
[6] M. Cierniak and W. Li, “Unifying Data and Control Transformations for Distributed Shared Memory Machines,” Proc. SIGPLAN Conf. Programming Language Design and Implementation, June 1995.
[7] K. Högstedt, L. Carter, and J. Ferrante, “Determining the Idle Time of a Tiling,” Proc. Symp. Principles of Programming Languages, Jan. 1997.
[8] F. Irigoin and R. Triolet, “Supernode Partitioning,” Proc. 15th ACM Symp. Principles of Programming Languages, pp. 319-329, Jan. 1988.
[9] M. Kandemir, A. Choudhary, P. Banerjee, J. Ramanujam, and N. Shenoy, “Minimizing Data and Synchronization Costs in One-Way Communication,” IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 12, pp. 1232-1251, Dec. 2000.
[10] M. Kandemir, A. Choudhary, J. Ramanujam, and M. Kandaswamy, “A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations,” IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 7, pp. 648-668, July 2000.
[11] M. Kandemir, A. Choudhary, N. Shenoy, P. Banerjee, and J. Ramanujam, “A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts,” IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 2, pp. 115-135, Feb. 1999.
[12] M. Kandemir and J. Ramanujam, “Data Relation Vectors: A New Abstraction for Data Optimizations,” IEEE Trans. Computers, vol. 50, no. 8, pp. 798-810, Aug. 2001.
[13] M. Kandemir, J. Ramanujam, and A. Choudhary, “Improving Cache Locality by a Combination of Loop and Data Transformations,” IEEE Trans. Computers, vol. 48, no. 2, pp. 159-167, Feb. 1999. A preliminary version appears in Proc. 11th ACM Int'l Conf. Supercomputing (ICS '97), pp. 269-276, July 1997.
[14] R. Schreiber and J.J. Dongarra, “Automatic Blocking of Nested Loops,” Technical Report 90.38, RIACS, Aug. 1990.
[15] M. Wolf and M. Lam, “A Loop Transformation Theory and an Algorithm to Maximize Parallelism,” IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 4, Oct. 1991.

