
Rumen Andonov, Stefan Balev, Sanjay Rajopadhye, Nicola Yanev, "Optimal Semi-Oblique Tiling," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 9, pp. 944-960, September 2003.  
BibTeX:  
@article{ 10.1109/TPDS.2003.1233716,
  author = {Rumen Andonov and Stefan Balev and Sanjay Rajopadhye and Nicola Yanev},
  title = {Optimal Semi-Oblique Tiling},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {14},
  number = {9},
  issn = {1045-9219},
  year = {2003},
  pages = {944--960},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TPDS.2003.1233716},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}  
Keywords: 2D uniform recurrences, biological sequence alignment, BSP model, communication-computation granularity, distributed memory machines, locality, loop blocking, MPI, perfect loop nests, SPMD.  
Abstract—For 2D iteration space tiling, we address the problem of determining the tile parameters that minimize the total execution time on a parallel machine. We consider uniform dependency computations tiled so that (at least) one of the tile boundaries is parallel to the domain boundaries. We determine the optimal tile size as a