Using Elementary Linear Algebra to Solve Data Alignment for Arrays with Linear or Quadratic References
January 2004 (vol. 15 no. 1)
pp. 28-39

Abstract: Data alignment, which improves data locality so that data-access communication costs are minimized, helps distributed-memory parallel machines improve their throughput. Most data alignment methods are designed mainly to align arrays referenced through linear subscripts or quadratic subscripts in few (one or two) loop index variables. In this paper, we propose two communication-free alignment techniques that align arrays referenced through linear subscripts or quadratic subscripts in multiple loop index variables. Experimental results for our techniques on Vector Loop and TRFD of the Perfect Benchmarks show that they can reduce the execution times of the subroutines in these benchmarks.
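
The sketch below illustrates what "communication-free alignment" means in the linear case. Suppose one statement in a loop nest with index vector i references A[F_A i] and B[F_B i], and alignment matrices D_A and D_B map array elements onto a virtual processor template. Both references then land on the same template cell in every iteration exactly when D_A F_A = D_B F_B, so every admissible [D_A | D_B] lies in the left null space of the stacked matrix [F_A; -F_B]. This is a generic linear-algebra formulation under stated assumptions (a single statement, linear subscripts only, constant offsets ignored). It is not the authors' two techniques, which also cover quadratic subscripts, and all function names are hypothetical.

```python
from fractions import Fraction
from math import gcd
from typing import List, Sequence, Tuple


def rref(rows: Sequence[Sequence[int]], ncols: int) -> Tuple[List[List[Fraction]], List[int]]:
    """Reduced row echelon form over the rationals, plus the pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots: List[int] = []
    r = 0
    for c in range(ncols):
        if r == len(m):
            break
        # Pick the first row at or below r with a nonzero entry in column c.
        piv = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        # Clear column c in every other row.
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                f = m[k][c]
                m[k] = [a - f * b for a, b in zip(m[k], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def null_space(rows, ncols):
    """Basis of {v : rows @ v == 0}, one vector per free column."""
    m, pivots = rref(rows, ncols)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -m[i][free]
        basis.append(v)
    return basis


def to_integers(v):
    """Scale a rational vector to the smallest integer vector with the same direction."""
    den = 1
    for x in v:
        den = den * x.denominator // gcd(den, x.denominator)
    ints = [int(x * den) for x in v]
    g = 0
    for x in ints:
        g = gcd(g, abs(x))
    return [x // g for x in ints] if g else ints


def alignment_basis(F_A, F_B):
    """Hypothetical helper: rows [D_A | D_B] with D_A @ F_A == D_B @ F_B.

    Assumes one statement whose two references are A[F_A i] and B[F_B i]
    (linear subscripts, constant offsets ignored). Each returned pair defines
    one template dimension along which both references are communication-free.
    """
    n = len(F_A[0])  # number of loop index variables
    M = [list(r) for r in F_A] + [[-x for x in r] for r in F_B]
    # U @ M == 0  <=>  M^T @ U^T == 0, so take the null space of M^T.
    Mt = [[M[i][j] for i in range(len(M))] for j in range(n)]
    m_a = len(F_A)
    result = []
    for v in null_space(Mt, len(M)):
        u = to_integers(v)
        result.append((u[:m_a], u[m_a:]))
    return result


if __name__ == "__main__":
    # A[i + j] = B[i][j] in a doubly nested loop over (i, j).
    print(alignment_basis([[1, 1]], [[1, 0], [0, 1]]))  # [([1], [1, 1])]
    # A[i] = B[j]: only the trivial (single-processor) alignment exists.
    print(alignment_basis([[1, 0]], [[0, 1]]))          # []
```

For A[i + j] = B[i][j], the result maps A[k] to template cell k and B[x][y] to cell x + y, so in iteration (i, j) both references fall on cell i + j. An empty result means that no nontrivial communication-free alignment exists for that pair of references. Exact rational arithmetic is used instead of floating point so that the null-space basis can be scaled back to integer alignment coefficients.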

Index Terms:
Parallel compiler, communication-free alignment, parallel computing, loop optimization, data dependence analysis, load balancing.
Citation:
Weng-Long Chang, Jih-Woei Huang, Chih-Ping Chu, "Using Elementary Linear Algebra to Solve Data Alignment for Arrays with Linear or Quadratic References," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 1, pp. 28-39, Jan. 2004, doi:10.1109/TPDS.2004.1264783