
Weng-Long Chang, Jih-Woei Huang, Chih-Ping Chu, "Using Elementary Linear Algebra to Solve Data Alignment for Arrays with Linear or Quadratic References," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 1, pp. 28-39, January 2004. doi:10.1109/TPDS.2004.1264783
Keywords: parallel compiler, communication-free alignment, parallel computing, loop optimization, data dependence analysis, load balancing.
Abstract: Data alignment that improves data locality, and thereby minimizes the communication cost of data accesses, helps distributed-memory parallel machines improve their throughput. Most existing data alignment methods are designed mainly for arrays referenced with linear or quadratic subscripts in only a few (one or two) loop index variables. In this paper, we propose two communication-free alignment techniques for arrays referenced with linear or quadratic subscripts in multiple loop index variables. Experimental results on Vector Loop and TRFD of the Perfect Benchmarks show that our techniques can improve the execution times of the subroutines in these benchmarks.