Efficient Representation Scheme for Multidimensional Array Operations
March 2002 (vol. 51 no. 3)
pp. 327-345

Array operations are used in many important scientific codes, such as molecular dynamics, finite element methods, and climate modeling. Many methods for implementing these array operations efficiently have been proposed in the literature. However, most of them focus on two-dimensional arrays and usually do not perform well when extended to higher-dimensional arrays. Designing efficient algorithms for multidimensional array operations is therefore an important issue. In this paper, we propose a new scheme, the extended Karnaugh map representation (EKMR), for representing multidimensional arrays. The main idea of the EKMR scheme is to represent a multidimensional array by a set of two-dimensional arrays, which makes the design of efficient algorithms for multidimensional array operations less complicated. To evaluate the proposed scheme, we design efficient algorithms for two multidimensional array operations, matrix-matrix addition/subtraction and matrix-matrix multiplication, based on both the EKMR and the traditional matrix representation (TMR) schemes. Both theoretical analysis and experimental tests were conducted for these operations. Since Fortran 90 provides a rich set of intrinsic functions for multidimensional array operations, the experimental tests also compare the performance of the intrinsic functions provided by the Fortran 90 compiler with that of their EKMR-based counterparts. The experimental results show that the algorithms based on the EKMR scheme outperform both those based on the TMR scheme and those provided by the Fortran 90 compiler.
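The core idea of folding a multidimensional array into a two-dimensional one can be sketched in a few lines. The Python sketch below is not the paper's definitive layout; it assumes one plausible EKMR(3) index mapping in which the third index k of a three-dimensional array A[i][j][k] is folded into the column dimension (column = k*nj + j), so the whole array is held as a single two-dimensional structure.

```python
def ekmr3(a):
    """Fold a nested 3-D list a[i][j][k] into a 2-D EKMR-style list.

    Assumed mapping (illustrative only): element (i, j, k) of a
    ni x nj x nk array is stored at row i, column k*nj + j.
    """
    ni, nj, nk = len(a), len(a[0]), len(a[0][0])
    return [[a[i][j][k] for k in range(nk) for j in range(nj)]
            for i in range(ni)]

def add_ekmr(x, y):
    """Element-wise addition of two arrays in the folded 2-D form.

    Because both operands are plain 2-D structures, the loop nest is
    two-deep regardless of the original array's dimensionality.
    """
    return [[xv + yv for xv, yv in zip(xr, yr)]
            for xr, yr in zip(x, y)]
```

Once the data is in this two-dimensional form, an operation such as addition traverses one flat row structure instead of three nested index ranges, which illustrates why algorithm design over the EKMR form is simpler than over the traditional representation.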

Index Terms:
array operations, multidimensional arrays, data structure, extended Karnaugh map representation, traditional matrix representation
Citation:
C.Y. Lin, J.S. Liu, Y.C. Chung, "Efficient Representation Scheme for Multidimensional Array Operations," IEEE Transactions on Computers, vol. 51, no. 3, pp. 327-345, March 2002, doi:10.1109/12.990130