|
| This Article | ||
| ||
| Share | ||
| Bibliographic References | ||
| Add to: | ||
| | ||
| Search | ||
| ||
| ASCII Text | x | ||
| C.Y. Lin, J.S. Liu, Y.C. Chung, "Efficient Representation Scheme for Multidimensional Array Operations," IEEE Transactions on Computers, vol. 51, no. 3, pp. 327-345, March, 2002. | |||
| BibTex | x | ||
| @article{ 10.1109/12.990130, author = {C.Y. Lin and J.S. Liu and Y.C. Chung}, title = {Efficient Representation Scheme for Multidimensional Array Operations}, journal ={IEEE Transactions on Computers}, volume = {51}, number = {3}, issn = {0018-9340}, year = {2002}, pages = {327-345}, doi = {http://doi.ieeecomputersociety.org/10.1109/12.990130}, publisher = {IEEE Computer Society}, address = {Los Alamitos, CA, USA}, } | |||
| RefWorks Procite/RefMan/Endnote | x | ||
| TY - JOUR JO - IEEE Transactions on Computers TI - Efficient Representation Scheme for Multidimensional Array Operations IS - 3 SN - 0018-9340 SP327 EP345 EPD - 327-345 A1 - C.Y. Lin, A1 - J.S. Liu, A1 - Y.C. Chung, PY - 2002 KW - array operations KW - multidimensional arrays KW - data structure KW - extended Karnaugh map representation KW - traditional matrix representation VL - 51 JA - IEEE Transactions on Computers ER - | |||
Array operations are used in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, etc. To implement these array operations efficiently, many methods have been proposed in the literature. However, the majority of these methods are focused on the two-dimensional arrays. When extended to higher dimensional arrays, these methods usually do not perform well. Hence, designing efficient algorithms for multidimensional array operations becomes an important issue. In this paper, we propose a new scheme,
[1] J.C. Adams, W.S. Brainerd, J.T. Martin, B.T. Smith, and J.L. Wagener, Fortran 90 Handbook. Intertext Publications/McGraw-Hill Inc. 1992.
[2] I. Banicescu and S.F. Hummel, “Balancing Processor Loads and Exploiting Data Locality in N-Body Simulations,” Proc. 1995 ACM/IEEE Supercomputing Conf., Dec. 1995.
[3] D. Callahan, S. Carr, and K. Kennedy, “Improving Register Allocation for Subscripted Variables,” Proc. ACM SIGPLAN 1990 Conf. Programming Language Design and Implementation, pp. 53-65, June 1990.
[4] S. Carr, K.S. McKinley, and C.-W. Tseng, “Compiler Optimizations for Improving Data Locality,” Proc. Sixth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 252-262, Oct. 1994.
[5] L. Carter, J. Ferrante, and S.F. Hummel, “Hierarchical Tiling for Improved Superscalar Performance,” Proc. Nineth Int'l Symp. Parallel Processing, pp. 239-245, Apr. 1995.
[6] S. Chatterjee, A.R. Lebeck, P.K. Patnala, and M. Thottethodi, “Recursive Array Layouts and Fast Parallel Matrix Multiplication,” Proc. Eleventh Ann. ACM Symp. Parallel Algorithms and Architectures, pp. 222-231, June 1999.
[7] S. Chatterjee, V.V. Jain, A.R. Lebeck, S. Mundhra, and M. Thottethodi, “Nonlinear Array Layouts for Hierarchical Memory Systems,” Proc. 1999 ACM Int'l Conf. Supercomputing, pp. 444-453, June 1999.
[8] M. Cierniak and W. Li, “Unifying Data and Control Transformations for Distributed Shared Memory Machines,” Technical Report TR-542, Dept. of Computer Science, Univ. of Rochester, Nov. 1994.
[9] S. Coleman and K. McKinley, “Tile Size Selection Using Cache Organization and Data Layout,” Proc. SIGPLAN Conf. Programming Language Design and Implementation, June 1995.
[10] J.K. Cullum and R.A. Willoughby, Algorithms for Large Symmetric Eignenvalue Computations, vol. 1.Boston, Mass.: Birkhauser, 1985.
[11] B.B. Fraguela, R. Doallo, and E.L. Zapata, “Cache Misses Prediction for High Performance Sparse Algorithms,” Proc. Fourth Int'l Euro-Par Conf. (Euro-Par '98), pp. 224-233, Sept. 1998.
[12] B.B. Fraguela, R. Doallo, and E.L. Zapata, “Cache Probabilistic Modeling for Basic Sparse Algebra Kernels Involving Matrices with a Non-Uniform Distribution,” Proc. 24th IEEE Euromicro Conf., pp. 345-348, Aug. 1998.
[13] B.B. Fraguela, R. Doallo, and E.L. Zapata, “Modeling Set Associative Caches Behaviour for Irregular Computations,” ACM Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '98), pp. 192-201, June 1998.
[14] B.B. Fraguela, R. Doallo, and E.L. Zapata, “Automatic Analytical Modeling for the Estimation of Cache Misses,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '99), Oct. 1999.
[15] J.D. Frens and D.S. Wise, “Auto-Blocking Matrix-Multiplication or Tracking BLAS3 Performance from Source Code,” Proc. Sixth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, June 1997.
[16] G.H. Golub and C.F. Van Loan, Matrix Computations, Second ed. Baltimore, Md: Johns Hopkins Univ. Press, 1989.
[17] M. Kandemir, J. Ramanujam, and A. Choudhary, “Improving Cache Locality by a Combination of Loop and Data Transformations,” IEEE Trans. Computers, vol. 48, no. 2, pp. 159-167, Feb. 1999. A preliminary version appears in Proc. 11th ACM Int'l Conf. Supercomputing (ICS '97), pp. 269-276, July 1997.
[18] M. Kandemir, J. Ramanujam, and A. Choudhary, “A Compiler Algorithm for Optimizing Locality in Loop Nests,” Proc. 1997 ACM Int'l Conf. Supercomputing, pp. 269-276, July 1997.
[19] C.W. Kebler and C.H. Smith, “The SPARAMAT Approach to Automatic Comprehension of Sparse Matrix Computations,” Proc. Seventh Int'l Workshop Program Comprehension, pp. 200-207, 1999.
[20] V. Kotlyar, K. Pingali, and P. Stodghill, “Compiling Parallel Sparse Code for User-Defined Data Structures,” Proc. Eighth SIAM Conf. Parallel Processing for Scientific Computing, Mar. 1997.
[21] V. Kotlyar, K. Pingali, and P. Stodghill, “A Relation Approach to the Compilation of Sparse Matrix Programs,” Euro Par, Aug. 1997.
[22] V. Kotlyar, K. Pingali, and P. Stodghill, “Compiling Parallel Code for Sparse Matrix Applications,” Proc. Supercomputing Conf., Aug. 1997.
[23] B. Kumar, C.-H. Huang, R.W. Johnson, and P. Sadayappan, “A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction,” Proc. Seventh Int'l Parallel Processing Symp., pp. 582-588, Apr. 1993.
[24] M. Lam, E. Rothberg, and M. Wolf, “The Cache Performance and Optimizations of Blocked Algorithms,” Proc. Fourth Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS '91), 1991.
[25] W. Li and K. Pingali, “A Singular Loop Transformation Framework Based on Non-Singular Matrices,” Proc. Fifth Workshop Languages and Compilers for Parallel Computers, pp. 249-260, 1992.
[26] J.-S. Liu, J.-Y. Lin, and Y.-C. Chung, “Efficient Representation for Multi-Dimensional Matrix Operations,” Proc. Workshop Compiler Techniques for High Performance Computing (CTHPC), pp. 133-142, Mar. 2000.
[27] J.-S. Liu, J.-Y. Lin, and Y.-C. Chung, “Efficient Parallel Algorithms for Multi-Dimensional Matrix Operations,” Proc. IEEE Int'l Symp. Parallel Architectures, Algorithms and Networks (I-SPAN), pp.224-229, Dec. 2000.
[28] K. McKinley, S. Carr, and C.W. Tseng, “Improving Data Locality with Loop Transformations,” ACM Trans. Programming Languages and Systems, vol. 18, no. 4, pp. 424-453, July 1996.
[29] M. O'Boyle and P. Knijnenburg, “Integrating Loop and Data Transformations for Global Optimisation,” Proc. Int'l Conf. Parallel Architectures and Compilation Techniques (PACT '98), Oct. 1998.
[30] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in Fortran 90: The Art of Parallel Scientific Computing. Cambridge Univ. Press, 1996.
[31] P.D. Sulatycke and K. Ghose, “Caching Efficient Multithreaded Fast Multiplication of Sparse Matrices,” Proc. First Merged Int'l Parallel Processing Symp. and Symp. Parallel and Distributed Processing, pp. 117-123, 1998.
[32] M. Thottethodi, S. Chatterjee, and A.R. Lebeck, “Turing Strassen's Matrix Multiplication for Memory Efficiency,” Proc. ACM/IEEE SC98 Conf. High Performance Networking and Computing, Nov. 1998.
[33] M. Ujaldon, E.L. Zapata, S.D. Sharma, and J. Saltz, “Parallelization Techniques for Sparse Matrix Applications,” J. Parallel and Distribution Computing, 1996.
[34] M. Wolf and M. Lam, “A Data Locality Optimizing Algorithm,” Proc. SIGPLAN Conf. Programming Language Design and Implementation, pp. 30-44, June 1991.
[35] L.H. Ziantz, C.C. Ozturan, and B.K. Szymanski, “Run-Time Optimization of Sparse Matrix-Vector Multiplication on SIMD Machines,” Proc. Int'l Conf. Parallel Architectures and Languages, pp. 313-322, July 1994.

