Yonghong Song, Rong Xu, Cheng Wang, Zhiyuan Li, "Improving Data Locality by Array Contraction," IEEE Transactions on Computers, vol. 53, no. 9, pp. 1073-1084, September, 2004.
@article{10.1109/TC.2004.62,
  author    = {Yonghong Song and Rong Xu and Cheng Wang and Zhiyuan Li},
  title     = {Improving Data Locality by Array Contraction},
  journal   = {IEEE Transactions on Computers},
  volume    = {53},
  number    = {9},
  issn      = {0018-9340},
  year      = {2004},
  pages     = {1073-1084},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/TC.2004.62},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA}
}
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - Improving Data Locality by Array Contraction
IS  - 9
SN  - 0018-9340
SP  - 1073
EP  - 1084
EPD - 1073-1084
A1  - Yonghong Song
A1  - Rong Xu
A1  - Cheng Wang
A1  - Zhiyuan Li
PY  - 2004
KW  - Compiler
KW  - memory
KW  - optimization
KW  - performance
KW  - array contraction
KW  - data locality
KW  - loop shifting
KW  - optimizing compilers
VL  - 53
JA  - IEEE Transactions on Computers
ER  -
[1] R. Ahuja, T. Magnanti, and J. Orlin, Network Flows: Theory, Algorithms, and Applications. Englewood Cliffs, N.J.: Prentice Hall, 1993.
[2] V. Allan, R. Jones, R. Lee, and S. Allan, “Software Pipelining,” ACM Computing Surveys, vol. 27, no. 3, pp. 367-432, Sept. 1995.
[3] D. Bacon, S. Graham, and O. Sharp, “Compiler Transformations for High-Performance Computing,” ACM Computing Surveys, vol. 26, no. 4, pp. 345-420, Dec. 1994.
[4] D. Burger and T. Austin, “The SimpleScalar Tool Set, Version 2.0,” Technical Report TR-1342, Dept. of Computer Sciences, Univ. of Wisconsin, Madison, June 1997.
[5] B. Creusillet and F. Irigoin, “Interprocedural Array Region Analyses,” Int'l J. Parallel Programming, vol. 24, no. 6, pp. 513-546, Dec. 1996.
[6] A. Darte, “On the Complexity of Loop Fusion,” Proc. Int'l Conf. Parallel Architecture and Compilation Techniques, pp. 149-157, Oct. 1999.
[7] C. Ding and K. Kennedy, “Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse,” Proc. Int'l Parallel and Distributed Processing Symp., 2001.
[8] P. Feautrier, “Array Dataflow Analysis,” Proc. Compiler Optimizations for Scalable Parallel Systems Languages, 2001.
[9] P. Feautrier, “Dataflow Analysis of Array and Scalar References,” Int'l J. Parallel Programming, vol. 20, no. 1, pp. 23-53, Jan. 1991.
[10] A. Fraboulet, G. Huard, and A. Mignotte, “Loop Alignment for Memory Accesses Optimization,” Proc. 12th Int'l Symp. System Synthesis, Nov. 1999.
[11] G.R. Gao, R. Olsen, V. Sarkar, and R. Thekkath, “Collective Loop Fusion for Array Contraction,” Proc. Fifth Workshop Languages and Compilers for Parallel Computing, pp. 281-295, 1992.
[12] T. Gross and P. Steenkiste, “Structured Dataflow Analysis for Arrays and Its Use in an Optimizing Compiler,” Software-Practice and Experience, vol. 20, no. 2, Feb. 1990.
[13] J. Gu, Z. Li, and G. Lee, “An Evaluation of the Potential Benefits of Register Allocation for Array References,” Proc. Workshop Interaction between Compilers and Computer Architectures, Feb. 1996.
[14] J. Gu, Z. Li, and G. Lee, “Experience with Efficient Array Data Flow Analysis for Array Privatization,” Proc. Sixth ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, pp. 157-167, June 1997.
[15] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 1996.
[16] M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, “A Matrix-Based Approach to Global Locality Optimization,” J. Parallel and Distributed Computing, vol. 58, no. 2, pp. 190-235, 1999.
[17] K. Kennedy, “Fast Greedy Weighted Fusion,” Proc. 2000 Int'l Conf. Supercomputing, May 2000.
[18] K. Kennedy and K.S. McKinley, “Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution,” Proc. Sixth Workshop Languages and Compilers for Parallel Computing, Aug. 1993.
[19] J. Laudon and D. Lenoski, “The SGI Origin: A CC-NUMA Highly Scalable Server,” Proc. 24th Ann. Int'l Symp. Computer Architecture (ISCA '97), May 1997.
[20] V. Lefebvre and P. Feautrier, “Automatic Storage Management for Parallel Programs,” Parallel Computing, vol. 24, nos. 3-4, pp. 649-671, May 1998.
[21] A.W. Lim, S.-W. Liao, and M.S. Lam, “Blocking and Array Contraction across Arbitrarily Nested Loops Using Affine Partitioning,” Proc. 2001 ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, pp. 103-112, June 2001.
[22] V. Loechner, B. Meister, and P. Clauss, “Precise Data Locality Optimization of Nested Loops,” J. Supercomputing, vol. 21, no. 1, pp. 37-76, 2002.
[23] N. Manjikian and T.S. Abdelrahman, “Fusion of Loops for Parallelism and Locality,” IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 2, pp. 193-209, Feb. 1997.
[24] K.S. McKinley and O. Temam, “Quantifying Loop Nest Locality Using SPEC'95 and the Perfect Benchmarks,” ACM Trans. Computer Systems, vol. 17, no. 4, Nov. 1999.
[25] A.G. Mohamed, G.C. Fox, G. von Laszewski, M. Parashar, T. Haupt, K. Mills, Y.-H. Lu, N.-T. Lin, and N.-K. Yeh, “Applications Benchmark Set for Fortran-D and High Performance Fortran,” Technical Report CRPS-TR92260, Center for Research on Parallel Computation, Rice Univ., June 1992.
[26] T. Mowry, M.S. Lam, and A. Gupta, “Design and Evaluation of a Compiler Algorithm for Prefetching,” Proc. Fifth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 62-73, Oct. 1992.
[27] J. Rice and J. Jing, “Problems to Test Parallel and Vector Languages,” Technical Report CSD-TR-1016, Dept. of Computer Science, Purdue Univ., 1990.
[28] G. Rivera and C.-W. Tseng, “Eliminating Conflict Misses for High Performance Architectures,” Proc. 1998 ACM Int'l Conf. Supercomputing, pp. 353-360, July 1998.
[29] G. Rivera and C.-W. Tseng, “A Comparison of Compiler Tiling Algorithms,” Proc. Eighth Int'l Conf. Compiler Construction, Mar. 1999.
[30] V. Sarkar, “Optimized Unrolling of Nested Loops,” Proc. ACM Int'l Conf. Supercomputing, pp. 153-166, May 2000.
[31] S.K. Singhai and K.S. McKinley, “A Parameterized Loop Fusion Algorithm for Improving Parallelism and Cache Locality,” The Computer J., vol. 40, no. 6, 1997.
[32] Y. Song, R. Xu, C. Wang, and Z. Li, “Performance Enhancement by Memory Reduction,” Technical Report CSD-TR-00-016, Dept. of Computer Science, Purdue Univ., 2000, http://www.cs.purdue.edu/homes/songyhacademic.html.
[33] Y. Song, R. Xu, C. Wang, and Z. Li, “Data Locality Enhancement by Memory Reduction,” Proc. 15th ACM Int'l Conf. Supercomputing, June 2001.
[34] W. Tembe and S. Pande, “Data I/O Minimization for Loops on Limited On-Chip Memory Processors,” IEEE Trans. Computers, vol. 51, no. 10, pp. 1269-1280, Oct. 2002.
[35] W. Thies, F. Vivien, J. Sheldon, and S. Amarasinghe, “A Unified Framework for Schedule and Storage Optimization,” Proc. 2001 ACM SIGPLAN Conf. Programming Language Design and Implementation, pp. 232-242, June 2001.
[36] M. Wolfe, High Performance Compilers for Parallel Computing. Addison-Wesley, 1995.

