Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (1996)
Oct. 20, 1996 to Oct. 23, 1996
Steve Carr , Michigan Technological University
Current architectural trends in instruction-level parallelism (ILP) have significantly increased the computational power of microprocessors. As a result, the demands on the memory system have increased dramatically. Not only do compilers need to be concerned with finding ILP to utilize machine resources effectively, but they also need to be concerned with ensuring that the resulting code has a high degree of cache locality. Previous work has concentrated either on improving ILP in nested loops or on improving cache performance. This paper presents a performance metric that can be used to guide the optimization of nested loops considering the combined effects of ILP, data reuse and latency hiding techniques. Preliminary experiments reveal that dramatic performance improvements for nested loops are obtainable (we regularly get at least a factor of 2 on kernels run on two different architectures).
S. Carr, "Combining Optimization for Cache and Instruction-Level Parallelism," Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques(PACT), Boston, MA, 1996, pp. 0238.