Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2001)
Sept. 8, 2001 to Sept. 12, 2001
Xianglong Huang , University of Massachusetts, Amherst
Zhenlin Wang , University of Massachusetts, Amherst
Kathryn S. McKinley , University of Massachusetts, Amherst
Abstract: The Impulse memory controller provides an interface for remapping irregular or sparse memory accesses into dense accesses in the cache memory. This capability significantly increases processor cache and system bus utilization, and previous work shows performance improvements from a factor of 1.2 to 5 with current technology models for hand-coded kernels in a cycle-level simulator. To attain widespread use of any specialized hardware feature requires automating its use in a compiler. In this paper, we present compiler cost models using dependence and locality analysis that determine when to use Impulse to improve performance based on the reduction in misses, the additional cost for misses in Impulse, and the fixed cost for setting up a remapping. We implement the cost models and generate the appropriate Impulse system calls in the Scale compiler framework. Our results demonstrate that our cost models correctly choose when and when not to use Impulse. We also combine and compare Impulse with our implementation of loop permutation for improving locality. If loop permutation can achieve the same dense access pattern as Impulse, we prefer it, since it has no overheads, but we show that the combination can yield better performance.
Xianglong Huang, Zhenlin Wang, Kathryn S. McKinley, "Compiling for the Impulse Memory Controller", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, vol. 00, no. , pp. 0141, 2001, doi:10.1109/PACT.2001.953295