2003 International Conference on Parallel Processing (ICPP'03)
Code Tiling for Improving the Cache Performance of PDE Solvers
Kaohsiung, Taiwan
October 6-9, 2003
ISBN: 0-7695-2017-0
For SOR-like PDE solvers, loop tiling either does little to improve data locality or actually hurts performance. This paper presents a novel compiler technique called code tiling for generating fast tiled codes for these solvers on uniprocessors with a memory hierarchy. Code tiling combines loop tiling with a new array layout transformation called data tiling in such a way that a significant number of the cache misses that would otherwise be present in tiled codes are eliminated. Compared to nine existing loop tiling algorithms, our technique delivers impressive performance speedups (faster by factors of 1.55 to 2.62) and smooth performance curves across a range of problem sizes on representative machine architectures. The synergy of loop tiling and data tiling allows us to find a single tile size that minimises a cache miss objective function independently of the problem size parameters. This "one-size-fits-all" scheme makes our approach attractive for designing fast SOR solvers without having to generate a multitude of versions specialised for different problem sizes.
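For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of conventional loop tiling applied to a single in-place SOR sweep. It is not the paper's code tiling algorithm and does not attempt to show data tiling. The 5-point stencil, the function names (`make_grid`, `sor_sweep`, `sor_sweep_tiled`), the relaxation factor `omega`, and the tile size are all assumptions chosen for illustration. Rectangular tiling of the two spatial loops is legal here because every update depends only on already-updated neighbours above and to the left and not-yet-updated neighbours below and to the right, so the tiled sweep should reproduce the untiled one exactly.

```python
def make_grid(n):
    """Build an n x n grid with deterministic, non-trivial values (illustrative only)."""
    return [[float((3 * i + 7 * j) % 11) for j in range(n)] for i in range(n)]


def sor_sweep(a, omega):
    """One in-place SOR sweep over the interior points of a square grid."""
    n = len(a)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            avg = 0.25 * (a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1])
            a[i][j] = (1.0 - omega) * a[i][j] + omega * avg


def sor_sweep_tiled(a, omega, tile):
    """The same sweep with the two spatial loops split into tile x tile blocks.

    Tiles are visited in row-major order, and the points inside each tile
    are also visited in row-major order, so every point reads the same
    neighbour values as in sor_sweep.
    """
    n = len(a)
    for ii in range(1, n - 1, tile):            # first row of the tile
        for jj in range(1, n - 1, tile):        # first column of the tile
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, n - 1)):
                    avg = 0.25 * (a[i - 1][j] + a[i + 1][j]
                                  + a[i][j - 1] + a[i][j + 1])
                    a[i][j] = (1.0 - omega) * a[i][j] + omega * avg


if __name__ == "__main__":
    plain = make_grid(10)
    tiled = make_grid(10)
    for _ in range(5):                          # several successive sweeps
        sor_sweep(plain, 1.5)
        sor_sweep_tiled(tiled, 1.5, 3)
    print(plain == tiled)
```

In this sketch the tile size is simply passed in as a parameter. Choosing it so that cache misses are minimised for every problem size is the question the abstract says code tiling addresses.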
Citation:
Qingguang Huang, Jingling Xue, Xavier Vera, "Code Tiling for Improving the Cache Performance of PDE Solvers," 2003 International Conference on Parallel Processing (ICPP'03), p. 615, 2003