W.K. Kaplow, Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY, USA
W.A. Maniatty, Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY, USA
B.K. Szymanski, Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY, USA
This paper presents a method for determining the cache performance of the loop nests in a program. The cache-miss data are produced by simulating the loop nest execution on an architecturally parameterized cache simulator. We show that the cache-miss rates are highly non-linear with respect to the ranges of the loops, and correlate well with the performance of the loop nests on actual target machines. The cache-miss ratio is used to guide program optimizations such as loop interchange and iteration-space blocking. It can also be used to provide an estimate for the runtime of a program. Both applications are important in scheduling programs for parallel execution. We present examples of program optimization for several popular processors, such as the IBM 9076 SP1, the SuperSPARC and the Intel i860.
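To illustrate the kind of measurement the abstract describes, the following is a minimal sketch of a parameterized cache simulator applied to a two-deep loop nest. It is not the authors' simulator: the direct-mapped organization, the 8 KB cache and 32-byte line defaults, the 8-byte element size, and the names `miss_ratio` and `loop_nest` are all assumptions chosen for illustration.

```python
def miss_ratio(addresses, cache_bytes=8192, line_bytes=32):
    """Return misses / accesses for a byte-address stream.

    Assumes a direct-mapped cache (an illustrative choice, not
    necessarily the organization used in the paper).
    """
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # block currently held by each cache line
    misses = total = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this address
        index = block % num_lines      # cache line that block maps to
        if tags[index] != block:       # cold or conflict miss
            tags[index] = block
            misses += 1
        total += 1
    return misses / total


def loop_nest(n, m, elem_bytes=8, row_outer=True):
    """Yield the addresses touched by a nest over a row-major n x m array.

    row_outer=True  iterates i (rows) in the outer loop;
    row_outer=False is the interchanged nest, with j (columns) outermost.
    """
    if row_outer:
        for i in range(n):
            for j in range(m):
                yield (i * m + j) * elem_bytes
    else:
        for j in range(m):
            for i in range(n):
                yield (i * m + j) * elem_bytes


if __name__ == "__main__":
    # Loop interchange: same work, very different miss ratios.
    print(miss_ratio(loop_nest(256, 256, row_outer=True)))   # 0.25
    print(miss_ratio(loop_nest(256, 256, row_outer=False)))  # 1.0
    # Sensitivity to the loop range: one extra row changes the picture.
    for n in (4, 5):
        print(n, miss_ratio(loop_nest(n, 256, row_outer=False)))
```

Under these assumed parameters, the row-outer nest misses once per four-element cache line, while the interchanged nest strides by a full row and misses on every access. The last loop shows how a small change in a loop range (four rows versus five) can shift the miss ratio abruptly, which is the kind of non-linearity the abstract refers to.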
Index Terms:
parallel programming; optimisation; scheduling; cache storage; program control structures; processor scheduling; software performance evaluation; memory architecture; memory hierarchy; program partitioning; parallel program scheduling; cache performance; nonlinear cache-miss rates; loop nest execution simulation; architecturally parameterized cache simulator; loop range; cache-miss ratio; program optimization; loop interchange; iteration-space blocking; program runtime estimation; IBM 9076 SP1; SuperSPARC; Intel i860
Citation:
W.K. Kaplow, W.A. Maniatty, B.K. Szymanski, "Impact of memory hierarchy on program partitioning and scheduling," 28th Hawaii International Conference on System Sciences (HICSS'95), p. 93, 1995