An Iteration Partition Approach for Cache or Local Memory Thrashing on Parallel Processing
May 1993 (vol. 42 no. 5)
pp. 529-546

Parallel processing systems with cache or local memory in their memory hierarchies are considered. These systems have a local cache memory in each processor and usually employ a write-invalidate protocol for cache coherence. In such systems, a problem called 'cache or local memory thrashing' can arise during the execution of parallel programs, when data moves unnecessarily back and forth between the caches or local memories of different processors. An approach that eliminates, or at least reduces, such movement for nested parallel loops is presented. It is based on the relations between array element accesses and the enclosed loop indexes in the loops. These relations are used to assign to each processor the iterations of the parallel loops in a loop nest that access data already held in its cache or local memory. An algorithm is presented for calculating the correct iteration of a parallel loop from the loop indexes of the iterations previously executed on the same processor. The method benefits parallel code with nested loop structures in a wide range of applications. Experimental results show that the technique can achieve speedups of up to 2.
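The thrashing effect described in the abstract can be illustrated with a toy model. The sketch below is not the algorithm from the paper; it is a minimal simulation under stated assumptions: a serial outer loop over i, a parallel inner loop over j, a single write to a hypothetical array element A[i + j] per iteration, and a write-invalidate model in which an element resides only in the cache of the last processor that wrote it. The function names (count_transfers, cyclic, affinity) and the loop bounds are invented for illustration.

```python
def count_transfers(outer, inner, procs, assign):
    """Count how often an array element changes owning processor.

    Assumed model: each iteration (i, j) writes element A[i + j], and under
    write-invalidate the element then lives only in the writer's cache.
    """
    owner = {}      # element index -> processor that last wrote it
    transfers = 0
    for i in range(outer):          # serial outer loop
        for j in range(inner):      # parallel inner loop (simulated serially)
            p = assign(i, j, procs)
            elem = i + j
            prev = owner.get(elem)
            if prev is not None and prev != p:
                transfers += 1      # data had to move between caches
            owner[elem] = p
    return transfers


def cyclic(i, j, procs):
    # Naive schedule: iteration j always runs on processor j mod procs.
    return j % procs


def affinity(i, j, procs):
    # Schedule derived from the access pattern: the processor is chosen from
    # the element index i + j, so each element is always written by the same
    # processor.
    return (i + j) % procs


naive = count_transfers(outer=4, inner=8, procs=4, assign=cyclic)
aligned = count_transfers(outer=4, inner=8, procs=4, assign=affinity)
print("transfers with cyclic schedule:  ", naive)
print("transfers with affinity schedule:", aligned)
```

In this toy model the cyclic schedule moves an element to a different cache each time the outer loop advances, whereas the affinity schedule keeps every element on one processor. Real access patterns, cache line granularity, and load balance are ignored here.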

Index Terms:
iteration partition approach; memory hierarchies; local cache memory; write-invalidate protocol; cache coherence; local memory thrashing; parallel programs; nested parallel loops; array element accesses; enclosed loop indexes; parallel loops; loop nests; local memories; correct iteration; parallel code; nested loop structures; iterative methods; memory architecture; parallel programming; storage management.
Citation:
J.Z. Fang, M. Lu, "An Iteration Partition Approach for Cache or Local Memory Thrashing on Parallel Processing," IEEE Transactions on Computers, vol. 42, no. 5, pp. 529-546, May 1993, doi:10.1109/12.223672