Compile-Time Partitioning of Iterative Parallel Loops to Reduce Cache Coherency Traffic
IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, July 1991
pp. 318-328

Adaptive data partitioning (ADP), which reduces the execution time of parallel programs by reducing interprocessor communication for iterative parallel loops, is discussed. It is shown that ADP can be integrated into a communication-reducing back end for existing parallelizing compilers or into a machine-specific partitioner for parallel programs. A multiprocessor model is defined to analyze the program execution factors that lead to interprocessor communication, along with a model of the iterative parallel loop that quantifies communication patterns within a program. A vector notation is chosen to quantify communication across a global data set. Communication parameters are computed by examining the indexes of array accesses and are adjusted to reflect the underlying system architecture by compensating for cache line sizes. These values are used to generate rectangular and hexagonal partitions that reduce interprocessor communication.
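The following minimal sketch (not the paper's ADP algorithm, and not drawn from its text) illustrates the underlying idea: for an iterative parallel loop with stencil-style array accesses, the shape of the iteration-space partition determines how many accesses cross processor boundaries, which is a proxy for the cache coherency traffic the paper seeks to reduce. The problem size N, processor count P, and the functions owner_strip and owner_rect are hypothetical choices made for illustration.

/*
 * Compare cross-partition neighbour accesses for a 5-point stencil over an
 * N x N iteration space under (a) a 1-D strip partition and (b) a 2-D
 * rectangular (tiled) partition. Fewer cross-partition edges means less
 * interprocessor communication. Hypothetical parameters throughout.
 */
#include <stdio.h>

#define N 512   /* iteration-space side length (assumed) */
#define P 16    /* number of processors (assumed)        */

/* Owner of iteration (i, j) under a strip partition: rows split P ways. */
static int owner_strip(int i, int j) {
    (void)j;
    return i / ((N + P - 1) / P);
}

/* Owner under a rectangular partition: a 4 x 4 grid of tiles (P = 16). */
static int owner_rect(int i, int j) {
    int side = 4;                          /* sqrt(P), assumed square */
    int ti = i / ((N + side - 1) / side);
    int tj = j / ((N + side - 1) / side);
    return ti * side + tj;
}

/* Count stencil edges whose endpoints are owned by different processors. */
static long cross_edges(int (*owner)(int, int)) {
    long crossings = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            /* east and south neighbours cover every undirected edge once */
            if (j + 1 < N && owner(i, j) != owner(i, j + 1)) crossings++;
            if (i + 1 < N && owner(i, j) != owner(i + 1, j)) crossings++;
        }
    }
    return crossings;
}

int main(void) {
    printf("strip partition:       %ld cross-partition edges\n",
           cross_edges(owner_strip));
    printf("rectangular partition: %ld cross-partition edges\n",
           cross_edges(owner_rect));
    return 0;
}

For these assumed parameters the rectangular partition roughly halves the number of boundary accesses relative to the strip partition, which is the kind of compile-time trade-off the partitioner described in the paper evaluates before choosing a partition shape.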

Index Terms:
adaptive data partitioning; iterative parallel loops; cache coherency traffic; parallel programs; interprocessor communication; ADP; communication-reducing back end; parallelizing compilers; machine-specific partitioner; multiprocessor model; program execution factors; communication patterns; vector notation; global data set; array accesses; underlying system architecture; cache line sizes; hexagonal partitions; buffer storage; parallel machines; parallel programming; program compilers
Citation:
S.G. Abraham, D.E. Hudak, "Compile-Time Partitioning of Iterative Parallel Loops to Reduce Cache Coherency Traffic," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, pp. 318-328, July 1991, doi:10.1109/71.86107