Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors
September 1995 (vol. 6 no. 9)
pp. 943-962
DOI: http://doi.ieeecomputersociety.org/10.1109/71.466632
Index Terms:
Automatic loop partitioning, shared-memory multiprocessors, compilers, tiling, minimizing communication.
Citation:
Anant Agarwal, David A. Kranz, Venkat Natarajan, "Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 9, pp. 943-962, Sept. 1995, doi:10.1109/71.466632