Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
August 1988 (vol. 37 no. 8)
pp. 991-1004
By examining the structure and characteristics of parallel programs, the author isolates potential sources of overhead. The first compiler optimization considered is cycle shrinking, which can be used to parallelize certain types of serial loops. Run-time dependence analysis is then considered, along with how it can be performed through compiler-inserted bookkeeping and control statements. Loops wi
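Cycle shrinking targets serial loops whose dependences have a constant distance λ greater than 1. If iteration i depends only on iteration i - λ, then any λ consecutive iterations are mutually independent. The loop can therefore be split into a serial outer loop that steps by λ and a parallel (DOALL) inner loop of λ iterations. The Python sketch below illustrates this simple form of the idea under stated assumptions: a single recurrence a[i] = a[i - lam] + b[i] with a known constant distance lam, hypothetical function names serial_loop and shrunk_loop, and a thread pool standing in for the DOALL inner loop. It is an illustration only, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_loop(a, b, lam):
    """Original serial loop: a[i] = a[i - lam] + b[i] (dependence distance lam)."""
    a = list(a)
    for i in range(lam, len(a)):
        a[i] = a[i - lam] + b[i]
    return a


def shrunk_loop(a, b, lam, workers=4):
    """Cycle-shrunk version (hypothetical sketch): a serial outer loop over
    blocks of lam iterations, each block executed as a parallel inner loop."""
    a = list(a)
    n = len(a)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(lam, n, lam):  # serial outer loop, stride lam
            block = range(start, min(start + lam, n))
            # Every read a[i - lam] refers to an earlier block, so the
            # iterations in this block are independent. All reads finish
            # before any write, mimicking DOALL semantics.
            new_vals = list(pool.map(lambda i: a[i - lam] + b[i], block))
            for i, v in zip(block, new_vals):
                a[i] = v
    return a
```

For example, with a = [1] * 10, b = list(range(10)) and lam = 3, both functions return [1, 1, 1, 4, 5, 6, 10, 12, 14, 19]. In CPython the thread pool only models the parallel semantics and should not be expected to yield a speedup.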

[1] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley, 1986.
[2] F. E. Allen and J. Cocke, "A catalogue of optimizing transformations," in Design and Optimization of Compilers, R. Rustin, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1972, pp. 1-30.
[3] J. R. Allen and K. Kennedy, "PFC: A program to convert Fortran to parallel form," Tech. Rep. MASC-TR82-6, Rice University, Houston, TX, Mar. 1982.
[4] R. Allen and K. Kennedy, "Automatic translation of FORTRAN to vector form," ACM Trans. Programming Languages Syst., vol. 9, no. 4, pp. 491-524, 1987.
[5] Alliant Computer Systems Corp., FX/Series Architecture Manual, Acton, MA, 1985.
[6] American National Standards Institute, American National Standard for Information Systems Programming Language Fortran S8 (X3.9-198x), Revision of X3.9-1978, Draft S8, Version 99, ANSI, New York, Apr. 1986.
[7] Arvind and R. S. Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," in Proc. Parallel Architectures and Languages in Europe (PARLE), Springer-Verlag LNCS no. 259, June 1987, pp. 1-29.
[8] U. Banerjee, "Speedup of ordinary programs," Ph.D. dissertation, Univ. Illinois at Urbana-Champaign, DCS Rep. UIUCDCS-R-79-989, Oct. 1979.
[9] B. Brode, "Precompilation of Fortran programs to facilitate array processing," Computer, vol. 14, pp. 46-51, Sept. 1981.
[10] S. Chen, "Large-scale and high-speed multiprocessor system for scientific applications--Cray-X-MP-2 Series," in Proc. NATO Advanced Res. Workshop High Speed Computing, J. S. Kowalik, Ed., June 1983, pp. 59-67.
[11] R. G. Cytron, "Doacross: Beyond vectorization for multiprocessors (extended abstract)," in Proc. 1986 Int. Conf. Parallel Processing, St. Charles, IL, Aug. 1986, pp. 836-844.
[12] A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer: Designing an MIMD shared-memory parallel machine," IEEE Trans. Comput., vol. C-32, Feb. 1983.
[13] K. Kennedy, "Automatic vectorization of Fortran programs to vector form," Tech. Rep., Rice Univ., Houston, TX, Oct. 1980.
[14] D. J. Kuck, R. H. Kuhn, B. Leasure, and M. Wolfe, "The structure of an advanced vectorizer for pipelined processors," in Proc. Fourth Int. Comput. Software Appl. Conf., Oct. 1980.
[15] D. J. Kuck, R. H. Kuhn, B. Leasure, D. A. Padua, and M. Wolfe, "Compiler transformation of dependence graphs," in Conf. Rec. 8th ACM Symp. Principles Program. Languages, Williamsburg, VA, Jan. 1981.
[16] D. J. Kuck, E. S. Davidson, D. H. Lawrie, and A. H. Sameh, "Parallel supercomputing today and the Cedar approach," Science, vol. 231, pp. 967-974, Feb. 28, 1986.
[17] D. J. Kuck, The Structure of Computers and Computations, vol. 1. New York: Wiley, 1978.
[18] G. Lee, C. Kruskal, and D. J. Kuck, "The effectiveness of combining in shared memory parallel computers in the presence of 'hot spot'," in Proc. 1986 Int. Conf. Parallel Processing, St. Charles, IL, Aug. 1986.
[19] P. Mehrotra and J. Van Rosendale, "The Blaze language: A parallel language for scientific programming," Rep. 85-29, Instit. Comput. Appl. Sci. Eng., NASA Langley Research Center, Hampton, VA, May 1985.
[20] K. Miura and K. Uchida, "Facom vector processor VP-100/VP-200," in High Speed Computation, NATO ASI Series, Vol. F7, J. S. Kowalik, Ed., New York: Springer-Verlag, 1984.
[21] A. Nicolau, "Parallelism, memory anti-aliasing and correctness for trace scheduling compilers," Ph.D. dissertation, Yale Univ., June 1984.
[22] S. Nagashima, Y. Inagami, T. Odaka, and S. Kawabe, "Design consideration for a high-speed vector processor: The Hitachi S-810," in Proc. IEEE Int. Conf. Comput. Design: VLSI Comput., ICCD 84, IEEE, 1984.
[23] D. A. Padua Haiek, D. J. Kuck, and D. H. Lawrie, "High-speed multiprocessors and compilation techniques," IEEE Trans. Comput., vol. C-29, Sept. 1980.
[24] D. A. Padua and M. J. Wolfe, "Advanced compiler optimizations for supercomputers," Commun. ACM, vol. 29, no. 12, pp. 1184-1201, Dec. 1986.
[25] G. F. Pfister and V. A. Norton, "'Hot spot' contention and combining in multistage interconnection networks," in Proc. 1985 Int. Conf. Parallel Processing, St. Charles, IL, Aug. 1985.
[26] C. Polychronopoulos and D. Kuck, "Guided self-scheduling: A practical scheduling scheme for parallel supercomputers," IEEE Trans. Comput., 1987.
[27] C. D. Polychronopoulos, "On program restructuring, scheduling, and communication for parallel processor systems," Ph.D. dissertation, CSRD 595, Center of Supercomput. Res. Develop., University of Illinois, Aug. 1986.
[28] C. D. Polychronopoulos and C. Beckmann, "Compiler and hardware issues for fast synchronization in parallel computers," Tech. Rep., Center for Supercomput. Res. Develop., University of Illinois, 1988, in preparation.
[29] C. D. Polychronopoulos, "Advanced loop optimizations for parallel computers," in Proc. 1987 Int. Conf. Supercomput., June 8-12, Athens, Greece, Springer-Verlag LNCS.
[30] R. M. Tomasulo, "An efficient algorithm for exploiting multiple arithmetic units," IBM J. Res. Develop., vol. 11, Jan. 1967.
[31] A. K. Uht, C. D. Polychronopoulos, and J. F. Kolen, "On the combination of hardware and software concurrency extraction methods," in Proc. Twentieth Annu. Workshop Microprogramming (MICRO-20), ACM, Dec. 1987, pp. 133-141.
[32] A. K. Uht, "Hardware extraction of low-level concurrency from sequential instruction streams," Ph.D. dissertation, Carnegie-Mellon University, Pittsburgh, PA, Dec. 1985. Available from University Microfilms International, Ann Arbor, MI.
[33] M. J. Wolfe, "Optimizing supercompilers for supercomputers," Ph.D. thesis, Ctr. Supercomput. Res. and Development, Univ. Illinois, Urbana-Champaign, 1980.
[34] C. Q. Zhu and P. C. Yew, "A synchronization scheme and its applications for large multiprocessor systems," in Proc. 1984 Int. Conf. Distributed Comput. Syst., May 1984, pp. 486-493.

Index Terms:
parallel programs; cycle shrinking; run-time dependence analysis; compiler-inserted bookkeeping; barrier synchronization; run-time overhead; distributed barriers; shared registers; optimisation; parallel programming; program compilers.
C.D. Polychronopoulos, "Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design," IEEE Transactions on Computers, vol. 37, no. 8, pp. 991-1004, Aug. 1988, doi:10.1109/12.2249