On the Automatic Parallelization of the Perfect Benchmarks®
January 1998 (vol. 9 no. 1)
pp. 5-23

Abstract—This paper presents the results of the Cedar Hand-Parallelization Experiment, conducted from 1989 through 1992, within the Center for Supercomputing Research and Development (CSRD) at the University of Illinois. In this experiment, we manually transformed the Perfect Benchmarks® into parallel program versions. In doing so, we used techniques that may be automated in an optimizing compiler. We then ran these programs on the Cedar multiprocessor (built at CSRD during the 1980s) and measured the speed improvement due to each technique.

The results presented here extend the findings previously reported in [11]. The techniques credited most for the performance gains include array privatization, parallelization of reduction operations, and the substitution of generalized induction variables. All these techniques can be considered extensions of transformations that were available in vectorizers and commercial restructuring compilers of the late 1980s. We applied these transformations by hand to the given programs, in a mechanical manner, similar to that of a parallelizing compiler. Because of our success with these transformations, we believed that it would be possible to implement many of these techniques in a new parallelizing compiler. Such a compiler has been completed in the meantime and we show preliminary results.
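As a rough illustration of the three transformations named above, the following sketch applies them by hand to a small made-up loop nest. It is not taken from the paper or from the Perfect Benchmarks; the loop body and the names `sequential`, `run_chunk`, and `transformed` are hypothetical. The original loop reuses a work array `tmp`, accumulates into `total`, and advances a counter `k` inside a triangular inner loop. The transformed version gives each worker its own `tmp` (array privatization), a private partial sum that is combined afterwards (reduction parallelization), and computes `k` from the closed form `i * (i + 1) // 2` (generalized induction variable substitution).

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(n, a):
    """Original loop nest: three cross-iteration dependences on tmp, total, k."""
    out = [0] * (n * (n + 1) // 2)
    tmp = [0] * n   # shared work array, fully rewritten in every iteration
    total = 0       # sum reduction
    k = 0           # counter advanced in a triangular loop (generalized induction variable)
    for i in range(n):
        for j in range(n):
            tmp[j] = a[i] * (j + 1)
        for j in range(i + 1):
            out[k] = tmp[j]
            k += 1
        total += tmp[i]
    return out, total


def run_chunk(lo, hi, n, a, out):
    """Iterations lo..hi-1 after the transformations; independent of other chunks."""
    tmp = [0] * n    # array privatization: one copy per worker
    partial = 0      # reduction: private partial sum
    for i in range(lo, hi):
        for j in range(n):
            tmp[j] = a[i] * (j + 1)
        k = i * (i + 1) // 2   # closed form replaces the running counter
        for j in range(i + 1):
            out[k + j] = tmp[j]   # chunks write disjoint index ranges of out
        partial += tmp[i]
    return partial


def transformed(n, a, workers=4):
    out = [0] * (n * (n + 1) // 2)
    step = max(1, (n + workers - 1) // workers)
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda b: run_chunk(b[0], b[1], n, a, out), bounds)
        total = sum(partials)   # combine the private partial sums
    return out, total
```

The sketch uses integers so that reordering the additions cannot change the result; with floating-point data a parallel reduction may round differently from the sequential sum. Threads in standard CPython do not speed up this kind of loop, so the example only shows that the transformed iterations are independent, not the speedups measured on Cedar.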

[1] A.V. Aho, R. Sethi, and J.D. Ullman, Compilers, Principles, Techniques and Tools. New York: Addison-Wesley, 1985.
[2] W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, W. Pottenger, L. Rauchwerger, and P. Tu, "Advanced Program Restructuring for High-Performance Computers with Polaris," Technical Report 1473, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing Research & Development, http://polaris.cs.uiuc.edu/tech_reports.html, Jan. 1996.
[3] W. Blume and R. Eigenmann, "Performance Analysis of Parallelizing Compilers on the Perfect Benchmark Programs," IEEE Trans. Parallel and Distributed Systems, vol. 3, pp. 643-656, Nov. 1992.
[4] W. Blume and R. Eigenmann, "The Range Test: A Dependence Test for Symbolic, Non-Linear Expressions," Proc. Supercomputing '94, pp. 528-537, Washington D.C., Nov. 1994.
[5] W. Blume and R. Eigenmann, "An Overview of Symbolic Analysis Techniques Needed for the Effective Parallelization of the Perfect Benchmarks," Proc. 1994 Int'l Conf. Parallel Processing, vol. II, pp. 233-238, Aug. 1994.
[6] W. Blume and R. Eigenmann, "Symbolic Range Propagation," Proc. Ninth Int'l Parallel Processing Symp., pp. 357-363, Apr. 1995.
[7] W. Blume, R. Eigenmann, J. Hoeflinger, D. Padua, P. Petersen, L. Rauchwerger, and P. Tu, "Automatic Detection of Parallelism: A Grand Challenge for High-Performance Computing," IEEE Parallel and Distributed Technology, vol. 2, no. 3, pp. 37-47, Fall 1994.
[8] U. Banerjee, R. Eigenmann, A. Nicolau, and D.A. Padua, "Automatic Program Parallelization," Proc. IEEE, vol. 81, Feb. 1993.
[9] R. Eigenmann and S. Hassanzadeh, "Benchmarking with Real Industrial Applications: The SPEC High-Performance Group," IEEE Computational Science & Eng., vol. 3, no. 1, pp. 18-23, Spring 1996.
[10] R. Eigenmann, J. Hoeflinger, G. Jaxon, Z. Li, and D. Padua, "Restructuring Fortran Programs for Cedar," Concurrency: Practice and Experience, vol. 5, no. 7, pp. 553-573, Oct. 1993.
[11] R. Eigenmann, J. Hoeflinger, Z. Li, and D. Padua, "Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs," Proc. Fourth Workshop Languages and Compilers for Parallel Computing, Aug. 1991.
[12] R. Eigenmann, "Toward a Methodology of Optimizing Programs for High-Performance Computers," Proc. Int'l Conf. Supercomputing 1993, pp. 27-36, Tokyo, July 20-22, 1993.
[13] R. Eigenmann and P. McClaughry, "Practical Tools for Optimizing Parallel Programs," Proc. 1993 Simulation Multiconference High-Performance Computing Symp., Arlington, Va., Mar. 27-Apr. 1, 1993.
[14] R. Eigenmann, I. Park, and M.J. Voss, "Are Parallel Workstations the Right Target for Parallelizing Compilers?" Proc. Ninth Workshop Languages and Compilers for Parallel Computing, Aug. 1996.
[15] High Performance Fortran Forum, "High Performance Fortran Language Specification, Version 1.0," technical report, Rice Univ., Houston, Texas, May 1993.
[16] K. Gallivan, W. Jalby, S. Turner, A. Veidenbaum, and H. Wijshoff, "Preliminary Basic Performance Analysis of the Cedar Multiprocessor Memory Systems," Proc. Int'l Conf. Parallel Processing 1991, vol. I, pp. 71-75, St. Charles, Ill., Aug. 12-16, 1991.
[17] G. Goff, K. Kennedy, and C. Tseng, "Practical Dependence Testing," Proc. SIGPLAN '91 Conf. Programming Language Design and Implementation, pp. 15-29, Toronto, Canada, June 1991.
[18] M. Gupta et al., "An HPF Compiler for the IBM SP2," Proc. Supercomputing '95 (CD-ROM), IEEE Computer Society Press, Los Alamitos, Calif., 1995.
[19] J. Hoeflinger, "Coalescing Triangular Loops," Technical Report 1364, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing Research & Development, Jan. 1992.
[20] J. Hoeflinger, "Run-Time Dependence Testing by Integer Sequence Analysis," Technical Report 1194, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing Research & Development, Jan. 1992.
[21] M. Haghighat and C. Polychronopoulos, "Symbolic Dependence Analysis for High-Performance Parallelizing Compilers," Parallel and Distributed Computing: Advances in Languages and Compilers for Parallel Processing, pp. 310-330. Cambridge, Mass.: MIT Press, 1991.
[22] M. Haghighat and C. Polychronopoulos, "Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs," Proc. Sixth Ann. Workshop Languages and Compilers for Parallel Computing, Portland, Ore., Aug. 1993.
[23] C.A. Huson, "An In-Line Subroutine Expander for Parafrase," master's thesis, Univ. of Illinois at Urbana-Champaign, Dept. of Computer Science, Dec. 1982.
[24] D. Kuck, P. Budnik, S.-C. Chen, E. Davis Jr., J. Han, P. Kraska, D. Lawrie, Y. Muraoka, R. Strebendt, and R. Towle, "Measurements of Parallelism in Ordinary FORTRAN Programs," Computer, vol. 7, no. 1, pp. 37-46, Jan. 1974.
[25] D. Kuck, E. Davidson, D. Lawrie, A. Sameh, C.-Q. Zhu, A. Veidenbaum, J. Konicek, P. Yew, K. Gallivan, W. Jalby, H. Wijshoff, R. Bramley, U.M. Yang, P. Emrath, D. Padua, R. Eigenmann, J. Hoeflinger, G. Jaxon, Z. Li, T. Murphy, J. Andrews, and S. Turner, "The Cedar System and an Initial Performance Study," Proc. 20th Int'l Symp. Computer Architecture, San Diego, Calif., May 1993.
[26] Z. Li, "Array Privatization for Parallel Execution of Loops," Proc. ACM Int'l Conf. Supercomputing, pp. 313-322, July 1992.
[27] D.E. Maydan, S.P. Amarasinghe, and M.S. Lam, "Data Dependence and Data-Flow Analysis of Arrays," Proc. Fourth Workshop Programming Languages and Compilers for Parallel Computing, Aug. 1992.
[28] D. Maydan, J. Hennessy, and M. Lam, "Efficient and Exact Data Dependence Analysis," Proc. ACM SIGPLAN '91 Conf. Programming Language Design and Implementation, pp. 1-14, Toronto, Canada, June 1991.
[29] B. Pottenger and R. Eigenmann, "Idiom Recognition in the Polaris Parallelizing Compiler," Proc. Ninth Int'l Conf. Supercomputing, pp. 444-448, ACM Press, New York, 1995.
[30] L. Pointer, "Perfect: Performance Evaluation for Cost-Effective Transformations Report 2," Technical Report 964, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing Research & Development, Mar. 1990.
[31] Y. Paek and D. Padua, "Automatic Parallelization for Noncoherent Cache Multiprocessors," Proc. Ninth Workshop Languages and Compilers for Parallel Computers, Aug. 1996.
[32] W. Pugh, "The Omega Test: A Fast and Practical Integer Programming Algorithm for Dependence Analysis," Comm. ACM, vol. 35, no. 8, pp. 102-114, Aug. 1992.
[33] L. Rauchwerger, N. Amato, and D. Padua, "Run-Time Methods for Parallelizing Partially Parallel Loops," Proc. Supercomputing 1995, pp. 137-145, 1995.
[34] L. Rauchwerger and D. Padua, "Parallelizing While Loops for Multiprocessor Systems," Proc. Ninth Int'l Parallel Processing Symp., pp. 347-356, Apr. 1995.
[35] J.P. Singh and J.L. Hennessy, "An Empirical Investigation of the Effectiveness and Limitations of Automatic Parallelization," Proc. Int'l Symp. Shared Memory Multiprocessing, Tokyo, Apr. 1991.
[36] P. Tu and D. Padua, "Automatic Array Privatization," Proc. Sixth Int'l Workshop Languages and Compilers for Parallel Computing, pp. 500-521, Aug. 1993.
[37] M. Wolfe, "Beyond Induction Variables," Proc. SIGPLAN Conf. Programming Language Design and Implementation, pp. 162-174, San Francisco, Calif., June 1992.

Index Terms:
Program parallelization, parallelization techniques, restructuring compilers, performance evaluation.
Citation:
Rudolf Eigenmann, Jay Hoeflinger, David Padua, "On the Automatic Parallelization of the Perfect Benchmarks®," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 1, pp. 5-23, Jan. 1998, doi:10.1109/71.655238