Symbolic Performance Modeling of Parallel Systems
February 2003 (vol. 14 no. 2)
pp. 154-165
Arjan J.C. van Gemund, IEEE Computer Society

Abstract—Performance prediction is an important engineering tool that provides valuable feedback on design choices in program synthesis and machine architecture development. We present an analytic performance modeling approach aimed to minimize prediction cost, while providing a prediction accuracy that is sufficient to enable major code and data mapping decisions. Our approach is based on a performance simulation language called Pamela. Apart from simulation, Pamela features a symbolic analysis technique that enables Pamela models to be compiled into symbolic performance models that trade prediction accuracy for the lowest possible solution cost. We demonstrate our approach through a large number of theoretical and practical modeling case studies, including six parallel programs and two distributed-memory machines. The average prediction error of our approach is less than 10 percent, while the average worst-case error is limited to 50 percent. It is shown that this accuracy is sufficient to correctly select the best coding or partitioning strategy. For programs expressed in a high-level, structured programming model, such as data-parallel programs, symbolic performance modeling can be entirely automated. We report on experiments with a Pamela model generator built within a data-parallel compiler for distributed-memory machines. Our results show that with negligible program annotation, symbolic performance models are automatically compiled in seconds, while their solution cost is in the order of milliseconds.

Index Terms:
Performance prediction, parallel processing, analytic performance modeling.
Arjan J.C. van Gemund, "Symbolic Performance Modeling of Parallel Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 2, pp. 154-165, Feb. 2003, doi:10.1109/TPDS.2003.1178879