A Slowdown Model for Applications Executing on Time-Shared Clusters of Workstations
June 2001 (vol. 12 no. 6)
pp. 653-670

Abstract—Distributed applications executing on clustered environments typically share resources (computers and network links) with other applications. In such systems, application execution may be retarded by the competition for these shared resources. In this paper, we define a model that calculates the slowdown imposed on applications in time-shared multi-user clusters. Our model focuses on three kinds of slowdown: local slowdown, which synthesizes the effect of contention for CPU in a single workstation; communication slowdown, which synthesizes the effect of contention for the workstations and network links on communication costs; and aggregate slowdown, which determines the effect of contention on a parallel task caused by other applications executing on the entire cluster, i.e., on the nodes used by the parallel application. We verify empirically that this model provides an accurate estimate of application performance for a set of compute-intensive parallel applications on different clusters with a variety of emulated loads.
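The abstract distinguishes three notions of slowdown (local, communication, and aggregate). The paper's actual formulas are not reproduced on this page, so the sketch below only illustrates how the three levels could compose; the equal-priority time-slicing assumption, the multiplicative communication load factor, and the max-over-nodes aggregation are illustrative assumptions, not the authors' model.

```python
# Hypothetical sketch of the three slowdown notions named in the abstract.
# All modeling choices here (fair time-slicing, multiplicative link
# contention, slowest-node aggregation) are assumptions for illustration.

def local_slowdown(competing_procs: int) -> float:
    """Contention for CPU on one time-shared workstation: with n
    equal-priority CPU-bound competitors, the application gets roughly
    1/(n+1) of the CPU, so its compute time stretches by n+1 (assumed)."""
    return competing_procs + 1.0

def communication_slowdown(base_comm_time: float, load_factor: float) -> float:
    """Contention for workstations and network links inflates the
    communication cost by an assumed multiplicative factor."""
    return base_comm_time * load_factor

def aggregate_slowdown(compute_times, comm_times, loads, comm_factors) -> float:
    """Aggregate slowdown of a parallel task across the nodes it uses:
    ratio of the loaded-cluster runtime to the dedicated-cluster runtime,
    where each runtime is bounded by the slowest node (assumed)."""
    dedicated = max(t + c for t, c in zip(compute_times, comm_times))
    loaded = max(t * local_slowdown(n) + communication_slowdown(c, f)
                 for t, c, n, f in zip(compute_times, comm_times,
                                       loads, comm_factors))
    return loaded / dedicated
```

For example, a two-node task with 10 s of compute and 2 s of communication per node, where one node carries one competing process, would under these assumptions see its critical path grow from 12 s to 22 s, an aggregate slowdown of about 1.83.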

[1] M. Atallah, C. Black, D. Marinescu, H. Siegel, and T. Casavant, “Models and Algorithms for Coscheduling Compute-Intensive Tasks on a Network of Workstations,” J. Parallel and Distributed Computing, vol. 16, pp. 319-327, 1992.
[2] A. Beguelin, J.J. Dongarra, G.A. Geist, R. Manchek, and V.S. Sunderam, “Graphical Development Tools for Network-Based Concurrent Supercomputing,” Proc. Supercomputing '91, Albuquerque, N.M., Nov. 1991.
[3] F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, “Application-Level Scheduling on Distributed Heterogeneous Networks,” Proc. Supercomputing, 1996.
[4] A. Bricker, M. Litzkow, and M. Livny, “Condor Technical Summary,” Technical Report #1069, Computer Science Dept., Univ. of Wisconsin, May 1992.
[5] W.L. Briggs, “A Multigrid Tutorial,” Society for Industrial and Applied Mathematics, 1987.
[6] H. Dietz, W. Cohen, and B. Grant, “Would You Run It Here... or There? (AHS: Automatic Heterogeneous Supercomputing),” Proc. Int'l Conf. Parallel Processing, vol. II, pp. 217-221, Aug. 1993.
[7] X. Du and X. Zhang, “Coordinating Parallel Processes on Networks of Workstations,” J. Parallel and Distributed Computing, vol. 46, no. 2, pp. 125-135, Nov. 1997.
[8] A. Dusseau, R. Arpaci, and D. Culler, “Effective Distributed Scheduling of Parallel Workloads,” Proc. 1996 ACM Sigmetrics Int'l Conf. Measurement and Modeling of Computer Systems, Assoc. of Computing Machinery, N.Y., May 1996.
[9] S.M. Figueira, “Modeling the Effects of Contention on Application Performance in Multi-User Environments,” doctoral dissertation, Computer Science Eng. Dept., Univ. of Calif., San Diego, Dec. 1996.
[10] S.M. Figueira and F. Berman, “Modeling the Effects of Contention on the Performance of Heterogeneous Applications,” Proc. Fifth Int'l Symp. High-Performance Distributed Computing, pp. 392-401, Aug. 1996.
[11] S.J. Fink, S.B. Baden, and S.R. Kohn, “Flexible Communication Mechanisms for Dynamic Structured Applications,” Proc. Third Int'l Workshop Parallel Algorithms for Irregularly Structured Problems, Aug. 1996.
[12] S. Leutenegger and X. Sun, “Distributed Computing Feasibility in a Non-Dedicated Homogeneous Distributed System,” Proc. Supercomputing '93, pp. 143-152, Nov. 1993.
[13] “MPI: A Message-Passing Interface Standard,” Proc. Message-Passing Interface Forum, June 1995.
[14] D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga, “The NAS Parallel Benchmarks (94),” Technical Report RNR-94-007, NASA Ames Research Center, Mar. 1994.
[15] A.L. Rosenberg, “Guidelines for Data-Parallel Cycle-Stealing in Networks of Workstations, II: On Maximizing Guaranteed Output,” Proc. Second Merged Symp. IPPS/SPDP, Apr. 1999.
[16] P.G. Sobalvarro, S. Pakin, W.E. Weihl, and A.A. Chien, “Dynamic Coscheduling on Workstation Clusters,” Job Scheduling Strategies for Parallel Processing, pp. 231-256, 1998.
[17] V. Sunderam, “PVM: A Framework for Parallel Distributed Computing,” Concurrency: Practice and Experience, vol. 2, no. 4, pp. 315-339, 1990.
[18] D. Whitley, T. Starkweather, and D. Fuquay, “Scheduling Problems and Traveling Salesman: The Genetic Edge Recombination Operator,” Proc. Int'l Conf. Genetic Algorithms, 1989.
[19] R. Wolski, N. Spring, and J. Hayes, “The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing,” J. Future Generation Computer Systems, 1998.
[20] R. Wolski, “Dynamically Forecasting Network Performance Using the Network Weather Service,” J. Cluster Computing, vol. 1, no. 1, pp. 119-132, 1998.
[21] X. Zhang and Y. Yan, “A Framework of Performance Prediction of Parallel Computing on Non-Dedicated Heterogeneous Networks of Workstations,” Proc. 1995 Int'l Conf. Parallel Processing, vol. I, pp. 163-167, 1995.

Index Terms:
Data-parallel applications, time-shared clusters of workstations, networks of workstations, application slowdown, performance prediction.
Silvia M. Figueira, Francine Berman, "A Slowdown Model for Applications Executing on Time-Shared Clusters of Workstations," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 6, pp. 653-670, June 2001, doi:10.1109/71.932718