When the Herd Is Smart: Aggregate Behavior in the Selection of Job Request
February 2003 (vol. 14 no. 2)
pp. 181-192

Abstract—In most parallel supercomputers, submitting a job for execution involves specifying how many processors are to be allocated to the job. When the job is moldable (i.e., there is a choice of how many processors the job uses), an application scheduler called SA can significantly improve job performance by automatically selecting how many processors to use. Since most jobs are moldable, this result has great impact on the current state of practice in supercomputer scheduling. However, widespread use of SA can change the nature of the workload processed by supercomputers. When many SAs are scheduling jobs on one supercomputer, the decision made by one SA affects the state of the system and thereby impacts the other instances of SA. In this case, the global behavior of the system emerges from the aggregate behavior of all SAs. In particular, it is reasonable to expect the competition for resources to become tougher with multiple SAs, and this tougher competition to decrease the performance improvement attained by each SA individually. This paper investigates this very issue. We found that the increased competition indeed makes it harder for each individual instance of SA to improve job performance. Nevertheless, two other aggregate behaviors override the increased competition when the system load is moderate to heavy. First, as load goes up, SA chooses smaller requests, which increases efficiency, which effectively decreases the offered load, which mitigates long wait times. Second, better job packing and fewer jobs in the system make it easier for incoming jobs to fit in the supercomputer schedule, reducing wait times further. As a result, under moderate to heavy load, a single instance of SA benefits from the fact that other jobs are also using SA.
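The selection the abstract describes can be illustrated with a minimal sketch (not the paper's actual SA implementation): for a moldable job, pick the request size that minimizes estimated turnaround, i.e., predicted queue wait plus predicted runtime on that many processors. The Amdahl's-law runtime model, the candidate sizes, and the wait-time figures below are all illustrative assumptions, not values from the paper.

```python
def runtime(n, sequential_time, parallel_fraction=0.9):
    """Estimated runtime on n processors under a simple Amdahl's-law
    speedup model (the paper uses a richer speedup model)."""
    speedup = 1.0 / ((1 - parallel_fraction) + parallel_fraction / n)
    return sequential_time / speedup

def select_request(candidate_sizes, sequential_time, predicted_wait):
    """Pick the request size minimizing estimated turnaround.

    predicted_wait: dict mapping request size -> estimated queue wait (s).
    """
    return min(candidate_sizes,
               key=lambda n: predicted_wait[n] + runtime(n, sequential_time))

# Under light load, waits are short and a larger request wins; under heavy
# load, waits for big allocations dominate and a smaller, more efficient
# request wins -- the shift toward smaller requests the abstract describes.
waits_light = {8: 10, 32: 60, 128: 300}
waits_heavy = {8: 100, 32: 3000, 128: 20000}
print(select_request([8, 32, 128], 3600, waits_light))   # prints 32
print(select_request([8, 32, 128], 3600, waits_heavy))   # prints 8
```

When every job's scheduler behaves this way, heavy load pushes all requests toward smaller, more efficient sizes, which is the aggregate effect that offsets the increased competition.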


Index Terms:
Parallel supercomputers, space-shared supercomputers, job scheduling, application scheduling, aggregate behavior.
Citation:
Walfredo Cirne, Francine Berman, "When the Herd Is Smart: Aggregate Behavior in the Selection of Job Request," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 2, pp. 181-192, Feb. 2003, doi:10.1109/TPDS.2003.1178881