On Optimal Strategies for Cycle-Stealing in Networks of Workstations
May 1997 (vol. 46 no. 5)
pp. 545-557

Abstract—We study the parallel scheduling problem for a new modality of parallel computing: having one workstation "steal cycles" from another. We focus on a draconian mode of cycle-stealing, in which the owner of workstation B allows workstation A to take control of B's processor whenever it is idle, with the promise of relinquishing control immediately upon demand. The typically high communication overhead for supplying workstation B with work and receiving its results militates in favor of supplying B with large amounts of work at a time; the risk of losing work in progress when the owner of B reclaims the workstation militates in favor of supplying B with a sequence of small packets of work. The challenge is to balance these two pressures in a way that maximizes the amount of work accomplished.

We formulate two models of cycle-stealing. The first attempts to maximize the expected work accomplished during a single episode, when one knows the probability distribution of the return of B's owner. The second attempts to match the productivity of an omniscient cycle-stealer, when one knows how much work that stealer can accomplish. We derive optimal scheduling strategies for sample scenarios within each of these models.

Perhaps our most important discovery is the as-yet unexplained coincidence that two quite distinct scenarios lead to almost identical unique optimizing schedules. One scenario falls within our first model; it assumes that the probability of the return of B's owner is uniform across the lifespan of the episode; the optimizing schedule maximizes the expected amount of work accomplished during the episode. The other scenario falls within our second model; it assumes that B's owner will interrupt our cycle-stealing at most once during the lifespan of the opportunity; the optimizing schedule maximizes the amount of work that one is guaranteed to accomplish during the lifespan.
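As a rough illustration of the first scenario, the sketch below scores a schedule under uniform risk. It rests on assumptions the abstract does not state: an episode has a lifespan L, each packet of work costs a fixed communication overhead c, a period of length t therefore yields at most t - c units of work, and that work counts only if B's owner has not returned by the end of the period, which happens with probability 1 - T/L for a period ending at time T. The names `expected_work`, `periods`, `lifespan`, and `overhead` are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def expected_work(periods, lifespan, overhead):
    """Expected work of a schedule under an assumed uniform-risk model.

    periods  -- lengths of the successive packets of work sent to B
    lifespan -- assumed maximum length L of the cycle-stealing episode
    overhead -- assumed fixed communication cost c paid once per packet
    """
    total = 0.0
    # accumulate() yields the finishing time T_i of each period.
    for length, end in zip(periods, accumulate(periods)):
        if end > lifespan:
            break  # periods that run past the lifespan can never complete
        survive = 1.0 - end / lifespan          # P(owner has not returned by T_i)
        useful = max(length - overhead, 0.0)    # work left after paying overhead
        total += useful * survive
    return total


# Three hypothetical schedules for L = 100 and c = 1.
one_big_packet = [50]
medium_packets = [25, 25, 25]
tiny_packets = [2] * 49

for name, schedule in [("one big", one_big_packet),
                       ("medium", medium_packets),
                       ("tiny", tiny_packets)]:
    print(name, expected_work(schedule, lifespan=100, overhead=1))
```

Under these assumptions the medium-sized packets score higher than both the single large packet, which risks losing everything, and the many tiny packets, which spend half of each period on overhead. That is the tension between the two pressures described in the abstract; the sketch only evaluates given schedules and does not compute the optimal schedule derived in the paper.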

Index Terms:
Cycle-stealing, data parallel computation, networks of workstations, parallel scheduling, formal models, optimal competitive ratio, optimal expected throughput.
Sandeep N. Bhatt, Fan R.K. Chung, F. Thomson Leighton, Arnold L. Rosenberg, "On Optimal Strategies for Cycle-Stealing in Networks of Workstations," IEEE Transactions on Computers, vol. 46, no. 5, pp. 545-557, May 1997, doi:10.1109/12.589220