Interpreting Stale Load Information
October 2000 (vol. 11 no. 10)
pp. 1033-1047

Abstract: In this paper, we examine the problem of balancing load in a large-scale distributed system when information about server loads may be stale. It is well known that sending each request to the machine with the apparently lowest load can behave badly in such systems, yet this technique is common in practice. Other systems use round-robin or random selection algorithms that either ignore load information entirely or use only a small subset of it. Rather than risk extremely bad performance on one hand or ignore the chance to use load information to improve performance on the other, we develop strategies that interpret load information based on its age. Through simulation, we examine several simple algorithms that use such load interpretation strategies under a range of workloads. Our experiments suggest that by properly interpreting load information, systems can: 1) match the performance of the most aggressive algorithms when load information is fresh relative to the job arrival rate, 2) outperform the best of the other algorithms we examine by as much as 60 percent when information is moderately old, 3) significantly outperform random load distribution when information is older still, and 4) avoid pathological behavior even when information is extremely old.
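
As an illustration of what "interpreting load information based on its age" could look like, the following is a minimal sketch and not necessarily any of the algorithms evaluated in the paper. It assumes a simple "water-filling" rule: given the last reported queue lengths and an estimate of how many jobs will arrive before the next load update, it picks selection probabilities so that the least-loaded servers are expected to fill up to a common level. The function name interpret_stale_loads and both parameters are hypothetical.

```python
def interpret_stale_loads(reported_loads, expected_arrivals):
    """Return per-server selection probabilities (illustrative sketch).

    reported_loads:    last reported queue length of each server (assumed input).
    expected_arrivals: estimated number of jobs arriving while this report
                       remains in use, e.g. arrival rate times report age
                       (an assumption made for this sketch).
    """
    n = len(reported_loads)
    if n == 0:
        raise ValueError("need at least one server")

    if expected_arrivals <= 0:
        # Perfectly fresh information: behave like the aggressive policy and
        # split evenly among the servers tied for the lowest reported load.
        low = min(reported_loads)
        winners = [i for i, load in enumerate(reported_loads) if load == low]
        return [1.0 / len(winners) if i in winners else 0.0 for i in range(n)]

    # Visit servers from least to most loaded.
    order = sorted(range(n), key=lambda i: reported_loads[i])

    # Find the smallest group of k least-loaded servers that can absorb all
    # expected arrivals without rising above the next server's reported load.
    prefix = 0.0
    for k in range(1, n + 1):
        prefix += reported_loads[order[k - 1]]
        level = (prefix + expected_arrivals) / k
        next_load = reported_loads[order[k]] if k < n else float("inf")
        if level <= next_load:
            break

    # Each chosen server receives the share of arrivals needed to reach `level`.
    probs = [0.0] * n
    for i in order[:k]:
        probs[i] = (level - reported_loads[i]) / expected_arrivals
    return probs


if __name__ == "__main__":
    loads = [0, 2, 10]
    for arrivals in (0.5, 4, 1000):
        print(arrivals, interpret_stale_loads(loads, arrivals))
```

Under these assumptions the sketch shows the qualitative behavior the abstract describes: when few arrivals are expected relative to the load differences (fresh information), all requests go to the least-loaded server, and as the expected number of arrivals grows (older information), the probabilities approach a uniform random choice instead of herding onto one machine.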

Index Terms:
Load balancing, server selection, stale information, queuing theory, distributed systems.
Citation:
Michael Dahlin, "Interpreting Stale Load Information," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 10, pp. 1033-1047, Oct. 2000, doi:10.1109/71.888643