May 31, 1999 to June 4, 1999
Michael Dahlin, University of Texas at Austin
In this paper we examine the problem of balancing load in a large-scale distributed system when information about server loads may be stale. It is well known that sending each request to the machine with the apparent lowest load can behave badly in such systems, yet this technique is common in practice. Other systems use round-robin or random selection algorithms that entirely ignore load information or that only use a small subset of the load information. Rather than risk extremely bad performance on one hand or ignore the chance to use load information to improve performance on the other, we develop strategies that interpret load information based on its age. Through simulation, we examine several simple algorithms that use such load interpretation strategies under a range of workloads. Our experiments suggest that by properly interpreting load information, systems can (1) match the performance of the most aggressive algorithms when load information is fresh relative to the job arrival rate, (2) outperform the best of the other algorithms we examine by as much as 60% when information is moderately old, (3) significantly outperform random load distribution when information is older still, and (4) avoid pathological behavior even when information is extremely old.
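To make the idea of interpreting load by age concrete, here is a minimal sketch in Python. It illustrates one plausible age-aware heuristic and should not be read as the paper's own algorithms. The function names (`interpret_loads`, `pick_server`) and the "water-filling" rule are assumptions introduced for this example. The sketch estimates how many jobs are expected to arrive before the load report is refreshed (arrival rate times report age) and spreads that many jobs over the least-loaded servers so that their expected queue lengths even out. With fresh information it sends everything to the apparently least-loaded server. With very old information it approaches uniform random selection.

```python
import random


def interpret_loads(reported_loads, expected_arrivals):
    """Return one selection probability per server (hypothetical heuristic).

    reported_loads: queue lengths from the last (possibly stale) report.
    expected_arrivals: arrival rate * age of the report (an assumed input).
    """
    n = len(reported_loads)
    if expected_arrivals <= 0:
        # Fresh information: send everything to the least-loaded server.
        best = min(range(n), key=lambda i: reported_loads[i])
        return [1.0 if i == best else 0.0 for i in range(n)]

    # Water-filling: raise a common "level" over the lowest queues until
    # the expected arrivals are used up.
    order = sorted(reported_loads)
    remaining = float(expected_arrivals)
    level = float(order[0])
    for k in range(1, n + 1):
        next_load = order[k] if k < n else float("inf")
        needed = (next_load - level) * k  # jobs to lift k servers to next_load
        if needed >= remaining:
            level += remaining / k
            break
        remaining -= needed
        level = next_load

    # Each server's share of arrivals is its gap below the final level.
    return [max(0.0, level - load) / expected_arrivals for load in reported_loads]


def pick_server(reported_loads, expected_arrivals, rng=random):
    """Draw a server index according to the interpreted probabilities."""
    probs = interpret_loads(reported_loads, expected_arrivals)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

For example, with reported loads `[0, 2, 4]`, a report old enough that about 6 arrivals are expected yields probabilities of roughly 2/3, 1/3, and 0. With 600 expected arrivals the three probabilities are nearly equal, which mirrors the shift from aggressive to random placement described in the abstract.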
load balancing, queueing models, load interpretation, communication delays, distributed systems
Michael Dahlin, "Interpreting Stale Load Information", IEEE International Conference on Distributed Computing Systems (ICDCS), 1999, p. 285, doi:10.1109/ICDCS.1999.776530