
Issue No.02 - February (2009 vol.20)

pp: 207-218

Menno Dobber, Vrije Universiteit, Amsterdam

Rob van der Mei, CWI-Mathematics and Computer Science, Kruislaan

Ger Koole, Vrije Universiteit, Amsterdam

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2008.61

ABSTRACT

Global-scale grids provide a massive source of processing power, offering the means to support processor-intensive parallel applications. The strong burstiness and unpredictability of the available processing and network resources create a strong need to make applications robust against the dynamics of grid environments. The two techniques most suitable for coping with the dynamic nature of the grid are dynamic load balancing (DLB) and job replication (JR). In this paper, we analyze and compare the effectiveness of these two approaches by means of trace-driven simulations. We observe that there exists an easy-to-measure statistic $Y$ and a corresponding threshold value $Y^{\ast}$, such that DLB consistently outperforms JR when $Y > Y^{\ast}$, whereas the reverse is true for $Y < Y^{\ast}$. Based on this observation, we propose a simple and easy-to-implement approach, referred to throughout as the DLB/JR method, that can make dynamic decisions about whether to use DLB or JR. Extensive simulations based on a large set of real data monitored in a global-scale grid show that our DLB/JR method consistently performs at least as well as both DLB and JR in all circumstances, which makes our DLB/JR method highly robust against the unpredictable nature of global-scale grids.
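
The threshold rule described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the statistic Y is measured or how the threshold is obtained, so both are passed in as plain numbers, and the names `Strategy`, `choose_strategy`, and `strategies_per_run` are hypothetical. The behaviour at exact equality is also not specified in the abstract; the sketch arbitrarily falls back to JR in that case.

```python
from enum import Enum


class Strategy(Enum):
    DLB = "dynamic load balancing"
    JR = "job replication"


def choose_strategy(y: float, y_threshold: float) -> Strategy:
    """Pick DLB when the measured statistic exceeds the threshold, else JR.

    The tie case (y == y_threshold) is not covered by the abstract;
    returning JR there is an arbitrary assumption of this sketch.
    """
    return Strategy.DLB if y > y_threshold else Strategy.JR


def strategies_per_run(measurements, y_threshold):
    """Re-decide for each new measurement of the statistic (hypothetical loop)."""
    return [choose_strategy(y, y_threshold) for y in measurements]
```

Re-evaluating the rule for every new measurement is one way to read "dynamic decisions"; how often the decision is actually revisited is not stated in the abstract.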

INDEX TERMS

Grid computing, dynamic load balancing, job replication, performance.

CITATION

Menno Dobber, Rob van der Mei, Ger Koole, "Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison",

*IEEE Transactions on Parallel & Distributed Systems*, vol. 20, no. 2, pp. 207-218, February 2009, doi:10.1109/TPDS.2008.61