Issue No.12 - December (2008 vol.19)
pp: 1671-1682
Issam Al-Azzoni, McMaster University, Hamilton
Resource management systems (RMS) are an important component in heterogeneous computing (HC) systems. One of the jobs of an RMS is the mapping of arriving tasks onto the machines of the HC system. Many different mapping heuristics have been proposed in recent years. However, most of these heuristics suffer from several limitations. One of these limitations is the performance degradation that results from using outdated global information about the status of all machines in the HC system. This paper proposes several heuristics which address this limitation by only requiring partial information in making the mapping decisions. These heuristics utilize the solution to a linear programming (LP) problem which maximizes the system capacity. Simulation results show that our heuristics perform very competitively while requiring dramatically less information.
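As a rough illustration of what "an LP which maximizes the system capacity" can mean, the sketch below assumes one common allocation formulation, which the abstract itself does not spell out: task classes i arrive at rates alpha[i], machine j executes class-i tasks at rate mu[i][j], and delta[i][j] is the fraction of machine j's time given to class i. Under that assumption the LP seeks the largest lam such that sum_j delta[i][j] * mu[i][j] >= lam * alpha[i] for every class, with each machine's fractions summing to at most 1. The code does not solve the LP; it only evaluates the capacity of a given allocation and lists the machines a mapper would need to consult. All names (`capacity`, `candidate_machines`, `alpha`, `mu`, `delta`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def capacity(delta: Matrix, mu: Matrix, alpha: List[float], tol: float = 1e-9) -> float:
    """Largest lam with sum_j delta[i][j] * mu[i][j] >= lam * alpha[i] for all i.

    Assumed model, not taken from the abstract: delta[i][j] is the share of
    machine j allotted to class i, and every alpha[i] is strictly positive.
    """
    n_classes, n_machines = len(mu), len(mu[0])
    # Shares must be non-negative.
    if any(delta[i][j] < -tol for i in range(n_classes) for j in range(n_machines)):
        raise ValueError("allocation shares must be non-negative")
    # A machine cannot be allotted more than 100% of its time.
    for j in range(n_machines):
        if sum(delta[i][j] for i in range(n_classes)) > 1 + tol:
            raise ValueError(f"machine {j} is over-allocated")
    # The class with the least allotted service per unit of demand is the bottleneck.
    return min(
        sum(delta[i][j] * mu[i][j] for j in range(n_machines)) / alpha[i]
        for i in range(n_classes)
    )


def candidate_machines(delta: Matrix, task_class: int, tol: float = 1e-9) -> List[int]:
    """Machines with a positive share for this class in the given allocation."""
    return [j for j, share in enumerate(delta[task_class]) if share > tol]


# Two task classes, two machines; each machine is fast for one class only.
mu = [[4.0, 1.0], [1.0, 4.0]]
alpha = [2.0, 2.0]
delta = [[1.0, 0.0], [0.0, 1.0]]  # dedicate each machine to its fast class

lam = capacity(delta, mu, alpha)
machines_for_class_0 = candidate_machines(delta, 0)
```

In this toy instance a mapper that restricts itself to machines with a positive share would need state information from one machine per class rather than from all machines, which is the kind of partial-information decision the abstract describes. Whether the paper's heuristics use the allocation in exactly this way is an assumption of the sketch.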
distributed systems, load balancing, heterogeneous processors, queueing theory
Issam Al-Azzoni, "Linear Programming-Based Affinity Scheduling of Independent Tasks on Heterogeneous Computing Systems", IEEE Transactions on Parallel & Distributed Systems, vol.19, no. 12, pp. 1671-1682, December 2008, doi:10.1109/TPDS.2008.59