Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands
March 2002 (vol. 13 no. 3)
pp. 223-240

The cluster system we consider for load sharing is a compute farm: a pool of networked server nodes providing high-performance computing for CPU-intensive, memory-intensive, and I/O-active jobs in batch mode. Existing resource management systems mainly aim to balance CPU load among server nodes. As CPU speeds advance rapidly, improvements in memory and disk access speed lag significantly behind, increasing the penalty of data movement, such as page faults and I/O operations, relative to normal CPU operations. To reduce the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) memory demands are known in advance or are predictable and 2) memory demands are unknown and change dynamically during execution. Besides using workload traces with known memory demands, we have also instrumented the kernel to collect different types of workload execution traces that capture dynamic memory access patterns. Through several groups of trace-driven simulations, we show that our proposed policies effectively improve overall job execution performance by making good use of both CPU and memory resources for workloads with known and unknown memory demands.
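
To make the idea of sharing load on both CPU and memory concrete, the following is a minimal Python sketch of a memory-aware placement rule. It is an illustration only, not the policies evaluated in the paper: the `Node` fields, the `place_job` function, the job-count load metric, and the tie-breaking rules are all assumptions introduced here. A job with a known memory demand goes to the least CPU-loaded node whose free memory covers that demand. A job with an unknown demand (`None`), or one that fits nowhere, goes to the node with the most free memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of one server node in the compute farm."""
    name: str
    cpu_load: int       # assumed metric: number of jobs currently running
    mem_total_mb: int
    mem_used_mb: int

    @property
    def mem_free_mb(self) -> int:
        return self.mem_total_mb - self.mem_used_mb


def place_job(nodes: List[Node], mem_demand_mb: Optional[int]) -> Node:
    """Choose a node for a new job (illustrative rule, not the paper's policy).

    Known demand: among nodes with enough free memory, pick the one with
    the lowest CPU load, preferring more free memory on ties.
    Unknown demand, or no node fits: pick the node with the most free
    memory, preferring lower CPU load on ties.
    """
    if mem_demand_mb is not None:
        fits = [n for n in nodes if n.mem_free_mb >= mem_demand_mb]
        if fits:
            return min(fits, key=lambda n: (n.cpu_load, -n.mem_free_mb))
    return max(nodes, key=lambda n: (n.mem_free_mb, -n.cpu_load))


nodes = [
    Node("a", cpu_load=1, mem_total_mb=1024, mem_used_mb=900),
    Node("b", cpu_load=3, mem_total_mb=1024, mem_used_mb=200),
    Node("c", cpu_load=2, mem_total_mb=1024, mem_used_mb=500),
]
known = place_job(nodes, 300)     # demand known in advance
unknown = place_job(nodes, None)  # demand unknown at submission time
print(known.name, unknown.name)
```

In this example a CPU-only balancer would choose node `a`, which has the lowest CPU load but too little free memory for a 300 MB job, so the job would likely incur page faults there. The sketch avoids that node in both cases.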

Index Terms:
cluster computing, distributed systems, load sharing, memory-intensive workloads, and trace-driven simulations
L. Xiao, S. Chen, X. Zhang, "Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 223-240, March 2002, doi:10.1109/71.993204