Issue No. 03 - March (2002 vol. 13)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.993204
<p>The cluster system we consider for load sharing is a compute farm: a pool of networked server nodes that provides high-performance computing for CPU-intensive, memory-intensive, and I/O-active jobs in batch mode. Existing resource management systems mainly aim to balance CPU load among server nodes. As CPU chips advance rapidly, improvements in memory and disk access speed lag significantly behind improvements in CPU speed, which increases the penalty for data movement, such as page faults and I/O operations, relative to normal CPU operations. To reduce the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective use of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) memory demands are known in advance or are predictable, and 2) memory demands are unknown and change dynamically during execution. Besides using workload traces with known memory demands, we have also instrumented the kernel to collect different types of workload execution traces that capture dynamic memory access patterns. Through several groups of trace-driven simulations, we show that our proposed policies effectively improve overall job execution performance by making good use of both CPU and memory resources, under both known and unknown memory demands.</p>
cluster computing, distributed systems, load sharing, memory-intensive workloads, and trace-driven simulations
X. Zhang, S. Chen and L. Xiao, "Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands," in IEEE Transactions on Parallel & Distributed Systems, vol. 13, no. 3, pp. 223-240, 2002.