Sharc: Managing CPU and Network Bandwidth in Shared Clusters
January 2004 (vol. 15 no. 1)
pp. 2-17

Abstract: In this paper, we argue the need for effective resource management mechanisms for sharing resources in commodity clusters. To address this issue, we present the design of Sharc, a system that enables resource sharing among applications in such clusters. Sharc depends on single-node resource management mechanisms such as reservations or shares, and extends the benefits of such mechanisms to clustered environments. We present techniques for managing two important resources, CPU and network interface bandwidth, on a cluster-wide basis. Our techniques allow Sharc to 1) support reservation of CPU and network interface bandwidth for distributed applications, 2) dynamically allocate resources based on past usage, and 3) provide performance isolation to applications. Our experimental evaluation has shown that Sharc can scale to 256-node clusters running 100,000 applications. These results demonstrate that Sharc can be an effective approach for sharing resources among competing applications in moderate-size clusters.

Index Terms:
Shared clusters, dedicated clusters, Sharc, capsule, nucleus, control plane, CPU and network bandwidth, Linux, hosting platforms.
Bhuvan Urgaonkar, Prashant Shenoy, "Sharc: Managing CPU and Network Bandwidth in Shared Clusters," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 1, pp. 2-17, Jan. 2004, doi:10.1109/TPDS.2004.1264781