Bibliographic References
STORM: Scalable Resource Management for Large-Scale Parallel Computers
December 2006 (vol. 55 no. 12)
pp. 1572-1587
Although clusters are a popular form of high-performance computing, they remain more difficult to manage than sequential systems—or even symmetric multiprocessors. In this paper, we identify a small set of primitive mechanisms that are sufficiently general to be used as building blocks to solve a variety of resource-management problems. We then present STORM, a resource-management environment that embodies these mechanisms in a scalable, low-overhead, and efficient implementation. The key innovation behind STORM is a modular software architecture that reduces all resource management functionality to a small number of highly scalable mechanisms. These mechanisms simplify the integration of resource management with low-level network features. As a result of this design, STORM can launch large, parallel applications an order of magnitude faster than the best time reported in the literature and can gang-schedule a parallel application as fast as the node OS can schedule a sequential application. This paper describes the mechanisms and algorithms behind STORM and presents a detailed performance model that shows that STORM's performance can scale to thousands of nodes.
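The abstract's claim of order-of-magnitude faster job launch rests on reducing resource management to scalable collective primitives such as a tree-structured broadcast of the application binary. As a rough illustration only (the function and variable names below are hypothetical, not STORM's actual interface), a binomial-tree dissemination reaches N nodes in ceil(log2 N) communication rounds, which is why launch time can grow logarithmically rather than linearly with cluster size:

```python
import math

def binomial_broadcast_rounds(num_nodes):
    """Simulate binomial-tree dissemination of a file from node 0.

    Returns the list of receivers per round; the number of rounds is
    ceil(log2(num_nodes)), the depth of the binomial tree.
    """
    have = {0}              # node 0 (the management node) holds the binary
    rounds = []
    while len(have) < num_nodes:
        step = len(have)    # doubling: every current holder forwards once
        new = {src + step for src in have if src + step < num_nodes}
        rounds.append(sorted(new))
        have |= new
    return rounds

# With 256 nodes, all nodes receive the binary in 8 rounds.
rounds = binomial_broadcast_rounds(256)
assert len(rounds) == math.ceil(math.log2(256))
```

This sketch models only message rounds, not the hardware multicast and network-interface offload that STORM actually exploits on the Quadrics network; those reduce the constant factors further.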

[1] A.C. Arpaci-Dusseau, “Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems,” ACM Trans. Computer Systems, vol. 19, no. 3, pp. 283-331, Aug. 2001.
[2] N.J. Boden, D. Cohen, R.E. Felderman, A.E. Kulawik, C.L. Seitz, J.N. Seizovic, and W.-K. Su, “Myrinet: A Gigabit-per-Second Local Area Network,” IEEE Micro, vol. 15, no. 1, pp. 29-36, Feb. 1995.
[3] R. Brightwell and L.A. Fisk, “Scalable Parallel Application Launch on Cplant,” Proc. IEEE/ACM Conf. Supercomputing (SC '01), Nov. 2001.
[4] Compaq High Performance Technical Computing Group, “U.S. DOE Selects Compaq to Build ASCI Q,” HPTC News, vol. 17, Sept./Oct. 2000.
[5] D.G. Feitelson, “Packing Schemes for Gang Scheduling,” Proc. Int'l Parallel Processing Symp. (IPPS '96), Second Workshop Job Scheduling Strategies for Parallel Processing, D.G. Feitelson and L. Rudolph, eds., pp. 89-110, Apr. 1996.
[6] D.G. Feitelson, A. Batat, G. Benhanokh, D. Er-El, Y. Etsion, A. Kavas, T. Klainer, U. Lublin, and M. Volovic, “The ParPar System: A Software MPP,” High Performance Cluster Computing, R. Buyya, ed., vol. 1: Architectures and Systems, pp. 758-774, 1999.
[7] D.G. Feitelson and M.A. Jette, “Improved Utilization and Responsiveness with Gang Scheduling,” Proc. Int'l Parallel Processing Symp. (IPPS '97), Third Workshop Job Scheduling Strategies for Parallel Processing, D.G. Feitelson and L. Rudolph, eds., pp. 238-261, Apr. 1997.
[8] D.G. Feitelson and L. Rudolph, “Gang Scheduling Performance Benefits for Fine-Grain Synchronization,” J. Parallel and Distributed Computing, vol. 16, no. 4, pp. 306-318, Dec. 1992.
[9] E. Frachtenberg, D.G. Feitelson, J. Fernandez-Peinador, and F. Petrini, “Parallel Job Scheduling under Dynamic Workloads,” Proc. Ninth Workshop Job Scheduling Strategies for Parallel Processing, D.G. Feitelson, L. Rudolph, and U. Schwiegelshohn, eds., pp. 208-227, Springer-Verlag, 2003.
[10] E. Frachtenberg, D.G. Feitelson, F. Petrini, and J. Fernandez, “Adaptive Parallel Job Scheduling with Flexible Coscheduling,” IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 11, pp. 1066-1077, Nov. 2005.
[11] E. Frachtenberg, F. Petrini, S. Coll, and W. Feng, “Gang Scheduling with Lightweight User-Level Communication,” Proc. Int'l Conf. Parallel Processing (ICPP '01), Workshop Scheduling and Resource Management for Cluster Computing, Sept. 2001.
[12] E. Frachtenberg, F. Petrini, J. Fernandez, S. Pakin, and S. Coll, “STORM: Lightning-Fast Resource Management,” Proc. IEEE/ACM Conf. Supercomputing (SC '02), Nov. 2002.
[13] H. Franke, J. Jann, J.E. Moreira, P. Pattnaik, and M.A. Jette, “An Evaluation of Parallel Job Scheduling for ASCI Blue-Pacific,” Proc. IEEE/ACM Conf. Supercomputing (SC '99), Nov. 1999.
[14] D.P. Ghormley, D. Petrou, S.H. Rodrigues, A.M. Vahdat, and T.E. Anderson, “GLUnix: A Global Layer Unix for a Network of Workstations,” Software—Practice and Experience, vol. 28, no. 9, pp. 929-961, July 1998.
[15] E. Hendriks, “BProc: The Beowulf Distributed Process Space,” Proc. ACM Int'l Conf. Supercomputing (ICS '02), June 2002.
[16] A. Hori, H. Tezuka, and Y. Ishikawa, “Highly Efficient Gang Scheduling Implementation,” Proc. IEEE/ACM Conf. Supercomputing (SC '98), Nov. 1998.
[17] A. Kavas, D. Er-El, and D.G. Feitelson, “Using Multicast to Pre-Load Jobs on the ParPar Cluster,” Parallel Computing, vol. 27, no. 3, pp. 315-327, Feb. 2001.
[18] K.R. Koch, R.S. Baker, and R.E. Alcouffe, “Solution of the First-Order Form of the 3-D Discrete Ordinates Equation on a Massively Parallel Processor,” Trans. Am. Nuclear Soc., vol. 65, no. 108, pp. 198-199, 1992.
[19] L. Lamport, “How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs,” IEEE Trans. Computers, vol. 28, no. 9, pp. 690-691, Sept. 1979.
[20] C.E. Leiserson, Z.S. Abuhamdeh, D.C. Douglas, C.R. Feynman, M.N. Ganmukhi, J.V. Hill, W.D. Hillis, B.C. Kuszmaul, M.A. St. Pierre, D.S. Wells, M.C. Wong-Chan, S.-W. Yang, and R. Zak, “The Network Architecture of the Connection Machine CM-5,” J. Parallel and Distributed Computing, vol. 33, no. 2, pp. 145-158, Mar. 1996.
[21] J.E. Moreira, H. Franke, W. Chan, L.L. Fong, M.A. Jette, and A.B. Yoo, “A Gang-Scheduling System for ASCI Blue-Pacific,” Proc. High-Performance Computing and Networking Conf. in Europe (HPCN Europe), pp. 831-840, Apr. 1999.
[22] S. Nagar, A. Banerjee, A. Sivasubramaniam, and C.R. Das, “A Closer Look at Coscheduling Approaches for a Network of Workstations,” Proc. ACM Symp. Parallel Algorithms and Architectures (SPAA '99), June 1999.
[23] F. Petrini, S. Coll, E. Frachtenberg, and A. Hoisie, “Hardware- and Software-Based Collective Communication on the Quadrics Network,” Proc. Int'l Symp. Network Computing and Applications (NCA '01), Oct. 2001.
[24] F. Petrini and W. Feng, “Improved Resource Utilization with Buffered Coscheduling,” J. Parallel Algorithms and Applications, vol. 16, pp. 123-144, 2001.
[25] F. Petrini, W. Feng, A. Hoisie, S. Coll, and E. Frachtenberg, “The Quadrics Network: High-Performance Clustering Technology,” IEEE Micro, vol. 22, no. 1, pp. 46-57, Jan./Feb. 2002.
[26] F. Petrini, J. Fernández, E. Frachtenberg, and S. Coll, “Scalable Collective Communication on the ASCI Q Machine,” Proc. Symp. High Performance Interconnects (HotI '03), Aug. 2003.
[27] F. Petrini, D. Kerbyson, and S. Pakin, “The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q,” Proc. IEEE/ACM Conf. Supercomputing (SC '03), Nov. 2003.
[28] Quadrics Supercomputers World Ltd, Elan Reference Manual, first ed., Jan. 1999.
[29] R. Riesen, R. Brightwell, L.A. Fisk, T. Hudson, J. Otto, and A.B. Maccabe, “Cplant,” Proc. USENIX Ann. Technical Conf., Second Extreme Linux Workshop, June 1999.
[30] S. Shepler, B. Callaghan, D. Robinson, R. Thurlow, C. Beame, M. Eisler, and D. Noveck, NFS Version 4 Protocol, RFC 3010, Internet Eng. Task Force, Network Working Group, Dec. 2000.
[31] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI: The Complete Reference, vol. 1, The MPI Core, second ed. The MIT Press, Sept. 1998.
[32] J.H. Straathof, A.K. Thareja, and A.K. Agrawala, “UNIX Scheduling for Large Systems,” Proc. USENIX 1986 Winter Conf., Jan. 1986.
[33] H. Tezuka, A. Hori, Y. Ishikawa, and M. Sato, “PM: An Operating System Coordinated High Performance Communication Library,” Proc. High-Performance Computing and Networking: Int'l Conf. and Exhibition (HPCN Europe), B. Hertzberger and P.M.A. Sloot, eds., pp. 708-717, Apr. 1997.
[34] Top 500 supercomputers, http:/, 2006.

Index Terms:
Hardware/software interface, system architectures, integration, and modeling, network operating systems, supercomputers.
Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Scott Pakin, "STORM: Scalable Resource Management for Large-Scale Parallel Computers," IEEE Transactions on Computers, vol. 55, no. 12, pp. 1572-1587, Dec. 2006, doi:10.1109/TC.2006.206