Parallel and Distributed Processing Symposium, International (2003)
Nice, France
Apr. 22, 2003 to Apr. 26, 2003
ISSN: 1530-2075
ISBN: 0-7695-1926-1
pp: 267b
G. Almasi , IBM T.J. Watson Research Center
L. Bachega , IBM T.J. Watson Research Center
R. Bellofatto , IBM T.J. Watson Research Center
J. Brunheroto , IBM T.J. Watson Research Center
C. Caşcaval , IBM T.J. Watson Research Center
J. Castaños , IBM T.J. Watson Research Center
P. Crumley , IBM T.J. Watson Research Center
C. Erway , IBM T.J. Watson Research Center
J. Gagliano , IBM T.J. Watson Research Center
D. Lieber , IBM T.J. Watson Research Center
P. Mindlin , IBM T.J. Watson Research Center
J.E. Moreira , IBM T.J. Watson Research Center
R.K. Sahoo , IBM T.J. Watson Research Center
A. Sanomiya , IBM T.J. Watson Research Center
E. Schenfeld , IBM T.J. Watson Research Center
R. Swetz , IBM T.J. Watson Research Center
M. Bae , IBM Unix Development Lab
G. Laib , IBM Unix Development Lab
K. Ranganathan , IBM Unix Development Lab
Y. Aridor , IBM Haifa Research Lab
T. Domany , IBM Haifa Research Lab
Y. Gal , IBM Haifa Research Lab
O. Goldshmidt , IBM Haifa Research Lab
E. Shmueli , IBM Haifa Research Lab
ABSTRACT
With 65,536 compute nodes, the BlueGene/L supercomputer represents a new level of scalability for parallel systems. In this paper, we discuss system management and control for BlueGene/L, including machine booting, software installation, user account management, system monitoring, and job execution. We address the issue of scalability by organizing the system hierarchically. The 65,536 compute nodes are organized in 1,024 clusters of 64 compute nodes each, called processing sets. Each processing set is under the control of a 65th node, called an I/O node. The 1,024 processing sets can then be managed, to a great extent, as a regular Linux cluster. Regular cluster management is complemented by BlueGene/L-specific services, performed by a service node over a separate control network.
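The arithmetic behind this hierarchy can be sketched in a few lines of Python. This is only an illustration: the contiguous, zero-based node numbering and the helper names `pset_of` and `nodes_in_pset` are assumptions made for the example, not the addressing scheme described in the paper.

```python
# Illustrative sketch of the hierarchy described in the abstract.
# ASSUMPTION: compute nodes are numbered 0..65535 and assigned to
# processing sets in contiguous blocks; the real machine may differ.

COMPUTE_NODES = 65_536
PSET_SIZE = 64                            # compute nodes per processing set
NUM_PSETS = COMPUTE_NODES // PSET_SIZE    # 1,024 processing sets
IO_NODES = NUM_PSETS                      # one I/O node per processing set
TOTAL_NODES = COMPUTE_NODES + IO_NODES    # compute nodes plus I/O nodes


def pset_of(compute_node: int) -> int:
    """Return the (hypothetical) processing-set index of a compute node."""
    if not 0 <= compute_node < COMPUTE_NODES:
        raise ValueError(f"compute node {compute_node} out of range")
    return compute_node // PSET_SIZE


def nodes_in_pset(pset: int) -> range:
    """Return the compute-node indices belonging to one processing set."""
    if not 0 <= pset < NUM_PSETS:
        raise ValueError(f"processing set {pset} out of range")
    start = pset * PSET_SIZE
    return range(start, start + PSET_SIZE)


if __name__ == "__main__":
    print(NUM_PSETS, TOTAL_NODES)
    print(pset_of(65_535), list(nodes_in_pset(1))[:3])
```

Under these assumptions, a management layer only needs to address 1,024 I/O nodes directly, with each I/O node responsible for its own 64 compute nodes.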
CITATION
J. Moreira et al., "System Management in the BlueGene/L Supercomputer," Parallel and Distributed Processing Symposium, International (IPDPS), Nice, France, 2003, pp. 267b.
doi:10.1109/IPDPS.2003.1213483