Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops (CCGRIDW'06)
Monitoring High Performance Networks in Large-scale Clusters
Singapore
May 16-May 19
ISBN: 0-7695-2585-7
Fabrice Gadaud, CEA/DIF, France
The number of large-scale clusters is rising. They are integrated into Grids or become key components of larger structures. As more users and projects rely on HPC clusters, high availability and security become requirements for fast-growing adoption and use. In this paper, we focus on high performance networks, on top of which all HPC clusters are built. We demonstrate that classical instrumentation is inefficient in HPC environments: it either does not scale or causes a significant loss of performance. Based on this fact, we highlight cluster properties: nodes have assigned roles and are coupled at various levels. Moreover, we study the main characteristics of resource usage for each type of node and propose an instrumentation that can be effectively deployed. It results in fine-grained mechanisms adapted to system architecture and performance constraints. Relevant information is collected over time. Two properties are verified online and dynamically: coherency and containment. Each induces a type of verification, and both aim at reducing the recovery time from failure and the security risk of a whole cluster. We illustrate our methodology on the QsNet network and provide a way to increase the safety of high performance networks and clusters.
Citation:
Fabrice Gadaud, "Monitoring High Performance Networks in Large-scale Clusters," ccgrid, vol. 2, pp.32, Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops (CCGRIDW'06), 2006