12th International Conference on Parallel and Distributed Systems - Volume 1 (ICPADS'06)
Job Centric Cluster Monitoring
Minneapolis, Minnesota
July 12-July 15
ISBN: 0-7695-2612-8
Roger Curry, University of Calgary, Canada
Rob Simmonds, University of Calgary, Canada
This paper describes a system for monitoring jobs on large computational clusters. The aim is to extract the information that is most useful for understanding the complete life-cycle of a job, combining and organising data from multiple sources. Information is taken from the batch scheduler and from collectors running on each node. The collectors gather information about the processes associated with each job, as well as general operating system and device statistics.
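As a rough illustration of combining scheduler and collector data, the sketch below groups per-process samples under the scheduler's record for each job and derives a small per-job summary. It is not the authors' implementation. The record fields, the assumption that a collector can attribute each process to a job ID, and the names `SchedulerRecord`, `ProcessSample`, `combine` and `summarise` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SchedulerRecord:
    """Job information taken from the batch scheduler (hypothetical fields)."""
    job_id: str
    user: str
    nodes: list  # hostnames the scheduler assigned to the job
    requested_cores: int


@dataclass
class ProcessSample:
    """One per-process reading reported by a node collector (hypothetical fields)."""
    node: str
    pid: int
    job_id: str          # assumes the collector can attribute each process to a job
    cpu_seconds: float   # cumulative CPU time of the process at sampling time
    rss_mb: float        # resident memory at sampling time


@dataclass
class JobView:
    """A scheduler record plus every collector sample attributed to that job."""
    record: SchedulerRecord
    samples: list = field(default_factory=list)


def combine(records, samples):
    """Group collector samples under the scheduler record of the job they belong to."""
    views = {r.job_id: JobView(r) for r in records}
    for s in samples:
        view = views.get(s.job_id)
        # Discard samples for unknown jobs or from nodes the scheduler did not assign
        if view is not None and s.node in view.record.nodes:
            view.samples.append(s)
    return views


def summarise(view):
    """Return (total CPU seconds, per-node peak RSS in MB) for one job."""
    # cpu_seconds is a cumulative counter, so keep the largest reading per process
    per_process = {}
    for s in view.samples:
        key = (s.node, s.pid)
        per_process[key] = max(per_process.get(key, 0.0), s.cpu_seconds)
    total_cpu = sum(per_process.values())

    peak_rss = defaultdict(float)
    for s in view.samples:
        peak_rss[s.node] = max(peak_rss[s.node], s.rss_mb)
    return total_cpu, dict(peak_rss)
```

Keying the join on the scheduler's job ID and node list is one simple way to discard stray processes; a real collector would need some reliable way (for example, the batch system's process tracking) to tie processes to jobs.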

Heuristics are applied to extract information that could help a client tune their job submission strategy, to improve throughput on the cluster, and to determine how effectively the provisioned resources are being utilised. Data is stored for post-mortem analysis and for data-mining by other tools. Ways of utilising this service in a grid computing environment are also discussed.
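One simple heuristic of the kind described is to compare the CPU time a job actually consumed with the core-seconds it was provisioned. The sketch below assumes that metric, a hypothetical 50% cut-off, and hypothetical names (`JobUsage`, `cpu_efficiency`, `advice`); the paper's own heuristics may differ.

```python
import math
from dataclasses import dataclass

# Hypothetical cut-off: jobs below this CPU efficiency are flagged as over-provisioned.
LOW_EFFICIENCY_THRESHOLD = 0.5


@dataclass
class JobUsage:
    job_id: str
    requested_cores: int
    wall_seconds: float       # elapsed time from job start to job end
    total_cpu_seconds: float  # CPU time summed over all of the job's processes


def cpu_efficiency(usage: JobUsage) -> float:
    """Fraction of the provisioned core-seconds that the job actually used."""
    if usage.requested_cores <= 0 or usage.wall_seconds <= 0:
        raise ValueError("requested_cores and wall_seconds must be positive")
    return usage.total_cpu_seconds / (usage.requested_cores * usage.wall_seconds)


def advice(usage: JobUsage) -> str:
    """Turn the efficiency figure into a short submission hint for the user."""
    eff = cpu_efficiency(usage)
    if eff < LOW_EFFICIENCY_THRESHOLD:
        # Smallest core count that would have covered the observed CPU usage
        suggested = max(1, math.ceil(usage.total_cpu_seconds / usage.wall_seconds))
        return f"{usage.job_id}: {eff:.0%} CPU efficiency; consider requesting {suggested} core(s)"
    return f"{usage.job_id}: {eff:.0%} CPU efficiency; request looks reasonable"
```

For example, a job that requested 8 cores, ran for 3600 s and used 7200 CPU-seconds has an efficiency of 7200 / (8 × 3600) = 0.25, so `advice` suggests requesting 2 cores. A CPU-only metric ignores memory-bound or I/O-bound jobs, which is why it is only a starting point.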

Index Terms:
Job Monitoring, High Performance Computing, Cluster Computing, Grid Monitoring.
Citation:
Roger Curry, Rob Simmonds, "Job Centric Cluster Monitoring," icpads, vol. 1, pp.571-578, 12th International Conference on Parallel and Distributed Systems - Volume 1 (ICPADS'06), 2006