loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)
What Supercomputers Say: A Study of Five System Logs
Edinburgh, UK
June 25-June 28
ISBN: 0-7695-2855-4
Adam Oliner, Stanford University, USA
Jon Stearley, Sandia National Laboratories, USA
If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. Progress has been hampered by the inaccessibility of empirical data. This paper addresses that dearth by examining system logs from five supercomputers, with the aim of providing useful insight and direction for future research into the use of such logs. We present details about the systems, methods of log collection, and how alerts were identified; propose a simpler and more effective filtering algorithm; and define operational context to encompass the crucial information that we found to be currently missing from most logs. The machines we consider (and the number of processors) are: Blue Gene/L (131072), Red Storm (10880), Thunderbird (9024), Spirit (1028), and Liberty (512). This is the first study of raw system logs from multiple supercomputers.
Citation:
Adam Oliner, Jon Stearley, "What Supercomputers Say: A Study of Five System Logs," dsn, pp.575-584, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), 2007
Usage of this product signifies your acceptance of the Terms of Use.