San Diego, CA, USA
Sept. 20, 2004 to Sept. 23, 2004
N. Desai, Sandia Nat. Labs., Sandia Corp., Albuquerque, NM, USA
R. Bradshaw, Dept. of Comput. Sci., William & Mary Coll., Williamsburg, VA, USA
E. Lusk, Dept. of Biomed. Informatics, Ohio State Univ., Columbus, OH, USA
R. Butler, Dept. of Biomed. Informatics, Ohio State Univ., Columbus, OH, USA
The complexity and cost of isolating the root cause of system problems in large parallel computers generally scale with the size of the system. Syslog messages provide a primary source of system feedback, but manual review is tedious and error-prone. Informatic analysis can be used to detect subtle anomalies in the syslog message stream, thereby increasing the availability of the overall system. In this work, the author describes the use of the bioinformatic-inspired Teiresias algorithm to automatically classify syslog messages and compares it to an existing log analysis tool (SLCT). He then describes the use of occurrence statistics to group time-correlated messages and presents a simple graphical user interface for viewing analysis results. Finally, example analyses of syslogs from three independent clusters are presented.
N. Desai, R. Bradshaw, E. Lusk, R. Butler, "Component-based cluster systems software architecture: a case study", IEEE International Conference on Cluster Computing (CLUSTER), 2004, pp. 319-326, doi:10.1109/CLUSTR.2004.1392629