Issue No.06 - June (2013 vol.39)
pp: 806-821
Marcello Cinque, Università degli Studi di Napoli Federico II, Naples
Domenico Cotroneo, Università degli Studi di Napoli Federico II, Naples
Antonio Pecchia, Università degli Studi di Napoli Federico II, Naples
ABSTRACT
Event logs have been widely used over the last three decades to analyze the failure behavior of a variety of systems. Nevertheless, the implementation of the logging mechanism lacks a systematic approach, and collected logs are often inaccurate at reporting software failures, which threatens the validity of log-based failure analysis. This paper analyzes the limitations of current logging mechanisms and proposes a rule-based approach to make logs effective for analyzing software failures. The approach leverages artifacts produced at system design time and puts forth a set of rules to formalize the placement of logging instructions within the source code. The validity of the approach, with respect to traditional logging mechanisms, is shown by means of around 12,500 software fault injection experiments on real-world systems.
INDEX TERMS
Unified modeling language, Failure analysis, Analytical models, Systematics, Proposals, Software systems, software failures, Event log, logging mechanism, rule-based logging, error detection
CITATION
Marcello Cinque, Domenico Cotroneo, Antonio Pecchia, "Event Logs for the Analysis of Software Failures: A Rule-Based Approach", IEEE Transactions on Software Engineering, vol. 39, no. 6, pp. 806-821, June 2013, doi:10.1109/TSE.2012.67