Diagnosing Rediscovered Software Problems Using Symptoms
February 2000 (vol. 26 no. 2)
pp. 113-127

Abstract—This paper presents an approach to automatically diagnosing rediscovered software failures using symptoms, in environments in which many users run the same procedural software system. The approach is based on the observation that the great majority of field software failures are rediscoveries of previously reported problems and that failures caused by the same defect often share common symptoms. Based on actual data, the paper develops a small software failure fingerprint, which consists of the procedure call trace, problem detection location, and the identification of the executing software. The paper demonstrates that over 60 percent of rediscoveries can be automatically diagnosed based on fingerprints; less than 10 percent of defects are misdiagnosed. The paper also discusses a pilot that implements the approach. Using the approach not only saves service resources by eliminating repeated data collection for and diagnosis of reoccurring problems, but it can also improve service response time for rediscoveries.
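
As a rough illustration of the fingerprint idea described in the abstract, the sketch below stores fingerprints of previously diagnosed failures and looks up a new failure against them. It is a minimal sketch under stated assumptions, not the paper's implementation: the class and field names (Fingerprint, KnownProblems, software_id, detection_location, call_trace), the sample values, and the exact-match lookup rule are all hypothetical, since the abstract does not say how fingerprints are compared.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Fingerprint:
    """Three symptoms named in the abstract; field names are hypothetical."""
    software_id: str              # identification of the executing software
    detection_location: str       # where the problem was detected
    call_trace: Tuple[str, ...]   # procedure call trace, outermost call first


class KnownProblems:
    """Maps fingerprints of diagnosed failures to problem identifiers."""

    def __init__(self) -> None:
        self._by_fingerprint: Dict[Fingerprint, str] = {}

    def record(self, fp: Fingerprint, problem_id: str) -> None:
        # Keep the first diagnosis recorded for a given fingerprint.
        self._by_fingerprint.setdefault(fp, problem_id)

    def diagnose(self, fp: Fingerprint) -> Optional[str]:
        # Assumption: exact equality on all three fields counts as a match.
        # Returns None when the failure looks new and needs manual diagnosis.
        return self._by_fingerprint.get(fp)


known = KnownProblems()
first_report = Fingerprint("filesys-v2", "read_block+0x1c",
                           ("main", "open_file", "read_block"))
known.record(first_report, "PROBLEM-0001")

# A later failure with the same symptoms is treated as a rediscovery.
rediscovery = Fingerprint("filesys-v2", "read_block+0x1c",
                          ("main", "open_file", "read_block"))
# A failure detected elsewhere does not match and is treated as new.
unseen = Fingerprint("filesys-v2", "write_block+0x08",
                     ("main", "open_file", "write_block"))

rediscovery_result = known.diagnose(rediscovery)
unseen_result = known.diagnose(unseen)
```

Exact matching keeps the lookup to a single dictionary access, but it is only one possible design. A looser rule, such as comparing only part of the call trace, would match more rediscoveries at the cost of more misdiagnoses, which is the trade-off the reported percentages reflect.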

Index Terms:
Software failure, rediscovery, diagnosis, failure symptom, software service, measurement.
Inhwan Lee, Ravishankar K. Iyer, "Diagnosing Rediscovered Software Problems Using Symptoms," IEEE Transactions on Software Engineering, vol. 26, no. 2, pp. 113-127, Feb. 2000, doi:10.1109/32.841113