Analysis of Preventive Maintenance in Transactions Based Software Systems
January 1998 (vol. 47 no. 1)
pp. 96-107

Abstract—Preventive maintenance of operational software systems, a novel technique for software fault tolerance, is used specifically to counteract the phenomenon of software "aging." However, it incurs some overhead. The need to perform preventive maintenance, not only in general-purpose software systems of mass use but also in safety-critical and highly available systems, clearly calls for an analysis-based approach to determining the optimal times to perform it.

In this paper, we present an analytical model of a software system that serves transactions. Due to aging, not only does the service rate of the software decrease with time, but the software itself also experiences crash/hang failures that result in its unavailability. Two policies for preventive maintenance are modeled, and expressions are derived for the resulting steady-state availability, the probability that an arriving transaction is lost, and an upper bound on the expected response time of a transaction. Numerical examples are presented to illustrate the applicability of the models.
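The trade-off behind choosing a preventive maintenance interval can be illustrated with a much simpler calculation than the paper's Markov regenerative model. The sketch below is a textbook age-replacement availability model derived with a renewal-reward argument, not the authors' method. It ignores the degrading service rate and transaction queueing described above. Everything in it is an assumption for illustration: the Weibull time-to-failure (shape > 1 mimics aging), the deterministic maintenance interval `delta`, the hypothetical mean repair and maintenance durations, and all function names.

```python
import math


def weibull_reliability(t, scale, shape):
    """Assumed probability that the software has not crashed/hung by time t
    after its last restart (Weibull; shape > 1 gives an increasing hazard)."""
    return math.exp(-((t / scale) ** shape))


def availability(delta, scale, shape, mean_repair, mean_rejuv, steps=2000):
    """Steady-state availability of a simple age-based policy (hypothetical):
    perform preventive maintenance `delta` time units after each restart
    unless a failure occurs first; either event restarts the software."""
    # Expected uptime per cycle = integral of R(t) over [0, delta],
    # approximated here with the trapezoidal rule.
    h = delta / steps
    uptime = 0.5 * (weibull_reliability(0.0, scale, shape)
                    + weibull_reliability(delta, scale, shape))
    for i in range(1, steps):
        uptime += weibull_reliability(i * h, scale, shape)
    uptime *= h

    # Expected downtime per cycle: repair after a failure with probability
    # F(delta), otherwise the (shorter) preventive maintenance outage.
    r_delta = weibull_reliability(delta, scale, shape)
    downtime = (1.0 - r_delta) * mean_repair + r_delta * mean_rejuv

    # Renewal-reward theorem: long-run fraction of time the system is up.
    return uptime / (uptime + downtime)


def best_interval(candidates, **params):
    """Grid search for the candidate interval with the highest availability."""
    return max(candidates, key=lambda d: availability(d, **params))


if __name__ == "__main__":
    # Hypothetical parameters, in hours.
    params = dict(scale=240.0, shape=2.0, mean_repair=2.0, mean_rejuv=0.1)
    best = best_interval(range(10, 1001, 10), **params)
    print("best interval:", best)
    print("availability at best:", availability(best, **params))
    print("availability with (almost) no maintenance:", availability(1000, **params))
```

Maintaining too often wastes time on maintenance outages. Maintaining too rarely lets aging-induced failures, with their longer repairs, dominate. With an increasing failure rate, availability therefore peaks at an interior interval, which is the kind of optimum the abstract argues must be found analytically.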

Index Terms:
Preventive maintenance, software fault tolerance, software rejuvenation, transactions based software systems, reliability modeling, Markov regenerative models.
Sachin Garg, Antonio Puliafito, Miklós Telek, Kishor Trivedi, "Analysis of Preventive Maintenance in Transactions Based Software Systems," IEEE Transactions on Computers, vol. 47, no. 1, pp. 96-107, Jan. 1998, doi:10.1109/12.656092