Issue No.02 - February (1985 vol.11)
R.K. Iyer, Computer Systems Group at the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois
This paper describes an analysis of hardware-related software (HW/SW) errors on an MVS/SP operating system at Stanford University. The analysis procedure demonstrates a methodology for evaluating the interaction between hardware and software as it relates to system reliability. The paper examines the operating system's handling of HW/SW errors and the effectiveness of recovery management. Nearly 35 percent of all observed software failures were found to be hardware-related. The analysis shows that the operating system is seldom able to diagnose that a software error may be hardware-related. The impact of HW/SW errors on the system is evaluated by measuring how effectively system recovery contains their propagation. The system failure probability for HW/SW errors is close to three times that for software errors in general. The observed HW/SW errors follow a specific pattern, suggesting that such error patterns could be used for intelligent error prediction and recovery.
Software reliability, hardware/software interactions, recovery analysis
R.K. Iyer, P. Velardi, "Hardware-Related Software Errors: Measurement and Analysis", IEEE Transactions on Software Engineering, vol.11, no. 2, pp. 223-231, February 1985, doi:10.1109/TSE.1985.232198