Spare Capacity as a Means of Fault Detection and Diagnosis in Multiprocessor Systems
June 1989 (vol. 38 no. 6)
pp. 881-891
A technique for detecting and diagnosing faults at the processor level in a multiprocessor system is described. Whenever possible, a process is assigned to two processors: the processor to which it would normally be assigned (primary) and an additional processor that would otherwise be idle (secondary). Two strategies are described and analyzed: one that is preemptive and another that is nonpreemptive.
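The dual-assignment idea can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper. Faults are detected by comparing the outputs of the primary and secondary copies. The `preemptive` flag is read as "a newly arriving process may evict a secondary copy from its home processor", while the nonpreemptive case makes the new process wait. The names `Processor`, `assign`, and `fault_detected` are hypothetical, and diagnosis (identifying which processor is faulty) is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Processor:
    pid: int                         # assumed to equal the processor's list index
    primary: Optional[str] = None    # process this processor runs as its primary
    secondary: Optional[str] = None  # process it duplicates as a secondary copy

    def is_idle(self) -> bool:
        return self.primary is None and self.secondary is None


def assign(procs: List[Processor], name: str, home: int,
           preemptive: bool) -> Optional[Tuple[int, Optional[int]]]:
    """Hypothetical policy: run `name` on its home processor (primary) and,
    if one exists, on an otherwise idle processor (secondary).

    Returns (primary_pid, secondary_pid or None), or None if the process
    must wait because its home processor is busy.
    """
    host = procs[home]
    if host.primary is not None:
        return None                  # home processor already runs primary work
    if host.secondary is not None:
        if not preemptive:
            return None              # nonpreemptive: let the secondary copy finish
        host.secondary = None        # preemptive: abandon the duplicate copy
    host.primary = name
    for p in procs:
        if p.pid != home and p.is_idle():
            p.secondary = name       # use spare capacity for a redundant copy
            return home, p.pid
    return home, None                # no idle processor: the process runs unchecked


def fault_detected(primary_out: int, secondary_out: Optional[int]) -> bool:
    """Assumed comparison test: a fault is flagged only when both copies ran
    and their outputs disagree."""
    return secondary_out is not None and primary_out != secondary_out


if __name__ == "__main__":
    procs = [Processor(i) for i in range(3)]
    print(assign(procs, "A", home=0, preemptive=False))  # (0, 1)
    print(assign(procs, "B", home=1, preemptive=False))  # None: P1 holds A's copy
    print(assign(procs, "B", home=1, preemptive=True))   # (1, 2): A's copy evicted
```

In this sketch, the preemptive policy starts new work sooner but gives up the redundant check for the evicted process. The nonpreemptive policy keeps the check at the cost of delaying the new process, which is the response-time trade-off the index terms point to.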

Index Terms:
fault detection; diagnosis; multiprocessor systems; processor level; preemptive; nonpreemptive; spare capacity; response time; detecting faults; fault tolerant computing; multiprocessing systems; redundancy; system recovery.
A.T. Dahbura, K.K. Sabnani, W.J. Hery, "Spare Capacity as a Means of Fault Detection and Diagnosis in Multiprocessor Systems," IEEE Transactions on Computers, vol. 38, no. 6, pp. 881-891, June 1989, doi:10.1109/12.24300