An Approach to Experimental Evaluation of Real-Time Fault-Tolerant Distributed Computing Schemes
June 1989 (vol. 15 no. 6)
pp. 715-725

A testbed-based approach to the evaluation of fault-tolerant distributed-computing schemes is discussed. The approach is based on experimental incorporation of system structuring and design techniques into real-time distributed computing testbeds centered around tightly coupled microcomputer networks. The effectiveness of this approach has been experimentally confirmed. Primary advantages of the testbed-based approach include the relatively high accuracy of the data obtained on timing and logical complexity, as well as the relatively high degree of assurance that can be obtained on the practical effectiveness of the scheme evaluated. Various design issues encountered in the course of establishing the basic microcomputer network testbed facilities are discussed, along with the augmentation of those facilities to support some experiments. The recognized shortcomings of the testbeds are also discussed, together with the desired extensions of the testbeds. Some of the desired extensions are beyond the state of the art in microcomputer network implementation.

Index Terms:
experimental evaluation; real-time fault-tolerant distributed computing schemes; test-based approach; system structuring; design techniques; real-time distributed computing testbeds; tightly coupled microcomputer networks; timing; logical complexity; practical effectiveness; design issues; microcomputer network testbed facilities; microcomputer network implementation; computer networks; fault tolerant computing; microcomputer applications; performance evaluation; program testing; real-time systems
Citation:
K.H. Kim, "An Approach to Experimental Evaluation of Real-Time Fault-Tolerant Distributed Computing Schemes," IEEE Transactions on Software Engineering, vol. 15, no. 6, pp. 715-725, June 1989, doi:10.1109/32.24725