A Distributed Recovery Block Approach to Fault-Tolerant Execution of Application Tasks in Hypercubes
January 1993 (vol. 4 no. 1)
pp. 104-111

An approach to fault-tolerant execution of real-time application tasks in hypercubes is proposed. The approach is based on the distributed recovery block (DRB) scheme and does not require special hardware mechanisms in support of fault tolerance. Each task is assigned to a pair of processors forming a DRB computing station for execution in a dual-redundant and self-checking mode. Assignment of all tasks in an application in such a form is called the full DRB mapping. The DRB scheme was developed as an approach to uniform treatment of hardware and software faults with the effect of fast forward recovery. However, if the system developer is concerned with hardware fault possibilities only, then forming DRB stations becomes a mechanical process not burdening the application software designer in any way. A procedure for converting an efficient nonredundant task-to-processor mapping into an efficient full DRB mapping is presented.
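The idea of a full DRB mapping can be illustrated with a small sketch. The abstract does not describe the paper's conversion procedure, so the code below is not that procedure. It shows one simple construction, chosen here as an assumption: a nonredundant mapping on a d-dimensional hypercube is lifted to a (d+1)-dimensional hypercube, and each task's partner processor is the node that differs from its original node only in the new highest address bit. The names `full_drb_mapping` and `hamming_distance` and the task labels are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of address bits in which two hypercube nodes differ."""
    return bin(a ^ b).count("1")


def full_drb_mapping(mapping: dict[str, int], dim: int) -> dict[str, tuple[int, int]]:
    """Lift a nonredundant task-to-node mapping on a dim-cube to
    (primary, shadow) pairs on a (dim+1)-cube.

    Assumption (not taken from the paper): the shadow of node x is
    x with the new bit `dim` set, so the two processors of every
    station are direct neighbors.
    """
    nodes = list(mapping.values())
    if len(set(nodes)) != len(nodes):
        raise ValueError("nonredundant mapping must place one task per node")
    for node in nodes:
        if not 0 <= node < (1 << dim):
            raise ValueError(f"node {node} is outside the {dim}-cube")

    partner_bit = 1 << dim  # extra dimension reserved for shadow processors
    return {task: (node, node | partner_bit) for task, node in mapping.items()}


if __name__ == "__main__":
    # Three hypothetical tasks placed on a 2-cube (nodes 0..3).
    nonredundant = {"t0": 0b00, "t1": 0b01, "t2": 0b11}
    stations = full_drb_mapping(nonredundant, dim=2)
    for task, (primary, shadow) in stations.items():
        print(task, format(primary, "03b"), format(shadow, "03b"),
              hamming_distance(primary, shadow))
```

Under this construction the two processors of a station are always adjacent, and two tasks that were neighbors in the original mapping remain neighbors both primary-to-primary and shadow-to-shadow. The cost is a doubling of the cube size; a procedure that aims for an efficient mapping within a fixed cube, as the abstract describes, would need a different pairing strategy.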

Index Terms:
dual redundant mode; task assignment; distributed recovery block; fault-tolerant execution; real-time application tasks; hypercubes; computing station; self-checking mode; software faults; fast forward recovery; hardware fault; fault tolerant computing; hypercube networks
Citation:
K.H. Kim, A. Kavianpour, "A Distributed Recovery Block Approach to Fault-Tolerant Execution of Application Tasks in Hypercubes," IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 1, pp. 104-111, Jan. 1993, doi:10.1109/71.205657