On Dependability Evaluation of Mesh-Connected Processors
September 1995 (vol. 44 no. 9)
pp. 1073-1084

Abstract: Analytical techniques for reliability and availability prediction of mesh-connected systems are proposed in this paper. The models are based on the submesh requirements. First, a reliability model is proposed assuming that a submesh can always be recognized if it exists. Analysis of the linear consecutive n-out-of-N system is extended using an expanding row/column technique to evaluate the submesh reliability. An alternative approach called row folding is also discussed. Because computing the exact reliability is highly complex, both techniques use approximations to estimate lower bounds. Next, the submesh reliability is computed under two different allocation policies: the two-dimensional buddy system (TDBS) and frame sliding (FS). The model with the TDBS is further extended to estimate the reliability of multiple working submeshes, which is useful in a multiuser environment. Availability analysis for a submesh of the required size is conducted using a Markov chain (MC). State truncation is used to reduce the computation time, and the MC is solved using a software package called HARP. The analytical models are validated through extensive simulation. Issues such as reliability comparison across allocation policies and methods for improving system reliability are addressed using the analytical models.
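
As an illustration of the one-dimensional building block named in the abstract, the sketch below computes the probability that a line of N nodes contains at least n consecutive working nodes. It is not the paper's expanding row/column or row-folding technique. It assumes independent nodes that each work with the same probability p, and the function name `consecutive_run_reliability` is hypothetical.

```python
def consecutive_run_reliability(N, n, p):
    """Probability that N independent nodes in a line, each working with
    probability p, contain at least n consecutive working nodes.

    Assumption (not from the paper): node failures are independent and
    identically distributed.
    """
    if n <= 0:
        return 1.0
    if n > N:
        return 0.0
    # state[j]: probability that no run of length n has been seen yet and
    # the current trailing run of working nodes has length j (0 <= j < n).
    state = [0.0] * n
    state[0] = 1.0
    success = 0.0  # probability mass absorbed once a run of length n appears
    for _ in range(N):
        nxt = [0.0] * n
        for j, prob in enumerate(state):
            if prob == 0.0:
                continue
            nxt[0] += prob * (1.0 - p)   # node failed: trailing run resets
            if j + 1 == n:
                success += prob * p      # node works and completes the run
            else:
                nxt[j + 1] += prob * p   # node works: trailing run grows
        state = nxt
    return success


if __name__ == "__main__":
    # Example: 16 nodes in a row, a run of 4 required, node reliability 0.9.
    print(consecutive_run_reliability(16, 4, 0.9))
```

The recursion tracks only the length of the current run of working nodes, so it costs O(N·n) operations rather than enumerating all 2^N failure patterns. Extending the idea to two dimensions is where the exact computation becomes expensive, which is why the abstract speaks of lower bounds.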

Index Terms:
Allocation-based reliability, availability model, consecutive n-out-of-N system, expanding row/column technique, Markov chain, mesh-connected systems, submesh dependability.
Chita R. Das, Prasant Mohapatra, "On Dependability Evaluation of Mesh-Connected Processors," IEEE Transactions on Computers, vol. 44, no. 9, pp. 1073-1084, Sept. 1995, doi:10.1109/12.464386