Fault-Tolerant Interleaved Memory Systems with Two-Level Redundancy
September 1997 (vol. 46 no. 9)
pp. 1028-1034

Abstract—Highly reliable interleaved memory systems for uniprocessor and multiprocessor computer architectures are presented. The memory systems are divided into groups. Each group consists of several banks and each bank has several modules. The error model is defined at the memory-module level. A module is faulty if any single or multiple faults result in loss of the entire module. Spare modules, as well as spare banks, are included in the systems to enhance reliability and availability. A faulty module is replaced by a spare module within a bank first, and, if the bank has no redundancy remaining for the faulty module, the whole bank will be replaced by a spare bank at the next higher level. The structure of the reconfigurable memory system is designed in such a way that the replacement of faulty modules (banks) by spare modules (banks) will not disturb memory references if each bank (group) has at most two spare modules (banks). If there are more than two spare modules (banks) in a bank (group), a second-level address translator is designed which can prohibit references to faulty modules by address remapping. The address translator can be implemented with a CAM or switches. Analysis results show that the system reliability can be significantly improved with little hardware overhead. Also, a typical system with one redundant row of modules has the highest cost-effectiveness during its useful lifetime period. User transparency in memory access is retained.
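The replacement policy described in the abstract (use a spare module inside the bank first, then fall back to a spare bank at the group level) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class names `Bank` and `Group`, the method `report_module_fault`, and the dictionary `remap` (standing in for the CAM-based address translator) are all hypothetical, and the model always remaps rather than distinguishing the case of at most two spares, where the abstract says no translator is needed.

```python
class Bank:
    """Hypothetical model of one bank: active modules plus spare modules."""

    def __init__(self, modules, spare_modules):
        self.modules = modules
        self.spare_modules = spare_modules

    def replace_faulty_module(self):
        """Consume one spare module; return False if the bank has none left."""
        if self.spare_modules > 0:
            self.spare_modules -= 1
            return True
        return False


class Group:
    """Hypothetical model of one group: active banks plus spare banks."""

    def __init__(self, num_banks, spare_banks, modules_per_bank, spare_modules):
        total = num_banks + spare_banks
        self.physical = [Bank(modules_per_bank, spare_modules) for _ in range(total)]
        # Logical-to-physical bank map; an assumed stand-in for the CAM lookup.
        self.remap = {i: i for i in range(num_banks)}
        self.free_spares = list(range(num_banks, total))

    def report_module_fault(self, logical_bank):
        """Apply two-level replacement and report which level absorbed the fault."""
        bank = self.physical[self.remap[logical_bank]]
        if bank.replace_faulty_module():
            return "module-spare"   # first level: spare module within the bank
        if self.free_spares:
            # Second level: redirect the logical bank to a spare bank.
            self.remap[logical_bank] = self.free_spares.pop(0)
            return "bank-spare"
        return "group-failed"       # no redundancy left at either level


group = Group(num_banks=4, spare_banks=1, modules_per_bank=8, spare_modules=1)
outcomes = [group.report_module_fault(2) for _ in range(4)]
print(outcomes)
```

In this toy configuration, repeated faults in logical bank 2 first use that bank's spare module, then move the bank to the spare bank, then use the spare bank's own spare module, and only then exhaust the group. References to logical bank 2 keep using the same logical index throughout, which mirrors the user transparency the abstract claims.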

Index Terms:
Interleaved memory system, fault tolerance, content addressable memory, two-level redundancy, reconfigurable memory system.
Citation:
Shyue-Kung Lu, Sy-Yen Kuo, Cheng-Wen Wu, "Fault-Tolerant Interleaved Memory Systems with Two-Level Redundancy," IEEE Transactions on Computers, vol. 46, no. 9, pp. 1028-1034, Sept. 1997, doi:10.1109/12.620483