TRUSS: A Reliable, Scalable Server Architecture
November/December 2005 (vol. 25 no. 6)
pp. 51-59
Brian T. Gold, Carnegie Mellon University
Jangwoo Kim, Carnegie Mellon University
Jared C. Smolens, Carnegie Mellon University
Eric S. Chung, Carnegie Mellon University
Vasileios Liaskovitis, Carnegie Mellon University
Eriko Nurvitadhi, Carnegie Mellon University
Babak Falsafi, Carnegie Mellon University
James C. Hoe, Carnegie Mellon University
Andreas G. Nowatzyk, Cedars-Sinai Medical Center
Traditional reliable servers require costly design changes to the processor, use custom system or application software, or cannot scale beyond a few processing elements. We present TRUSS, a family of server architectures providing reliable, scalable computation from distributed shared-memory hardware while requiring no changes to software. The TRUSS paradigm centers on a logical division of computation and memory that isolates errors in processing from memory storage and vice versa. In this paper, we present the key mechanisms that enable this separation and use full-system simulation to evaluate their impact on a range of commercial and scientific workloads.
Index Terms:
Reliability, Testing, and Fault-Tolerance; Performance Analysis and Design Aids
Citation:
Brian T. Gold, Jangwoo Kim, Jared C. Smolens, Eric S. Chung, Vasileios Liaskovitis, Eriko Nurvitadhi, Babak Falsafi, James C. Hoe, Andreas G. Nowatzyk, "TRUSS: A Reliable, Scalable Server Architecture," IEEE Micro, vol. 25, no. 6, pp. 51-59, Nov.-Dec. 2005, doi:10.1109/MM.2005.122