37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)
RAS by the Yard
Edinburgh, UK
June 25-28, 2007
ISBN: 0-7695-2855-4
Alan Wood, Sun Microsystems, Inc.
Swami Nathan, Sun Microsystems, Inc.
Different applications require different levels of fault tolerance. Therefore, it is important to create a flexible architecture that allows a customer to choose the appropriate amount of fault tolerance, a concept we call "RAS by the yard." In this paper, we describe a next-generation supercomputer and the design flexibility that allows us to offer a range of alternatives for RAS (reliability, availability, serviceability). In particular, we explain how checkpointing can provide an availability continuum. Design alternatives that improve RAS may be expensive, so it is important to do cost/benefit studies of the alternatives. For a fixed budget and specified system balance ratios, such as Bytes/FLOPS, we analyze the system performance impact of alternative RAS strategies. We show how to optimize the amount of RAS purchased by using a performability measure.
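The checkpointing trade-off mentioned in the abstract can be illustrated with a small sketch. This is not the model from the paper: it assumes Young's well-known first-order approximation for the checkpoint interval, and the function names, parameters, and sample numbers are all hypothetical. It shows how a cheaper (faster) checkpoint raises the fraction of machine time spent on useful work, which is one way to picture a continuum of availability levels.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the best time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def useful_fraction(interval_s: float, checkpoint_cost_s: float,
                    mtbf_s: float, restart_s: float = 0.0) -> float:
    """Approximate fraction of time spent on useful work (first-order model).

    Assumption: losses are the time spent writing checkpoints plus the
    expected rework (half an interval) and restart time per failure.
    """
    checkpoint_overhead = checkpoint_cost_s / interval_s
    failure_loss = (interval_s / 2.0 + restart_s) / mtbf_s
    return max(0.0, 1.0 - checkpoint_overhead - failure_loss)


if __name__ == "__main__":
    mtbf = 24 * 3600.0  # hypothetical system MTBF: one day
    # Hypothetical checkpoint costs, e.g. from slower to faster storage.
    for cost in (900.0, 300.0, 60.0):
        tau = young_interval(cost, mtbf)
        frac = useful_fraction(tau, cost, mtbf)
        print(f"checkpoint cost {cost:6.0f} s  interval {tau:8.0f} s  "
              f"useful work {frac:.3f}")
```

In this simplified model, spending more on checkpoint bandwidth lowers the checkpoint cost and so raises the useful-work fraction, while that money is no longer available for compute. Weighing the two effects against each other is the kind of cost/benefit question the abstract describes.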
Citation:
Alan Wood, Swami Nathan, "RAS by the Yard," 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), pp. 606-611, 2007