International Conference on Dependable Systems and Networks (DSN'06)
Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults
Philadelphia, Pennsylvania
June 25-28, 2006
ISBN: 0-7695-2607-1
Dong Tang, Sun Microsystems, Inc.
Peter Carruthers, Sun Microsystems, Inc.
Zuheir Totari, Sun Microsystems, Inc.
Michael W. Shapiro, Sun Microsystems, Inc.
The Solaris 10 Operating System includes a number of new features for predictive self-healing. One such feature is the ability of the Fault Management software to diagnose memory errors and drive automatic memory page retirement (MPR), which is intended to reduce the negative impact on system reliability, availability, and serviceability (RAS) of permanent memory faults that generate either correctable or uncorrectable errors. The MPR technique allows memory pages suffering from correctable errors, and relocatable clean pages suffering from uncorrectable errors, to be removed from use in the virtual memory system without interrupting user applications. It also allows relocatable dirty pages associated with uncorrectable errors to be isolated with limited impact on the affected user processes, avoiding an outage of the entire system. This study applies analytical models, with parameters calibrated by field experience, to quantify how much this operating system self-healing technique reduces the system interruptions, yearly downtime, and number of services caused by permanent hardware faults for typical low-end and mid-range server systems. The results show that deploying the MPR capability can significantly improve all three of these system RAS metrics.
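The page-handling cases described in the abstract can be summarized as a small decision function. The sketch below is only an illustration of those three cases, not the Solaris implementation: the names `ErrorKind`, `Page`, `Action`, and `mpr_action` are hypothetical, and the handling of non-relocatable pages with uncorrectable errors is marked as not covered because the abstract does not describe it.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


class Action(Enum):
    # Page is removed from use without interrupting user applications.
    RETIRE_TRANSPARENTLY = "retire page, applications not interrupted"
    # Page is isolated; only the processes using it are affected.
    ISOLATE_AFFECTED_PROCESSES = "isolate page, limited impact on affected processes"
    # The abstract does not say what happens in this case.
    NOT_COVERED = "case not described in the abstract"


@dataclass(frozen=True)
class Page:
    relocatable: bool  # hypothetical flag: can the page contents be moved?
    dirty: bool        # hypothetical flag: has the page been modified in memory?


def mpr_action(error: ErrorKind, page: Page) -> Action:
    """Map an error on a page to the MPR outcome described in the abstract."""
    if error is ErrorKind.CORRECTABLE:
        # Pages with correctable errors are retired transparently.
        return Action.RETIRE_TRANSPARENTLY
    if not page.relocatable:
        # Assumption: left open here, since the abstract is silent on it.
        return Action.NOT_COVERED
    if page.dirty:
        # Relocatable dirty page with an uncorrectable error.
        return Action.ISOLATE_AFFECTED_PROCESSES
    # Relocatable clean page with an uncorrectable error.
    return Action.RETIRE_TRANSPARENTLY


if __name__ == "__main__":
    for kind in ErrorKind:
        for relocatable in (True, False):
            for dirty in (False, True):
                page = Page(relocatable=relocatable, dirty=dirty)
                print(kind.value, page, "->", mpr_action(kind, page).name)
```

In every case the abstract describes, the effect of an uncorrectable error is confined to a page or to the processes using it, so none of them requires an outage of the entire system.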
Citation:
Dong Tang, Peter Carruthers, Zuheir Totari, Michael W. Shapiro, "Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults," International Conference on Dependable Systems and Networks (DSN'06), pp. 365-370, 2006