Issue No. 03 - March (2009 vol. 58)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2008.163
Jon Elerath, Network Appliance, Sunnyvale
Michael Pecht, University of Maryland, College Park
Abstract - The statistical bases for current models of RAID reliability are reviewed, and a highly accurate alternative is provided and justified. This new model corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process, and corrects errors associated with assuming that the times to failure and times to restore are exponentially distributed. Statistical justification for the new model uses theory for the reliability of repairable systems. Four critical component distributions are developed from field data. These distributions are for times to catastrophic failure, reconstruction and restoration, read errors, and disk data scrubs. Model results have been verified and predict between 2 and 1,500 times as many double disk failures as estimates made using the mean time to data loss (MTTDL) method. Model results are compared to system-level field data for RAID groups of 14 drives and show excellent correlation and greater accuracy than the MTTDL method.
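For orientation, the MTTDL baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration and not the authors' model: it uses the commonly cited single-parity approximation MTTDL = MTTF^2 / (N (N - 1) MTTR), which relies on exactly the constant-rate (exponential, homogeneous Poisson) assumptions the paper questions. The function names and the MTTF, MTTR, group count, and mission time values are hypothetical placeholders, not figures from the paper.

```python
HOURS_PER_YEAR = 8760.0


def mttdl_single_parity(n_drives, mttf_hours, mttr_hours):
    """Commonly cited MTTDL approximation for an N-drive single-parity group.

    Assumes constant failure and repair rates (exponential distributions),
    which is the assumption the abstract argues against.
    """
    return mttf_hours ** 2 / (n_drives * (n_drives - 1) * mttr_hours)


def expected_double_disk_failures(n_groups, mission_hours, mttdl_hours):
    """Expected data-loss events across a fleet, treating losses as a
    homogeneous Poisson process with rate 1 / MTTDL per group."""
    return n_groups * mission_hours / mttdl_hours


# Hypothetical inputs: 14-drive groups (the group size named in the abstract),
# with an assumed 1,000,000 h MTTF and an assumed 12 h MTTR.
mttdl = mttdl_single_parity(n_drives=14, mttf_hours=1_000_000.0, mttr_hours=12.0)

# Assumed fleet of 1,000 groups observed for 10 years.
ddfs = expected_double_disk_failures(
    n_groups=1000, mission_hours=10 * HOURS_PER_YEAR, mttdl_hours=mttdl
)

print(f"MTTDL estimate: {mttdl:.3e} hours")
print(f"Expected double disk failures: {ddfs:.4f}")
```

The abstract's claim is that estimates of this kind undercount double disk failures by a factor of 2 to 1,500 relative to the proposed model, which replaces the constant rates with distributions fitted to field data.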
Hardware reliability, Redundant design, Reliability, Testing, and Fault-Tolerance
J. Elerath and M. Pecht, "A Highly Accurate Method for Assessing Reliability of Redundant Arrays of Inexpensive Disks (RAID)," in IEEE Transactions on Computers, vol. 58, no. 3, pp. 289-299, 2009.