37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)
Enhanced Reliability Modeling of RAID Storage Systems
Edinburgh, UK
June 25-June 28
ISBN: 0-7695-2855-4
A flexible model for estimating the reliability of RAID storage systems is presented. This model corrects errors associated with the common assumption that system times to failure follow a homogeneous Poisson process. Separate generalized failure distributions are used to model catastrophic failures and usage-dependent data corruptions for each hard drive. Catastrophic failure restoration is represented by a three-parameter Weibull distribution, so the model can include a minimum time to restore as a function of data transfer rate and hard drive storage capacity. Data can be scrubbed as a background operation to eliminate corrupted data that, in the event of a simultaneous catastrophic failure, results in double disk failures. Field-based time-to-failure data and a mathematical justification for the new model are presented. Model results have been verified and predict between 2 and 1,500 times as many double disk failures as are estimated using the current mean time to data loss method.
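As an illustration of the restoration model described above, the following is a minimal sketch of a three-parameter Weibull whose location parameter is the minimum time to restore. It assumes that the minimum is simply drive capacity divided by a sustained transfer rate, with decimal units (1 GB = 1000 MB); the function names, the scale and shape arguments, and this way of computing the minimum are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def min_restore_hours(capacity_gb, transfer_mb_per_s):
    """Assumed lower bound on restore time: full-drive rewrite at a sustained rate.

    Uses decimal units (1 GB = 1000 MB); this formula is an illustrative assumption.
    """
    seconds = capacity_gb * 1000.0 / transfer_mb_per_s
    return seconds / 3600.0


def weibull3_cdf(t, location, scale, shape):
    """CDF of a three-parameter Weibull: zero probability before `location`."""
    if t <= location:
        return 0.0
    return 1.0 - math.exp(-(((t - location) / scale) ** shape))


def sample_restore_hours(rng, location, scale, shape):
    """Draw one restore time: the minimum plus a two-parameter Weibull variate."""
    # random.Random.weibullvariate(alpha, beta) takes scale first, then shape.
    return location + rng.weibullvariate(scale, shape)
```

Under these assumptions, a 500 GB drive restored at 50 MB/s cannot finish in less than about 2.8 hours, so `weibull3_cdf` returns 0 for any shorter time, and every value drawn by `sample_restore_hours` is at least that minimum.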
Citation:
Jon G. Elerath, Michael Pecht, "Enhanced Reliability Modeling of RAID Storage Systems," 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), pp. 175-184, 2007