2008 International Conference on Networking, Architecture, and Storage
Reliability Assurance of RAID Storage Systems for a Wide Range of Latent Sector Errors
June 12-June 14
ISBN: 978-0-7695-3187-8
Low-cost disk drives, which are increasingly being adopted in today's data storage systems, have higher capacity but lower reliability, which leads to more frequent rebuilds and to a higher risk of unrecoverable or latent media errors. An intra-disk redundancy scheme has been proposed to cope with such errors and enhance the reliability of RAID systems. Empirical field results recently reported in the literature, however, suggest that unrecoverable media errors occur more often than the data sheet specifications provided by the disk manufacturers indicate. Our results demonstrate that this increase in the number of unrecoverable errors adversely affects the reliability improvement due to intra-disk redundancy. We demonstrate that, by revising the parameter choice of the intra-disk redundancy scheme, we can obtain essentially the same reliability as that of a system operating without unrecoverable sector errors. The I/O and throughput performance are evaluated by means of analysis and event-driven simulations. The effects of the spatial locality of errors and of the error-burst length distribution on the system reliability are also investigated.
Index Terms:
Unrecoverable or latent sector errors, RAID, reliability analysis, MTTDL, stochastic modeling
Citation:
Ilias Iliadis, Xiao-Yu Hu, "Reliability Assurance of RAID Storage Systems for a Wide Range of Latent Sector Errors," 2008 International Conference on Networking, Architecture, and Storage (NAS), pp. 10-19, 2008