2013 IEEE 19th Pacific Rim International Symposium on Dependable Computing (PRDC) (2013)
Vancouver, BC, Canada
Dec. 2, 2013 to Dec. 4, 2013
ISBN: 978-0-7695-5130-2
pp: 108-117
ABSTRACT
We present a general method for estimating the risk of data loss in arbitrary two-dimensional RAID arrays where each data disk belongs to exactly two single-parity stripes. We start by representing each array organization by a graph where each parity stripe, and its associated parity disk, is represented by a node and each data disk by an edge. We then use this representation to identify and enumerate minimal sets of disk failures, say, triple failures, quadruple failures and so forth, that will cause a data loss. The overall probability that a given number n of disk failures will cause a data loss is then given by the ratio of the total number of fatal disk failures involving n disks over the total number of possible failures of n disks. To illustrate the power of our method, we apply it to two distinct, archival two-dimensional array organizations. The first, "square" organization is a traditional square layout where data disks are formed into a square and the parity stripes are formed by the rows and columns in the square. Hence a square layout organization with n^2 data disks will have 2n parity disks. The second, "complete" organization corresponds to a closer weave, where all parity stripes intersect and each intersection contains a data disk. This organization with n parity disks will have n(n - 1)/2 data disks. Our results show that previous ad hoc estimates of the reliability of these arrays significantly underestimated their reliability by assuming that either all triple or all quadruple disk failures were fatal. We show that the two two-dimensional array organizations exhibit mean times to data loss and five-year survival rates that are very similar to those of a RAID level 6 organization of much smaller capacity. Our complete organization is about 4.5 times and the square organization is about 8 times more reliable than a disk array with the same storage capacity built from RAID level 6 stripes.
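The graph representation and the fatal-failure ratio described in the abstract can be illustrated with a small brute-force sketch. It is not the paper's enumeration method: it assumes the usual single-parity recovery rule (a stripe can rebuild a disk only when that disk is the stripe's sole failed member) and simply tries every set of k failed disks. The function names, the tuple labels for disks, and the exhaustive search are all illustrative assumptions, and the approach is practical only for small arrays.

```python
from itertools import combinations

def complete_layout(n):
    """'Complete' layout (assumed encoding): n stripes, each with one parity
    disk ("P", v), and one data disk ("D", u, v) for every pair of stripes."""
    stripes = {v: {("P", v)} for v in range(n)}
    for u, v in combinations(range(n), 2):
        stripes[u].add(("D", u, v))
        stripes[v].add(("D", u, v))
    return stripes

def square_layout(n):
    """'Square' layout (assumed encoding): n*n data disks ("D", i, j),
    one parity disk per row and per column, 2n parity disks in total."""
    stripes = {("row", i): {("P", "row", i)} for i in range(n)}
    stripes.update({("col", j): {("P", "col", j)} for j in range(n)})
    for i in range(n):
        for j in range(n):
            stripes[("row", i)].add(("D", i, j))
            stripes[("col", j)].add(("D", i, j))
    return stripes

def is_fatal(stripes, failed):
    """Return True if the failed disks cannot all be rebuilt.

    Assumption: a stripe repairs a disk only when exactly one of its
    members (data or parity) is currently failed; repairs are repeated
    until no stripe can make further progress.
    """
    failed = set(failed)
    progress = True
    while failed and progress:
        progress = False
        for members in stripes.values():
            lost = members & failed
            if len(lost) == 1:
                failed -= lost
                progress = True
    return bool(failed)

def fatal_fraction(stripes, k):
    """Count fatal k-disk failure sets and all k-disk failure sets."""
    disks = list(set().union(*stripes.values()))
    fatal = total = 0
    for combo in combinations(disks, k):
        total += 1
        fatal += is_fatal(stripes, combo)
    return fatal, total

if __name__ == "__main__":
    fatal, total = fatal_fraction(complete_layout(4), 3)
    print(f"complete layout, 4 parity disks: {fatal}/{total} triple failures are fatal")
```

Under these assumptions, a complete layout with 4 parity disks and 6 data disks has 10 fatal triple failures out of 120 possible ones: the 4 triangles of data disks and the 6 cases where a data disk fails together with both of its parity disks. No double failure is fatal, which is consistent with the abstract's point that only a fraction of triple failures cause data loss.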
INDEX TERMS
five-year survival rate, disk array organization, archival storage system, Markov model, mean time to data loss
CITATION

S. T. Schwarz, D. D. Long and J. Paris, "Reliability of Disk Arrays with Double Parity," 2013 IEEE 19th Pacific Rim International Symposium on Dependable Computing (PRDC), Vancouver, BC, Canada, 2013, pp. 108-117.
doi:10.1109/PRDC.2013.20