Issue No. 04 - April (2018 vol. 29)
ISSN: 1045-9219
pp: 734-747
Yongkun Li, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Neng Wang, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Chengjin Tian, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Si Wu, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Yueming Zhang, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Yinlong Xu, School of Computer Science and Technology, AnHui Province Key Laboratory of High Performance Computing, University of Science and Technology of China, Hefei, Anhui, China
ABSTRACT
Disk failures are very common in modern storage systems because these systems deploy large numbers of inexpensive disks. Moreover, recovering a failed disk takes a long time because of its large capacity and limited I/O bandwidth. To speed up the recovery process and maintain high system reliability, we propose OI-RAID, a hierarchical architecture based on erasure codes that consists of two layers: an outer layer code and an inner layer code. Specifically, the outer layer code is deployed with a disk grouping technique based on Balanced Incomplete Block Design (BIBD) or a complete graph, combined with a skewed data layout, to provide efficient parallel I/O across all disks for fast failure recovery; the inner layer code is deployed within each group of disks to provide high reliability. As an example, we deploy RAID5 in both layers to tolerate at least three disk failures, which meets the data availability requirements of practical systems, and to achieve a much higher speedup ratio for disk failure recovery than existing approaches. In addition, OI-RAID keeps the optimal data update complexity and incurs low storage overhead in practice.
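The grouping idea in the abstract can be illustrated with a small sketch. It is not the OI-RAID layout itself, which the abstract does not spell out; it only assumes a hypothetical array of 7 disks grouped by the (7, 3, 1)-BIBD (the Fano plane) and shows two properties the abstract relies on: every pair of disks shares exactly one group, so rebuilding one disk can draw on all the others in parallel, and RAID5-style XOR parity can restore a lost block. The names GROUPS, is_bibd, recovery_helpers and xor_blocks are illustrative assumptions.

```python
from functools import reduce
from itertools import combinations

# Hypothetical example: 7 disks arranged in 7 groups of 3 disks each,
# following the (7, 3, 1)-BIBD (Fano plane). Not taken from the paper.
GROUPS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6),
    (2, 3, 6), (2, 4, 5),
]


def is_bibd(groups, num_disks, lam=1):
    """Check that every pair of disks appears together in exactly `lam` groups."""
    counts = {pair: 0 for pair in combinations(range(num_disks), 2)}
    for group in groups:
        for pair in combinations(sorted(group), 2):
            counts[pair] += 1
    return all(count == lam for count in counts.values())


def recovery_helpers(groups, failed_disk):
    """Return the disks that share at least one group with the failed disk."""
    helpers = set()
    for group in groups:
        if failed_disk in group:
            helpers.update(d for d in group if d != failed_disk)
    return helpers


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    print(is_bibd(GROUPS, 7))
    print(sorted(recovery_helpers(GROUPS, 0)))

    # RAID5-style recovery: XOR the surviving blocks with the parity block.
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(data)
    print(xor_blocks([data[0], data[2], parity]) == data[1])
```

In this toy grouping, the helper set for any failed disk is every other disk in the array, which is the kind of balanced read load a BIBD-based grouping aims for.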
INDEX TERMS
Layout, Bandwidth, Fault tolerant systems, Redundancy, Reed-Solomon codes
CITATION

Y. Li, N. Wang, C. Tian, S. Wu, Y. Zhang and Y. Xu, "A Hierarchical RAID Architecture Towards Fast Recovery and High Reliability," in IEEE Transactions on Parallel & Distributed Systems, vol. 29, no. 4, pp. 734-747, 2018.
doi:10.1109/TPDS.2017.2775231