RAID5 Performance with Distributed Sparing
June 1997 (vol. 8 no. 6)
pp. 640-657

Abstract—Distributed sparing is a method to improve the performance of RAID5 disk arrays with respect to a dedicated sparing system with N + 2 disks (including the spare disk), since it utilizes the bandwidth of all N + 2 disks. We analyze the performance of RAID5 with distributed sparing in normal mode, degraded mode, and rebuild mode in an OLTP environment, which implies small reads and writes. The analysis in normal mode uses an M/G/1 queuing model, which takes into account the components of disk service time. In degraded mode, a low-cost approximate method is developed to estimate the mean response time of fork-join requests resulting from accesses to recreate lost data on the failed disk. Rebuild mode performance is analyzed by considering an M/G/1 vacationing server model with multiple vacations of different types to take into account differences in processing requirements for reading the first and subsequent tracks. An iterative solution method is used to estimate the mean response time of disk requests, as well as the time to read each disk, which is shown to be quite accurate through validation against simulation results. We next compare RAID5 performance in a system 1) without a cache; 2) with a cache; and 3) with a nonvolatile storage (NVS) cache. The last configuration, in addition to improved read response time due to cache hits, provides a fast-write capability, such that dirty blocks can be destaged asynchronously and at a lower priority than read requests, resulting in an improvement in read response time. The small write penalty is also reduced due to the possibility of repeated writes to dirty blocks in the cache and by taking advantage of disk geometry to efficiently destage multiple blocks at a time.
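
As a rough illustration of the normal-mode analysis mentioned in the abstract, the mean response time of an M/G/1 queue follows from the standard Pollaczek-Khinchine formula once the first two moments of the service time are known. The sketch below is not the authors' model: the function name and parameters are hypothetical, and it assumes the disk service time moments are supplied directly rather than derived from the seek, latency, and transfer components the paper accounts for.

```python
def mg1_mean_response_time(arrival_rate, mean_service, second_moment_service):
    """Mean response time of an M/G/1 queue (Pollaczek-Khinchine formula).

    arrival_rate          -- Poisson arrival rate, lambda (requests per second)
    mean_service          -- E[S], mean service time (seconds)
    second_moment_service -- E[S^2], second moment of service time (seconds^2)

    All three inputs are assumptions of this sketch; the paper derives the
    service time from the components of a disk access.
    """
    utilization = arrival_rate * mean_service  # rho = lambda * E[S]
    if not 0.0 <= utilization < 1.0:
        raise ValueError("queue is unstable: utilization must be in [0, 1)")
    # Mean waiting time in queue: W = lambda * E[S^2] / (2 * (1 - rho))
    mean_wait = arrival_rate * second_moment_service / (2.0 * (1.0 - utilization))
    # Mean response time = service time + waiting time
    return mean_service + mean_wait


if __name__ == "__main__":
    # Sanity check with exponential service (M/M/1 special case):
    # E[S] = 1/mu and E[S^2] = 2/mu^2, so the result should match 1/(mu - lambda).
    lam, mu = 50.0, 100.0
    print(mg1_mean_response_time(lam, 1.0 / mu, 2.0 / mu**2))
```

With exponential service the formula reduces to the familiar M/M/1 result 1/(mu - lambda), which gives a quick check that the inputs are in consistent units.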

Index Terms:
RAID5 disk arrays, dedicated sparing, distributed sparing, disk failures, fault-tolerance, operation in degraded mode, rebuild processing, striping unit, small-write syndrome, disk cache, nonvolatile storage, fast writes, disk zoning, performance analysis, queuing theory, M/G/1 queues, fork-join synchronization, vacationing server model, disk response time, rebuild time, nonpreemptive and preemptive priority queuing.
Alexander Thomasian, Jai Menon, "RAID5 Performance with Distributed Sparing," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 6, pp. 640-657, June 1997, doi:10.1109/71.595583