Automatic Recovery from Disk Failure in Continuous-Media Servers
May 2002 (vol. 13 no. 5)
pp. 499-515

Continuous-media (CM) servers have been around for some years. Apart from server capacity, another important issue in the deployment of CM servers is reliability. This study investigates rebuild algorithms for automatically rebuilding data stored on a failed disk onto a spare disk. Specifically, a block-based rebuild algorithm is studied, with the rebuild time and buffer requirement modeled. A buffer-sharing scheme is then proposed to eliminate the additional buffers needed by the rebuild process. To further improve rebuild performance, a track-based rebuild algorithm that rebuilds lost data in tracks is proposed and analyzed. Results show that track-based rebuild, while it substantially outperforms block-based rebuild, requires significantly more buffers (17-135 percent more) even with buffer sharing. To tackle this problem, a novel pipelined rebuild algorithm is proposed that takes advantage of the sequential property of track retrievals to pipeline the reading and writing processes. This pipelined rebuild algorithm achieves the same rebuild performance as track-based rebuild, but reduces the extra buffer requirement to insignificant levels (0.7-1.9 percent). Numerical results computed using models of five commercial disk drives demonstrate that automatic rebuild of a failed disk can be done in a reasonable amount of time, even at relatively high server utilization (e.g., less than 1.5 hours at 90 percent utilization).

Index Terms:
Continuous media, server, disk, rebuild, fault tolerance, reliability, scheduler, block-based, track-based, performance analysis.
Jack Y.B. Lee, John C.S. Lui, "Automatic Recovery from Disk Failure in Continuous-Media Servers," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 5, pp. 499-515, May 2002, doi:10.1109/TPDS.2002.1003860