Improving Availability of RAID-Structured Storage Systems by Workload Outsourcing
January 2011 (vol. 60 no. 1)
pp. 64-79
Suzhen Wu, Huazhong University of Science and Technology, Wuhan and Xiamen University, Xiamen
Hong Jiang, University of Nebraska-Lincoln, Lincoln
Dan Feng, Huazhong University of Science and Technology, Wuhan
Lei Tian, Huazhong University of Science and Technology, Wuhan and University of Nebraska-Lincoln, Lincoln
Bo Mao, Huazhong University of Science and Technology, Wuhan
Because user I/O and online low-priority background tasks contend for shared disk bandwidth, user I/O intensity can significantly degrade the performance of those background tasks, thus reducing the reliability and availability of RAID-structured storage systems. In this paper, we propose a novel and practical scheme, called WorkOut (I/O Workload Outsourcing), to significantly boost the performance of those low-priority background tasks. WorkOut outsources all write requests and popular read requests originally targeted at the degraded RAID set, which is performing the low-priority background tasks, to a surrogate RAID set. A lightweight prototype implementation of WorkOut and extensive trace-driven and benchmark-driven experiments on two case studies demonstrate that, compared with existing approaches, WorkOut effectively improves the performance of low-priority background tasks such as RAID reconstruction and RAID resynchronization. Importantly, WorkOut is portable and can be easily incorporated into any existing optimization algorithm for RAID-structured storage systems.
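The redirection idea in the abstract can be illustrated with a toy request router. This is only a sketch under stated assumptions, not the authors' prototype: the class name `OutsourcingRouter`, the `popular_after` read-count threshold used to decide that a block is "popular", and the write-back list returned when the background task finishes are all hypothetical choices made for illustration.

```python
from collections import Counter


class OutsourcingRouter:
    """Toy request router illustrating workload outsourcing (assumed design)."""

    def __init__(self, popular_after=2):
        # Hypothetical threshold: a block counts as "popular" once it has
        # been read this many times while the background task is running.
        self.popular_after = popular_after
        self.read_counts = Counter()
        # block -> "dirty" (redirected write) or "clean" (cached popular read)
        self.on_surrogate = {}
        self.busy = False  # True while the degraded set runs a background task

    def start_background_task(self):
        self.busy = True

    def route_write(self, block):
        """All writes go to the surrogate set while the task is running."""
        if self.busy:
            self.on_surrogate[block] = "dirty"
            return "surrogate"
        return "degraded"

    def route_read(self, block):
        """Reads hit the surrogate only if the block already lives there."""
        if block in self.on_surrogate:
            return "surrogate"
        if self.busy:
            self.read_counts[block] += 1
            if self.read_counts[block] >= self.popular_after:
                # Assumption: the data just read is also copied to the
                # surrogate so that later reads of this block are redirected.
                self.on_surrogate[block] = "clean"
        return "degraded"

    def finish_background_task(self):
        """Return blocks that would need writing back (assumed), then reset."""
        dirty = sorted(b for b, s in self.on_surrogate.items() if s == "dirty")
        self.busy = False
        self.on_surrogate.clear()
        self.read_counts.clear()
        return dirty


router = OutsourcingRouter(popular_after=2)
router.start_background_task()
trace = [
    router.route_write(7),  # write is outsourced
    router.route_read(7),   # latest copy is on the surrogate
    router.route_read(3),   # first read, served by the degraded set
    router.route_read(3),   # second read, block becomes popular
    router.route_read(3),   # now served by the surrogate
]
pending = router.finish_background_task()
```

In this sketch the degraded set sees only the unpopular reads, which leaves more of its bandwidth for the background task; how popularity is actually tracked and how redirected data is reclaimed are left open here.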

Index Terms:
Low-priority background tasks, RAID reconstruction, reliability, availability, performance evaluation.
Citation:
Suzhen Wu, Hong Jiang, Dan Feng, Lei Tian, Bo Mao, "Improving Availability of RAID-Structured Storage Systems by Workload Outsourcing," IEEE Transactions on Computers, vol. 60, no. 1, pp. 64-79, Jan. 2011, doi:10.1109/TC.2010.206