Issue No.01 - January (2009 vol.21)
pp: 50-65
Yu Zhang, Purdue University, West Lafayette
Bharat Bhargava, Purdue University, West Lafayette
ABSTRACT
Performance of disk I/O schedulers is affected by many factors, such as workloads, file systems, and disk systems. Disk scheduling performance can be improved by tuning scheduler parameters, such as the length of read timers. Scheduler performance tuning is mostly done manually. To automate this process, we propose four self-learning disk scheduling schemes: Change-sensing Round-Robin, Feedback Learning, Per-request Learning, and Two-layer Learning. Experiments show that the novel Two-layer Learning Scheme performs best. It integrates the workload-level and request-level learning algorithms. It employs feedback learning techniques to analyze workloads, change scheduling policy, and tune scheduling parameters automatically. We discuss schemes to choose features for workload learning, divide and recognize workloads, generate training data, and integrate machine learning algorithms into the Two-layer Learning Scheme. We conducted experiments to compare the accuracy, performance, and overhead of five machine learning algorithms: Decision Tree, Logistic Regression, Na
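The workload-level feedback idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the policy names ("noop", "deadline", "cfq"), the workload labels, and the use of mean throughput as the feedback signal are all assumptions made for illustration. The sketch tries each policy once per workload class, records the measured throughput, and afterwards selects the policy with the best average.

```python
from collections import defaultdict


class FeedbackSchedulerSelector:
    """Toy workload-level learner (hypothetical, for illustration only).

    It remembers the throughput observed for each (workload class, policy)
    pair and picks the policy with the best mean once all have been tried.
    """

    def __init__(self, policies):
        # Candidate scheduling policies; the names are placeholders.
        self.policies = list(policies)
        # workload class -> policy -> list of observed throughputs
        self.history = defaultdict(lambda: defaultdict(list))

    def record(self, workload, policy, throughput):
        """Store one feedback measurement for a workload/policy pair."""
        self.history[workload][policy].append(throughput)

    def choose(self, workload):
        """Return the policy to run next for this workload class."""
        seen = self.history[workload]
        # Exploration: try every policy at least once, in listed order.
        for policy in self.policies:
            if not seen[policy]:
                return policy
        # Exploitation: pick the policy with the highest mean throughput.
        return max(self.policies, key=lambda p: sum(seen[p]) / len(seen[p]))
```

A caller would alternate `choose` and `record`: run the chosen policy for a measurement interval, feed the observed throughput back, and ask again. A real scheme would also need to detect workload changes and tune per-policy parameters, which this sketch leaves out.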
INDEX TERMS
Sequencing and scheduling, Machine learning, Input/output, Application-transparent adaptation
CITATION
Yu Zhang, Bharat Bhargava, "Self-Learning Disk Scheduling", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 1, pp. 50-65, January 2009, doi:10.1109/TKDE.2008.116