File Assignment in Parallel I/O Systems with Minimal Variance of Service Time
February 2000 (vol. 49 no. 2)
pp. 127-140

Abstract: We address the problem of assigning nonpartitioned files in a parallel I/O system where file accesses follow Poisson arrivals and have fixed service times. We present two new file assignment algorithms, based on open queuing networks, that aim to simultaneously balance the load across all disks and minimize the variance of the service time at each disk. We first present an off-line algorithm, Sort Partition, which assigns files with similar access times to each disk. Next, we show that, whenever a perfectly balanced file assignment can be found for a given set of files, Sort Partition finds the one with minimal mean response time. We then present an on-line algorithm, Hybrid Partition, that assigns groups of files with similar service times in successive intervals while guaranteeing that the load imbalance at any point does not exceed a certain threshold. We report on synthetic experiments whose workloads exhibit skew in file accesses and sizes, and we compare the performance of our new algorithms with the vanilla greedy file allocation algorithm.
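
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the idea it describes, not the authors' procedure. It assumes each disk behaves as an M/G/1 queue (Poisson arrivals, a fixed service time per file), evaluates mean response time with the Pollaczek-Khinchine formula, and uses a hypothetical cut rule for the sorted partition: sort files by service time and start a new disk once the current one has reached its equal share of the load. The names `sort_partition_sketch`, `greedy_baseline`, and the `(access_rate, service_time)` file representation are illustrative assumptions.

```python
from typing import List, Tuple

File = Tuple[float, float]  # (access_rate, service_time), both assumed known


def sort_partition_sketch(files: List[File], n_disks: int) -> List[List[File]]:
    """Sort by service time, then cut the sorted list into contiguous runs
    of roughly equal load (assumed cut rule, not taken from the paper)."""
    ordered = sorted(files, key=lambda f: f[1])
    target = sum(rate * svc for rate, svc in ordered) / n_disks
    disks: List[List[File]] = [[] for _ in range(n_disks)]
    d, acc = 0, 0.0
    for rate, svc in ordered:
        # Move on once the current disk has reached its share of the load.
        if acc >= target and d < n_disks - 1:
            d, acc = d + 1, 0.0
        disks[d].append((rate, svc))
        acc += rate * svc
    return disks


def greedy_baseline(files: List[File], n_disks: int) -> List[List[File]]:
    """Baseline: heaviest file first, each onto the least-loaded disk."""
    disks: List[List[File]] = [[] for _ in range(n_disks)]
    loads = [0.0] * n_disks
    for rate, svc in sorted(files, key=lambda f: f[0] * f[1], reverse=True):
        i = loads.index(min(loads))
        disks[i].append((rate, svc))
        loads[i] += rate * svc
    return disks


def mean_response_time(disk: List[File]) -> float:
    """M/G/1 mean response time: E[S] + lambda * E[S^2] / (2 * (1 - rho))."""
    lam = sum(rate for rate, _ in disk)
    if lam == 0:
        return 0.0
    rho = sum(rate * svc for rate, svc in disk)  # disk utilization
    if rho >= 1:
        return float("inf")  # unstable queue
    lam_es2 = sum(rate * svc * svc for rate, svc in disk)  # lambda * E[S^2]
    return rho / lam + lam_es2 / (2 * (1 - rho))


def system_response_time(disks: List[List[File]]) -> float:
    """Arrival-rate-weighted mean of the per-disk response times."""
    total = sum(rate for disk in disks for rate, _ in disk)
    return sum(sum(r for r, _ in disk) * mean_response_time(disk)
               for disk in disks) / total


# Four files with identical load (rate * service time = 0.25) but two sizes.
files = [(0.25, 1.0), (0.25, 1.0), (0.0625, 4.0), (0.0625, 4.0)]
sp = sort_partition_sketch(files, 2)
gr = greedy_baseline(files, 2)
print(system_response_time(sp), system_response_time(gr))
```

In this toy workload both assignments are perfectly balanced (utilization 0.5 on each disk), yet the sorted partition yields a mean response time of about 2.4 against about 2.85 for the greedy baseline, because grouping files with similar service times lowers the second moment of service time seen at each disk.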

Index Terms:
File allocation, parallel I/O systems, load balancing, variance of service time, heuristic algorithms.
Lin-Wen Lee, Peter Scheuermann, Radek Vingralek, "File Assignment in Parallel I/O Systems with Minimal Variance of Service Time," IEEE Transactions on Computers, vol. 49, no. 2, pp. 127-140, Feb. 2000, doi:10.1109/12.833109