This Article 
 Bibliographic References 
 Add to: 
Multiround Algorithms for Scheduling Divisible Loads
November 2005 (vol. 16 no. 11)
pp. 1092-1102

Abstract—Divisible load applications occur in many fields of science and engineering and can be easily parallelized in a master-worker fashion, but pose several scheduling challenges. While a number of approaches have been proposed that allocate load to workers in a single round, using multiple rounds improves overlap of computation with communication. Unfortunately, multiround algorithms are difficult to analyze and have thus received only limited attention. In this paper, we answer three open questions in the multiround divisible load scheduling area: 1) how to account for latencies, 2) how to account for heterogeneous platforms, and 3) how many rounds should be used. To answer 1), we derive the first closed-form optimal schedule for a homogeneous platform with both computation and communication latencies, for a given number of rounds. To answer 2) and 3), we present a novel algorithm, UMR. We evaluate UMR in a variety of realistic scenarios.

[1] E.G. Coffman, M.R. Garey, and D.S. Johnson, Bin Packing Approximation Algorithms: A Survey. Boston: PWS Publishing Co., 1996.
[2] M. Drozdowski and P. Wolniewicz, “Experiments with Scheduling Divisible Tasks in Clusters of Workstations,” Proc. Int'l Conf. Parallel and Distributed Computing (Europar), pp. 311-319, 2000.
[3] V. Bharadwaj and S. Ranganath, “Theoretical and Experimental Study on Large Size Image Processing Applications Using Divisible Load Paradigm on Distributed Bus Networks,” Image and Vision Computing, vol. 20, nos. 13-14, pp. 917-1034, 2002.
[4] C.K. Lee and M. Hamdia, “Parallel Image Processing Applications on a Network of Workstations,” Parallel Computing, vol. 21, pp. 137-160, 1995.
[5] G. Miller, D.G. Payne, T.N. Phung, H. Siegel, and R. Williams, “Parallel Processing of Spaceborne Imaging Radar Data,” Proc. Int'l Conf. High Performance Computing and Comm. (SC '95), 1995.
[6] Y.-J. Chiang, R. Farias, C.T. Silva, and B. Wei, “A Unified Infrastructure for Parallel Out-of-Core Isosurface Extraction and Volume Rendering of Unstructured Grids,” Proc. IEEE Symp. Parallel and Large-Data Visualization and Graphics, pp. 59-66, 2001.
[7] W. Bethel, B. Tierney, J. lee, D. Gunter, and S. Lau, “Using High-speed WANs and Network Data Caches to Enable Remote and Distributed Visualization,” Proc. Int'l Conf. High Performance Computing and Comm. (SC '00), 2000.
[8] A. Garcia and H.-W. Shen, “Parallel Volume Rendering: An Interleaved Parallel Volume Renderer with PC-Clusters,” Proc. Fourth Eurographics Workshop Parallel Graphics and Visualization, pp. 51-59, 2002.
[9] Visible Human Project, 2003, .
[10] V. Bharadwaj, D. Ghose, V. Mani, and T.G. Robertazzi, Scheduling Divisible Loads in Parallel and Distributed Systems. IEEE CS Press, 1996.
[11] Cluster Computing, special issue on divisible load scheduling, vol. 6, no. 1, 2003.
[12] Grid2: Blueprint for a New Computing Infrastructure, I. Foster and C. Kesselman, eds., second ed. San Francisco: Morgan Kaufmann Publishers, 2003.
[13] V. Bharadwaj, D. Ghose, and V. Mani, “Optimal Sequencing and Arrangement in Single-Level Tree Networks with Communication Delays,” IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 9, pp. 968-976 Sept. 1994.
[14] V. Bharadwaj, D. Ghose, and V. Mani, “Multi-Installment Load Distribution in Tree Networks with Delays,” IEEE Trans. Aerospace and Electronc Systems, vol. 31, no. 2, pp. 555-567, 1995.
[15] J. Blazewicz and M. Drozdowski, “Distributed Processing of Divisible Jobs with Communication Startup Costs,” Discrete Applied Math., vol. 76, pp. 21-41, 1997.
[16] V. Bharadwaj, X. Li, and C.C. Ko, “On the Influence of Start-Up Costs in Scheduling Divisible Loads on Bus Networks,” IEEE Trans. Aerospace and Electronic Systems, vol. 11, no. 12, pp. 1288-1305, 2000.
[17] O. Beaumont, A. Legrand, and Y. Robert, “Optimal Algorithms for Scheduling Divisible Workloads on Heterogeneous Systems,” Technical Report 2002-36, Ecole Normale Superieure de Lyon, Oct. 2002.
[18] S. Bataineh, T.-Y. Hsiung, and T.G. Robertazzi, “Closed Form Solutions for Bus and Tree Networks of Processors Load Sharing a Divisible Job,” IEEE Trans. Computers, vol. 43, no. 10, Oct. 1994.
[19] K. Li, “Scheduling Divisible Tasks on Heterogeneous Linear Arrays with Applications to Layered Networks,” Proc. Third Int'l Workshop Parallel and Distributed Scientific and Eng. Computing with Applications, 2002.
[20] M. Drozdowski and W. Glazek, “Scheduling Divisible Loads in a Three-Dimensional Mesh of Processors,” Parallel Computing, vol. 25, no. 4, 1999.
[21] K. Li, “Parallel Processing of Divisible Loads on Partitionable Static Interconnection Networks,” Cluster Computing, vol. 6, no. 1, pp. 47-55, 2003.
[22] D. Ghose and V. Mani, “Distributed Computation with Communication Delays: Asymptotic Performance Analysis,” J. Parallel and Distributed Computing, vol. 23, no. 3, 1994.
[23] J. Blazewicz, “Performance Limits of a Two-Dimensional Network of Load Sharing Processors,” Foundations of Computing and Decision Sciences, vol. 21, no. 1, pp. 3-15, 1996.
[24] H.-J. Kim, G.-I. Jee, and J.-G. Lee, “Optimal Load Distribution for Tree Network Processors,” IEEE Trans. Aerospace and Electronic Systems, vol. 32, no. 2, pp. 607-611, 1996.
[25] A.L. Rosenberg, “Sharing Partitionable Workloads in Heterogeneous NOWs: Greedier Is Not Better,” Proc. Third IEEE Int'l Conf. Cluster Computing (CLUSTER '01), pp. 124-131, 2001.
[26] M. Drozdowski and P. Wolniewicz, “Divisible Load Scheduling in Systems with Limited Memory,” Cluster Computing, vol. 6, no. 1, pp. 19-29, 2003.
[27] S.F. Hummel, “Factoring: A Method for Scheduling Parallel Loops,” Comm. ACM, vol. 35, no. 8, pp. 90-101, Aug. 1992.
[28] T. Hagerup, “Allocating Independent Tasks to Parallel Processors: An Experimental Study,” J. Parallel and Distributed Computing, vol. 47, pp. 185-197, 1997.
[29] Y. Yang and H. Casanova, “RUMR: Robust Scheduling for Divisible Workloads,” Proc. 12th IEEE Symp. High-Performance Distributed Computing (HPDC-12), pp. 114-125, June 2003.
[30] D. Altilar and Y. Paker, “An Optimal Scheduling Algorithm for Parallel Video Processing,” Proc. IEEE Int'l Conf. Multimedia Computing and Systems, pp. 245-248, 1998.
[31] O. Beaumont, L. Carter, J. Ferrante, A. Legrand, and Y. Robert, “Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms,” Proc. Int'l Parallel and Distributed Processing Symp. (IPDPS), June 2002.
[32] O. Beaumont, A. Legrand, and Y. Robert, “The Master-Slave Paradigm with Heterogeneous Processors,” Technical Report RR2001-13, Ecole Normale Superieure de Lyon, Mar. 2001.
[33] HMMER Webpage, http:/, 2003.
[34] A. Watt, 3D Computer Graphics, chapter 13, third ed. Addison-Wesley, 2000.
[35] K. Shen, L.A. Rowe, and E.J. Delp, “A Parallel Implementation of an MPEG1 Encoder: Faster than Real-Time!,” Proc. SPIE Conf. Digital Video Compression: Algorithms and Technologies, pp. 407-418, Feb. 1995.
[36] F.J. Gonzalez-Castano, R. Asorey-Cacheda, R.P. Martinez-Alvarez, F. Comesana-Seijo, and J. Vales-Alonso, “DVD Transcoding via Linux Metacomputing,” Linux J., vol. 116, p. 8, 2003.
[37] N. Amano, J. Gama, and F. Silva, “Exploiting Parallelism in Decision Tree Induction,” Proc. ECML/PKDD Workshop Parallel and Distributed Computing for Machine Learning, pp. 13-22, Sept. 2003.
[38] S. Goil and A. Choudhary, “High Performance Multidimensional Analysis of Large Data Sets,” Proc. First ACM Int'l Workshop Data Warehousing and OLAP, pp. 34-39, 1998.
[39] D. Skillicorn, “Strategies for Parallel Data Mining,” IEEE Concurrency, vol. 7, no. 4, pp. 26-35, 1999.
[40] Mencoder Media Player, http:/, 2004.
[41] VFleet Webpage,, 2003.
[42] M. Tamura, T. an Oguchi, and M. Kitsuregawa, “Parallel Database Processing on a 100 Node PC Cluster: Cases for Decision Support Query Processing and Data Mining,” Proc. Int'l Conf. High Performance Computing and Comm., pp. 1-16, Nov. 1997.
[43] Y. Yang and H. Casanova, “Extensions to the Multi-Installment Algorithm: Affine Costs and Output Data Transfers,” Technical Report CS2003-0754, Dept. of Computer Science and Eng., Univ. of California, San Diego, July 2003.
[44] G. Chun, H. Dail, H. Casanova, and A. Snavely, “Benchmark Probes for Grid Assessment,” Proc. High-Performance Grid Computing Workshop, Apr. 2004.
[45] R.L. Graham, D.E. Knuth, and O. Patashnik, Concrete Mathematics. Wiley, 1994.
[46] Y. Yang and H. Casanova, “Multi-Round Algorithm for Scheduling Divisible Workload Applications: Analysis and Experimental Evaluation,” Technical Report CS2002-0721, Dept. of Computer Science and Eng., Univ. of California, San Diego, 2002.
[47] D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Belmont, Mass.: Athena Scientific, 1996.
[48] A. Legrand, L. Marchal, and H. Casanova, “Scheduling Distributed Applications: The SimGrid Simulation Framework,” Proc. Third IEEE Int'l Symp. Cluster Computing and the Grid (CCGrid '03), May 2003.
[49] Y. Yang and H. Casanova, “UMR: A Multi-Round Algorithm for Scheduling Divisible Workloads,” Proc. Int'l Parallel and Distributed Processing Symp. (IPDPS 2003), Apr. 2003.
[50] K. van der Raadt, Y. Yang, and H. Casanova, “APST-DV: Divisible Load Scheduling and Deployment on the Grid,” Technical Report CS2004-0785, Dept. of Computer Science and Eng., Univ. of California, San Diego, Apr. 2004.
[51] H. Casanova, “Modeling Large-Scale Platforms for the Analysis and the Simulation of Scheduling Strategies,” Proc. Sixth Workshop Advances in Parallel and Distributed Computational Models, Apr. 2004.

Index Terms:
Parallel processing, scheduling, divisible loads, multiround algorithms.
Yang Yang, Krijn van der Raadt, Henri Casanova, "Multiround Algorithms for Scheduling Divisible Loads," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 11, pp. 1092-1102, Nov. 2005, doi:10.1109/TPDS.2005.139
Usage of this product signifies your acceptance of the Terms of Use.