Scheduling of Computing Services on Intranet Networks
July-September 2011 (vol. 4 no. 3)
pp. 207-215
Blaise Omer Yenke, Ngaoundere Institute of Technology, Ngaoundere and INRIA Mescal team, CNRS LIG lab., Grenoble
Jean-François Mehaut, INRIA Mescal team, CNRS LIG lab., Grenoble
Maurice Tchuente, IRD UMI 209 UMMISCO, Bondy and University of Yaounde I, Cameroon
Enterprises can now provide computing services through their intranet networks by letting their available resources be used as virtual clusters for scientific computation during idle periods such as nights, weekends, and holidays. These idle periods are generally too short for the computations to run to completion, so the context of unfinished applications must be saved for a possible restart. This checkpointing mechanism is subject to resource constraints: the network bandwidth, the disk bandwidth, and the delay T imposed for releasing the workstations. We first introduce a function bw that gives the bandwidth bw(m,V) of a system during the checkpointing of m applications with aggregate memory requirement V. Assuming that this bandwidth is shared equitably among the applications, the scheduling problem becomes a sequence of knapsack problems with nonlinear constraints, for which we propose approximate solutions. Experiments carried out on Grid5000 show that the running time of this algorithm is negligible compared to the delay T, which is on the order of a few minutes. The proposed scheduling algorithm therefore does not induce significant overhead on the checkpointing process, and our mechanism can be incorporated into a batch scheduler.
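
To make the constraint in the abstract concrete, the following is a minimal sketch of one knapsack instance, not the authors' algorithm. It assumes that each application carries a value (for example, the computation time lost if it is not saved), that the time to checkpoint a set of m applications with aggregate memory V is V / bw(m,V), and that applications are chosen with a standard density-ordered greedy heuristic. The `bw` model below, the `App` fields, and the names `checkpoint_time` and `greedy_select` are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    memory_mb: float  # size of the checkpoint image in MB
    value: float      # assumed objective weight, e.g. hours of work at stake


def bw(m: int, total_mb: float) -> float:
    """Hypothetical aggregate bandwidth in MB/s for m concurrent checkpoints.

    Assumption: contention lowers the aggregate bandwidth as m grows.
    A real bw(m, V) would be measured on the target system.
    """
    peak_mb_per_s = 100.0
    return peak_mb_per_s / (1.0 + 0.1 * (m - 1))


def checkpoint_time(apps, bandwidth=bw) -> float:
    """Estimated time in seconds to checkpoint all apps: V / bw(m, V)."""
    if not apps:
        return 0.0
    total_mb = sum(a.memory_mb for a in apps)
    return total_mb / bandwidth(len(apps), total_mb)


def greedy_select(apps, deadline_s: float, bandwidth=bw):
    """Pick apps by decreasing value per MB while the set still fits in T.

    The constraint is nonlinear because adding an app changes both
    m and V, and therefore the bandwidth itself.
    """
    chosen = []
    by_density = sorted(apps, key=lambda a: a.value / a.memory_mb, reverse=True)
    for app in by_density:
        if checkpoint_time(chosen + [app], bandwidth) <= deadline_s:
            chosen.append(app)
    return chosen


if __name__ == "__main__":
    candidates = [
        App("A", 2000.0, 10.0),
        App("B", 4000.0, 8.0),
        App("C", 1000.0, 9.0),
        App("D", 8000.0, 4.0),
    ]
    selected = greedy_select(candidates, deadline_s=120.0)
    print([a.name for a in selected], checkpoint_time(selected))
```

Applications left unselected in one round would be candidates for the next, which is one way to read the "sequence of knapsack problems" mentioned above; that reading is also an assumption of this sketch.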

Index Terms:
Computing services, intranet networks, checkpoint scheduling, virtual clusters.
Blaise Omer Yenke, Jean-François Mehaut, Maurice Tchuente, "Scheduling of Computing Services on Intranet Networks," IEEE Transactions on Services Computing, vol. 4, no. 3, pp. 207-215, July-Sept. 2011, doi:10.1109/TSC.2011.28