Risk-Resilient Heuristics and Genetic Algorithms for Security-Assured Grid Job Scheduling
June 2006 (vol. 55 no. 6)
pp. 703-719
In scheduling a large number of user jobs for parallel execution on an open-resource Grid system, the jobs are subject to system failures or delays caused by infected hardware, software vulnerability, and distrusted security policy. This paper models the risk and insecure conditions in Grid job scheduling. Three risk-resilient strategies (preemptive, replication, and delay-tolerant) are developed to provide security assurance. We propose six risk-resilient scheduling algorithms to assure secure Grid job execution under different risky conditions. We report the simulated Grid performance of these new job scheduling algorithms under the NAS and PSA workloads. The relative performance is measured by the total job makespan, Grid resource utilization, job failure rate, slowdown ratio, replication overhead, etc. In addition to extending known scheduling heuristics, we developed a new space-time genetic algorithm (STGA) based on faster searching and protected chromosome formation. Our simulation results suggest that, in a wide-area Grid environment, it is more resilient for the global job scheduler to tolerate some job delays than to resort to preemption or replication or to take a risk on the unreliable resources allocated. We find that delay-tolerant Min-Min and STGA job scheduling have 13-23 percent higher performance than the risky, preemptive, or replicated algorithms. The resource overheads for replicated job scheduling are kept at a low 15 percent. The delayed job execution is optimized with a delay factor, which is 20 percent of the total makespan. A Kiviat graph is proposed for demonstrating the quality of Grid computing services. These risk-resilient job scheduling schemes can upgrade Grid performance significantly at only a moderate increase in extra resources or scheduling delays in a risky Grid computing environment.
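
The abstract names Min-Min as one of the base heuristics that the risk-resilient algorithms extend. As a rough illustration only, the following is a minimal sketch of the classic Min-Min mapping of independent jobs to sites, not a reproduction of any of the paper's six algorithms. The expected-time-to-compute matrix `etc`, the function name `min_min_schedule`, and the `eligible` predicate (a hypothetical hook for excluding sites a job should not risk running on) are all assumptions introduced for this example; the abstract does not define the paper's risk model.

```python
from typing import Callable, Dict, List, Tuple


def min_min_schedule(
    etc: List[List[float]],
    eligible: Callable[[int, int], bool] = lambda job, site: True,
) -> Tuple[Dict[int, int], float]:
    """Classic Min-Min mapping of independent jobs onto sites.

    etc[j][s] is the expected time to compute job j on site s.
    eligible(j, s) is a hypothetical filter (an assumption of this
    sketch) that says whether job j may be placed on site s.
    Returns (job -> site assignment, makespan).
    """
    n_sites = len(etc[0]) if etc else 0
    ready = [0.0] * n_sites          # time at which each site becomes free
    assignment: Dict[int, int] = {}
    unscheduled = set(range(len(etc)))

    while unscheduled:
        best = None  # (completion_time, job, site)
        # Taking the smallest completion time over all (job, site) pairs
        # is the same as picking, among the jobs' individual minimum
        # completion times, the job whose minimum is smallest.
        for job in sorted(unscheduled):
            for site in range(n_sites):
                if not eligible(job, site):
                    continue
                completion = ready[site] + etc[job][site]
                if best is None or completion < best[0]:
                    best = (completion, job, site)
        if best is None:
            raise ValueError("no eligible site for the remaining jobs")
        completion, job, site = best
        assignment[job] = site
        ready[site] = completion     # the site is busy until this job ends
        unscheduled.remove(job)

    makespan = max(ready, default=0.0)
    return assignment, makespan
```

Calling `min_min_schedule(etc)` with the default predicate places jobs on any site regardless of risk, while passing a restrictive `eligible` function confines each job to the sites it accepts, typically at the cost of a longer makespan. How the paper's preemptive, replication, and delay-tolerant strategies modify this baseline is not specified in the abstract and is not modeled here.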

Index Terms:
Grid computing, job scheduling heuristics, genetic algorithm, replication scheduling, risk resilience, NAS and PSA benchmarks, performance metrics, distributed supercomputing.
Shanshan Song, Kai Hwang, Yu-Kwong Kwok, "Risk-Resilient Heuristics and Genetic Algorithms for Security-Assured Grid Job Scheduling," IEEE Transactions on Computers, vol. 55, no. 6, pp. 703-719, June 2006, doi:10.1109/TC.2006.89