Parallel and Distributed Systems, International Conference on (2007)
Dec. 5, 2007 to Dec. 7, 2007
Agustin Caminero, Department of Computing Systems, The University of Castilla-La Mancha, Spain
Anthony Sulistio, Dept. of Computer Sc. & Software Eng., The University of Melbourne, Australia
Blanca Caminero, Department of Computing Systems, The University of Castilla-La Mancha, Spain
Carmen Carrion, Department of Computing Systems, The University of Castilla-La Mancha, Spain
Rajkumar Buyya, Dept. of Computer Sc. & Software Eng., The University of Melbourne, Australia
Grid technologies are emerging as the next generation of distributed computing, allowing the aggregation of geographically distributed resources. However, these resources are independent and managed separately by various organizations with different policies. This has a major impact on users who submit their jobs to the Grid, as they have to deal with issues such as policy heterogeneity, security and fault tolerance. Moreover, changes in Grid conditions, such as resources becoming unavailable for a period of time due to maintenance and/or failures, can significantly affect the Quality of Service (QoS) requirements of users. Therefore, it is essential for users to take into account the effects of resource failures during job execution. In this paper, we present our work on introducing resource failures and failure detection into the GridSim simulation toolkit. Because we need to conduct repeatable and controlled experiments, simulation is an easier means of studying complex scenarios. We also give a detailed description of the overall design and a use case scenario demonstrating resource conditions that vary over time.
A. Caminero, C. Carrion, A. Sulistio, R. Buyya and B. Caminero, "Extending GridSim with an architecture for failure detection," Parallel and Distributed Systems, International Conference on (ICPADS), Hsinchu, Taiwan, 2007, pp. 1-8.