Decentralized QoS-Aware Checkpointing Arrangement in Mobile Grid Computing
August 2010 (vol. 9 no. 8)
pp. 1173-1186
Paul J. Darby III, University of Louisiana at Lafayette, Lafayette, LA
Nian-Feng Tzeng, University of Louisiana at Lafayette, Lafayette, LA
This paper deals with decentralized, QoS-aware middleware for checkpointing arrangement in Mobile Grid (MoG) computing systems. Checkpointing is more crucial in MoG systems than in their conventional wired counterparts because of host mobility, dynamicity, less reliable wireless links, frequent disconnections, and variations in mobile systems. We have determined that finding the globally optimal checkpointing arrangement is NP-complete, and so consider Reliability Driven (ReD) middleware, which employs decentralized QoS-aware heuristics to construct superior checkpointing arrangements efficiently. With ReD, a mobile host (MH) simply sends its checkpointed data to one selected neighboring MH and also serves as a stable point of storage for checkpointed data received from a single approved neighboring MH. ReD works to maximize the probability of checkpointed data recovery during job execution, increasing the likelihood that a distributed application executed on the MoG completes without sustaining an unrecoverable failure. It allows collaborative services to be offered practically and autonomously by the MoG. Simulations and an actual testbed implementation show ReD's favorable recovery probabilities with respect to Random Checkpointing Arrangement (RCA) middleware, a QoS-blind comparison protocol that produces arbitrary, random checkpointing arrangements.
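The one-to-one pairing constraint described in the abstract (each MH sends its checkpoint to one neighbor and stores the checkpoint of at most one neighbor) can be illustrated with a small sketch. This is not the ReD protocol itself, whose heuristics are not given here. It is a centralized greedy stand-in, shown next to a QoS-blind random arrangement in the spirit of RCA. The function names and the `reliability` map (an assumed per-link estimate that a checkpoint sent from a consumer to a provider stays recoverable) are hypothetical.

```python
import random


def _assign(pairs):
    """Walk (consumer, provider) pairs in order and accept a pair only if the
    consumer has no provider yet and the provider stores for nobody yet."""
    provider_of = {}
    used_providers = set()
    for consumer, provider in pairs:
        if consumer == provider:
            continue
        if consumer in provider_of or provider in used_providers:
            continue
        provider_of[consumer] = provider
        used_providers.add(provider)
    return provider_of


def greedy_arrangement(reliability):
    """QoS-aware stand-in: consider the most reliable links first.
    `reliability` maps (consumer, provider) -> assumed recovery probability."""
    ranked = sorted(reliability, key=reliability.get, reverse=True)
    return _assign(ranked)


def random_arrangement(reliability, rng):
    """QoS-blind baseline: consider links in random order."""
    pairs = list(reliability)
    rng.shuffle(pairs)
    return _assign(pairs)


def mean_reliability(arrangement, reliability):
    """Average assumed recovery probability over the chosen links."""
    if not arrangement:
        return 0.0
    total = sum(reliability[(c, p)] for c, p in arrangement.items())
    return total / len(arrangement)
```

Calling `greedy_arrangement(links)` and `random_arrangement(links, random.Random(0))` on the same link map and comparing `mean_reliability` for the two results gives a rough feel for why a QoS-aware arrangement can be expected to beat a random one. A greedy pass can leave some hosts unpaired, and a truly decentralized scheme would have to reach a pairing through message exchange between neighbors rather than a global sort.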


Index Terms:
Checkpointing, computational Grids, mobile Grid systems, decentralized checkpointing, simulation and testbeds.
Paul J. Darby III, Nian-Feng Tzeng, "Decentralized QoS-Aware Checkpointing Arrangement in Mobile Grid Computing," IEEE Transactions on Mobile Computing, vol. 9, no. 8, pp. 1173-1186, Aug. 2010, doi:10.1109/TMC.2010.80