Subscribe

Issue No.09 - September (2010 vol.21)

pp: 1281-1289

Ajay D. Kshemkalyani , University of Illinois at Chicago, Chicago

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.24

ABSTRACT

Large-scale distributed systems such as supercomputers and peer-to-peer systems typically have a fully connected logical topology over a large number of processors. Existing snapshot algorithms in such systems have high response time and/or require a large number of messages, typically O(n^2), where n is the number of processes. In this paper, we present a suite of two algorithms: simple_tree, and hypercube, that are both fast and require a small number of messages. This makes the algorithms highly scalable. Simple_tree requires O(n) messages and has O(\log n) response time. Hypercube requires O(n \log n) messages and has O(\log n) response time, in addition to having the property that the roles of all the processes are symmetrical. Process symmetry implies greater potential for balanced workload and congestion-freedom. All the algorithms assume non-FIFO channels.

INDEX TERMS

Distributed system, global state, message passing, distributed snapshot, checkpoint, hypercube, supercomputer, cluster, overlay.

CITATION

Ajay D. Kshemkalyani, "Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems",

*IEEE Transactions on Parallel & Distributed Systems*, vol.21, no. 9, pp. 1281-1289, September 2010, doi:10.1109/TPDS.2010.24REFERENCES

- [1] M. Adler, E. Halperin, R.M. Karp, and V.V. Vazirani, "A Stochastic Process on the Hypercube with Applications to Peer-to-Peer Networks,"
Proc. ACM Symp. Theory of Computing (STOC), pp. 575-584, 2003.- [2] S. Agarwal, R. Garg, M. Gupta, and J. Moreira, "Adaptive Incremental Checkpointing for Massively Parallel Systems,"
Proc. Int'l Conf. Supercomputing, pp. 277-286, 2004.- [3] E. Anceaume, R. Ludinard, A. Ravoaja, and F. Brasileiro, "PeerCube: A Hypercube-Based P2P Overlay Robust against Collusion and Churn,"
Proc. Second Int'l Conf. Self-Adaptive and Self-Organizing Systems, pp. 15-24, 2008.- [4] J. Aspnes and G. Shah, "Skip Graphs,"
Proc. 14th Ann. ACM-SIAM Symp. Discrete Algorithms, pp. 384-393, 2003.- [5] B. Awerbuch, "Complexity of Network Synchronization,"
J. ACM, vol. 32, no. 4, pp. 804-823, 1985.- [6] R. Baldoni, R. Jimenez-Peris, M. Patino-Martinez, L. Querzoni, and A. Virgillito, "Dynamic Quorums for DHT-Based Enterprise Infrastructures,"
J. Parallel and Distributed Computing, vol. 68, no. 9, pp. 1235-1249, 2008.- [7] R. Baldoni, F. Quaglia, and P. Fornara, "An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems,"
IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 2, pp. 181-192, Feb. 1999.- [8] P. Berndt, "Using Symmetric Distributed Processing for Peer-to-Peer VoIP Conferencing in Auditory Virtual Environments,"
Proc. Seventh Int'l Workshop Peer-to-Peer Systems (IPTPS), 2008.- [9] W.J. Bolosky, J.R. Douceur, D. Ely, and M. Theimer, "Feasability of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs,"
Proc. ACM SIGMETRICS '00, pp. 34-43, 2000.- [10] G. Bronevetsky, D. Marques, K. Pingali, and P. Stodghill, "Automated Application-Level Checkpointing of MPI Programs,"
Proc. Symp. Principles and Practice of Parallel Programming (PPoPP '03), pp. 84-94, 2003.- [11] G. Bronevetsky, D. Marques, K. Pingali, and P. Stodghill, "Collective Operations in Application-Level Fault-Tolerant MPI,"
Proc. Int'l Conf. Supercomputing, pp. 234-243, 2003.- [12] G. Cao and M. Singhal, "On Coordinated Checkpointing in Distributed Systems,"
IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 12, pp. 1213-1225, Dec. 1998.- [13] K.M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems,"
ACM Trans. Computer Systems, vol. 3, no. 1, pp. 63-75, 1985.- [14] C. Coti, T. Herault, P. Lemarinier, L. Pilard, A. Rezmerita, E. Rodriguez, and F. Cappello, "Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI,"
Proc. Int'l Conf. Supercomputing '06, Nov. 2006.- [15] R. Crowther and S. Woodfield, "Hypercubes: A Superior Topology for Real-Time Genealogical Collaboration Networks,"
Proc. Family History Technology Workshop, 2001.- [16] G. De Candia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's Highly Available Key-Value Store,"
Proc. 21st ACM SIGOPS Symp. Operating Systems Principles, pp. 205-220, 2007.- [17] E. Elnozahy, L. Alvisi, Y.-M. Wang, and D. Johnson, "A Survey of Rollback-Recovery Protocols in Message-Passing Systems,"
ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, Sept. 2002.- [18] D. Fahrenholtz and V. Turau, "Improving Churn Resistance of P2P Data Stores Based on the Hypercube,"
Proc. Fifth Int'l Symp. Parallel and Distributed Computing, pp. 263-270, 2006.- [19] D. Fahrenholtz and A. Wombacher, "A Formal Communication Model for Lookup Operations in a Hypercube-Based P2P Data Store,"
Proc. First Int'l Conf. Collaborative Computing: Networking, Applications and Worksharing, Dec. 2005.- [20] D. Fahrenholtz, "A Hypercube-Based Peer-to-Peer Data Store Resilient against Peer Population Fluctuation," PhD thesis, Hamburg Univ. of Tech nology, 2008.
- [21] L. Fan, H. Taylor, and P. Trinder, "Design Issues for Peer-to-Peer Massively Multiplayer Online Games,"
Proc. Second Int'l Workshop Massively Multiplayer Virtual Environments, 2008.- [22] S. Figueira and V. Reddi, "Topology-Based Hypercube Structures for Global Communication in Heterogeneous Networks,"
Proc. European Conf. Parallel Computing (Euro-Par '05), pp. 994-1004, 2005.- [23] R. Garg, V. Garg, and Y. Sabharwal, "Scalable Algorithms for Global Snapshots in Distributed Systems,"
Proc. 20th Ann. Conf. Supercomputing, pp. 269-277, Nov. 2006.- [24] R. Garg, V. Garg, and Y. Sabharwal, "Efficient Algorithms for Global Snapshots in Large Distributed Systems,"
IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 5, pp. 620-630, May 2010.- [25] A. Grama, A. Gupta, G. Karypis, and V. Kumar,
Introduction to Parallel Computing, second ed. Addison-Wesley, 2003.- [26] D. Grolimund, L. Meisser, S. Schmid, and R. Wattenhofer, "Havelaar: A Robust and Efficient Reputation System for Active Peer-to-Peer Systems,"
Proc. First Workshop Economics of Networked Systems (NetEcon), June 2006.- [27] T. Hampel, T. Bopp, and R. Hinn, "A Peer-to-Peer Architecture for Massive Multiplayer Online Games,"
Proc. Fifth ACM SIGCOMM Workshop Network and System Support for Games, 2006.- [28] S.C. Han and Y. Xia, "Optimal Leader Election Scheme for Peer-to-Peer Applications,"
Proc. Sixth Int'l Conf. Networking, 2007.- [29] S.C. Han, "Distributed Many-to-Many Mapping Algorithm in the Hypercube Network,"
Proc. Fourth Int'l Conf. Networked Computing and Advanced Information Management, 2008.- [30] S.C. Han, "Load-Balancing Content Distribution in Structured Peer-to-Peer Networks,"
Proc. Fourth Int'l Conf. Networked Computing and Advanced Information Management, 2008.- [31] J.-M. Helary, A. Mostefaoui, and M. Raynal, "Communication-Induced Determination of Consistent Snapshots,"
IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 9, pp. 865-877, Sept. 1999.- [32] R. Jacob, A. Richa, C. Scheideler, S. Schmid, and H. Taubig, "A Polylogarithmic Time Algorithm for Distributed Self-Stabilizing Skip Graphs,"
Proc. 28th ACM Symp. Principles of Distributed Computing (PODC), pp. 131-140, Aug. 2009.- [33] Y.-J. Joung and L.-W. Yang, "Wildcard Search in Structured Peer-to-Peer Networks,"
IEEE Trans. Knowledge and Data Eng., vol. 19, no. 11, pp. 1524-1540, Nov. 2007.- [34] A. Kangarlou, P. Ruth, D. Xu, and P. Eugster, "Taking Snapshots of Virtual Networked Environments,"
Proc. Virtualization Technologies in Distributed Computing Workshop '07, Nov. 2007.- [35] A. Kshemkalyani, M. Raynal, and M. Singhal, "An Introduction to Snapshot Algorithms in Distributed Computing,"
Distributed Systems Eng., vol. 2, no. 4, pp. 224-233, 1995.- [36] A. Kshemkalyani and B. Wu, "Detecting Arbitrary Stable Properties Using Efficient Snapshots,"
IEEE Trans. Software Eng., vol. 33, no. 5, pp. 330-346, May 2007.- [37] A. Kshemkalyani, "A Symmetric O(n log n) Message Distributed Algorithm for Large-Scale Systems,"
Proc. IEEE Int'l Cluster Computing Conf., 2009.- [38] F. Kuhn, S. Schmid, and R. Wattenhofer, "A Self-Repairing Peer-to-Peer System Resilient to Dynamic Adversarial Churn,"
Proc. Fourth Int'l Workshop Peer-To-Peer Systems (IPTPS), pp. 13-23, Feb. 2005.- [39] T.-H. Lai and T. Yang, "On Distributed Snapshots,"
Information Processing Letters, vol. 25, no. 3, pp. 153-158, 1987.- [40] S.S. Lam and H. Liu, "Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation,"
Computer Networks, vol. 50, pp. 3083-3104, 2006.- [41] Y. Li, M.T. Ozsu, and K.-L. Tan, "XCube: Processing XPath Queries in a Hypercube Overlay Network,"
Peer-to-Peer Networking Applications, vol. 2, pp. 128-145, 2009.- [42] H. Liu and S.S. Lam, "Neighbour Table Construction and Update in a Dynamic Peer-to-Peer Network,"
Proc. IEEE Int'l Conf. Distributed Computing Systems, 2003.- [43] T. Locher, S. Schmid, and R. Wattenhofer, "eQuus: A Provably Robust and Locality-Aware Peer-to-Peer System,"
Proc. Sixth IEEE Int'l Conf. Peer-to-Peer Computing (P2P), pp. 3-11, Sept. 2006.- [44] T. Locher, R. Meier, S. Schmid, and R. Wattenhofer, "Push-to-Pull Peer-to-Peer Live Streaming,"
Proc. 21st Int'l Symp. Distributed Computing (DISC), pp. 388-402, Sept. 2007.- [45] T. Locher, R. Meier, S. Schmid, and R. Wattenhofer, "Robust Live Media Streaming in Swarms,"
Proc. 19th Int'l Workshop Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), pp. 121-126, June 2009.- [46] D. Manivannan, R.H.B. Netzer, and M. Singhal, "Finding Consistent Global Checkpoints in a Distributed Computation,"
IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 6, pp. 623-627, June 1997.- [47] F. Mattern, "Virtual Time and Global States of Distributed Systems,"
Proc. Workshop Parallel and Distributed Algorithms, M. Cosnard, ed., pp. 215-226, 1988.- [48] F. Mattern, "Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation,"
J. Parallel and Distributed Computing, vol. 18, no. 4, pp. 423-434, 1993.- [49] P. Maymounkov and D. Mazieres, "Kademlia: A Peer-to-Peer Information System Based on the XOR Metric,"
Proc. First Int'l Workshop Peer-to-Peer Systems (IPTPS), pp. 53-65, 2002.- [50] A. Mowat, R. Schmidt, M. Schumacher, and I. Constantinescu, "Extending Peer-to-Peer Networks for Approximate Search,"
Proc. 23rd ACM Symp. Applied Computing, pp. 455-459, 2008.- [51] M. Naor and U. Wieder, "Novel Achitectures for P2P Applications: The Continuous-Discrete Approach,"
ACM Trans. Algorithms, vol. 3, no. 3, 2007.- [52] L. Ni, A. Harwood, and P.J. Stuckey, "Realizing the E-Science Desktop Peer Using a Peer-to-Peer Distributed Virtual Machine Middleware,"
Proc. Fourth Int'l Workshop Middleware for Grid Computing (MCG '06), pp. 7-12, 2006.- [53] A. Oliner, L. Rudolph, and R. Sahoo, "Cooperative Checkpointing: A Robust Approach to Large-Scale Systems Reliability,"
Proc. Int'l Conf. Supercomputing '06, pp. 14-23, 2006.- [54] G. Plaxton, R. Rajaraman, and A. Richa, "Accessing Nearby Copies of Replicated Objects in a Distributed Environment,"
Proc. ACM Symp. Parallel Architectures and Algorithms, pp. 311-320, June 1997.- [55] H. Ren, Z. Wang, and Z. Liu, "A Hypercube Based P2P Information Service for Data Grid,"
Proc. Fifth Int'l Conf. Grid and Cooperative Computing, pp. 508-513, 2006.- [56] J. Risson and T. Moors, "Survey of Research towards Robust Peer-to-Peer Networks: Search Methods,"
Computer Networks, vol. 50, no. 17, pp. 3485-3521, 2006.- [57] A. Rowstron and P. Druschel, "Pastry: Scalable Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems,"
Proc. IFIP/ACM Int'l Conf. Distributed Systems Platforms, pp. 329-350, Nov. 2001.- [58] B. Schandl, "OPAX—An Open Peer-to-Peer Architecture for XML Message Exchange," Diploma thesis, Univ. of Vienna, 2004.
- [59] M. Schlosser, M. Sintek, S. Decker, and W. Nejdl, "Ontology-Based Search and Broadcast in HyperCuP,"
Proc. Int'l Semantic Web Conf., 2002.- [60] M. Schlosser, M. Sintek, S. Decker, and W. Nejdl, "HyperCuP— Hypercubes, Ontologies, and Efficient Search on Peer-to-Peer Networks,"
Agents and Peer-to-Peer Computing, pp. 112-124, Springer, 2003.- [61] M. Schulz, G. Bronevetsky, R. Fernandes, D. Marques, K. Pingali, and P. Stodghill, "Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs,"
Proc. Int'l Conf. Supercomputing '04, Nov. 2004.- [62] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra,
MPI: The Complete Reference. MIT Press, 1996.- [63] C. Tang, R.N. Chang, and E. So, "A Distributed Service Management Infrastructure for Enterprise Data Centers Based on Peer-to-Peer Technology,"
Proc. IEEE Int'l Conf. Services Computing, pp. 52-59, 2006.- [64] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici, "A Scalable Application Placement Controller for Enterprise Data Centers,"
Proc. 16th Int'l Conf. World Wide Web (WWW), C.L. Williamson, M.E. Zurko, P.F. Patel-Schneider, and P.J. Shenoy, eds., pp. 331-340, 2007.- [65] S. Venkatesan, "Message-Optimal Incremental Snapshots,"
Proc. IEEE Int'l Conf. Distributed Computing Systems, pp. 53-60, 1989.- [66] Top 500 Supercomputer Sites, <http:/www.top500.org>, 2009.
- [67] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz, "Tapestry: A Resilient Global-Scale Overlay for Service Deployment,"
IEEE J. Selected Areas in Comm., vol. 22, no. 1, pp. 41-53, Jan. 2004. |