Issue No. 03 - March (2013 vol. 24)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.139
Jichiang Tsai, National Chung Hsing University, Taichung
Most existing global-snapshot algorithms for distributed systems use control messages to coordinate the construction of a global snapshot among all processes. Because these algorithms typically assume that the underlying logical overlay topology is fully connected, the number of control messages exchanged is proportional to the square of the number of processes, which raises the likelihood of network congestion. Such algorithms are therefore neither efficient nor scalable for large-scale distributed systems composed of a very large number of processes. Recent work has significantly reduced the number of control messages, but at the cost of a higher response time. In this paper, we propose an efficient global-snapshot algorithm that lets every process finish its local snapshot within a given number of rounds. In particular, the algorithm allows a tradeoff between response time and message complexity. Moreover, our global-snapshot algorithm is symmetrical in the sense that every process executes identical steps, so it achieves better workload balance and less network congestion. Most importantly, based on our framework, we demonstrate that the minimum number of control messages required by a symmetrical global-snapshot algorithm is $\Omega(N \log N)$, where $N$ is the number of processes. Finally, we also assume non-FIFO channels.
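The abstract does not spell out the algorithm's steps, so the following is only a minimal sketch of the message-count argument, not the paper's method. It assumes a hypercube-style exchange (suggested by the "Hypercubes" keyword) in which N is a power of two and, in round r, each process sends one control message to the process whose ID differs in bit r. All function names are hypothetical, and the sketch ignores application messages and non-FIFO channel handling.

```python
def all_to_all_messages(n: int) -> int:
    """Control messages if every process notifies every other process directly."""
    return n * (n - 1)


def hypercube_partners(pid: int, n: int) -> list[int]:
    """Partner of process `pid` in each round: flip one ID bit per round."""
    rounds = n.bit_length() - 1  # equals log2(n) when n is a power of two
    return [pid ^ (1 << r) for r in range(rounds)]


def simulate_hypercube(n: int) -> tuple[int, int, bool]:
    """Simulate the symmetric round-based exchange (an assumed scheme).

    Returns (rounds, total control messages, whether every process
    has heard, directly or indirectly, from every other process).
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    rounds = n.bit_length() - 1
    known = [{p} for p in range(n)]  # process IDs each process has heard from
    messages = 0
    for r in range(rounds):
        before = [set(k) for k in known]  # state at the start of the round
        for p in range(n):
            q = p ^ (1 << r)      # every process runs the same step
            known[q] |= before[p]  # p sends everything it knows to q
            messages += 1
    complete = all(len(k) == n for k in known)
    return rounds, messages, complete


if __name__ == "__main__":
    for n in (8, 64, 1024):
        rounds, msgs, ok = simulate_hypercube(n)
        print(n, rounds, msgs, all_to_all_messages(n), ok)
```

Under these assumptions each process sends exactly one message per round, so the total is N log2 N messages in log2 N rounds, compared with N(N - 1) for a fully connected exchange. Using fewer rounds would require each process to contact more partners per round, which is one way to picture the tradeoff between response time and message complexity that the abstract mentions.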
Process control, Program processors, Time factors, Vectors, Algorithm design and analysis, Hypercubes, Complexity theory, checkpointing, Distributed systems, global snapshots, process symmetry, message passing
J. Tsai, "Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems," in IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 3, pp. 493-505, 2013.