Bandwidth Efficient String Reconciliation Using Puzzles
November 2006 (vol. 17 no. 11)
pp. 1217-1225

Abstract: The problem of exchanging correlated data with minimum communication has attracted considerable interest in recent years. We consider the problem of exchanging two similar strings held by different hosts. Our approach transforms a string into a multiset of substrings, which are reconciled efficiently using known multiset reconciliation algorithms and then put back together on a remote host using tools from graph theory. We present analyses and experimental results showing that, for high-entropy data, the communication complexity of our approach compares favorably with that of existing algorithms, including rsync, a widely used string reconciliation engine. We also quantify the trade-off between the communication and computational complexity of our approach.
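The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The piece length `l = 3`, the `\x02`/`\x03` sentinels added by the hypothetical `pad` helper, and all function names are illustrative choices. `Counter` subtraction stands in for a bandwidth-efficient multiset reconciliation protocol. The pieces are reassembled by finding an Eulerian path (Hierholzer's algorithm) in a de Bruijn-style digraph whose nodes are (l-1)-character substrings and whose edges are the pieces.

```python
from collections import Counter, defaultdict


def pad(s):
    # Hypothetical sentinels marking the start and end of the string;
    # assumed never to occur inside s, so the path's endpoints are unique.
    return "\x02" + s + "\x03"


def shingles(s, l):
    """Multiset of all length-l substrings ("puzzle pieces") of s."""
    return Counter(s[i:i + l] for i in range(len(s) - l + 1))


def reconcile(local, remote):
    """Stand-in for multiset reconciliation: pieces to add and to remove.

    A bandwidth-efficient protocol would learn only these differences;
    this sketch simply subtracts full Counters.
    """
    return remote - local, local - remote


def reconstruct(pieces):
    """Reassemble a padded string from its pieces via an Eulerian path.

    Nodes are (l-1)-character substrings; each piece is a directed edge
    from its prefix to its suffix, repeated once per occurrence.
    """
    graph = defaultdict(list)
    balance = Counter()  # out-degree minus in-degree for each node
    for piece, count in pieces.items():
        for _ in range(count):
            graph[piece[:-1]].append(piece[1:])
            balance[piece[:-1]] += 1
            balance[piece[1:]] -= 1
    # The start sentinel makes the first node the unique one with balance +1.
    start = next(node for node, b in balance.items() if b == 1)
    # Iterative Hierholzer's algorithm: follow unused edges, emit nodes on backtrack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Consecutive nodes overlap in l-2 characters; append each node's last char.
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    l = 3  # illustrative piece length
    alice = shingles(pad("the quick brown fox"), l)
    bob = shingles(pad("the quick brown cat"), l)
    to_add, to_remove = reconcile(alice, bob)
    updated = alice + to_add - to_remove  # now equals bob's multiset
    print(sum(to_add.values()), sum(to_remove.values()))  # prints: 4 4
    print(reconstruct(updated).strip("\x02\x03"))  # prints: the quick brown cat
```

In this example only four pieces differ in each direction, so a reconciliation protocol whose cost scales with the difference would transmit far less than the whole string. When some (l-1)-character substring occurs more than once, the digraph can admit several Eulerian paths, and `reconstruct` may return a different string with the same multiset of pieces. The sketch does not resolve that ambiguity. Longer pieces (more communication per differing piece) or extra search over candidate paths (more computation) are one way to read the communication/computation trade-off the abstract mentions.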

Index Terms:
Efficient file synchronization, string reconstruction, rsync.
Sachin Agarwal, Vikas Chauhan, Ari Trachtenberg, "Bandwidth Efficient String Reconciliation Using Puzzles," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 11, pp. 1217-1225, Nov. 2006, doi:10.1109/TPDS.2006.148