Issue No.10 - Oct. (2013 vol.25)
pp: 2367-2380
Deke Guo, National University of Defense Technology, Changsha
Mo Li, Nanyang Technological University, Singapore
ABSTRACT
In this paper, we study the set reconciliation problem, in which each member of a node pair has a set of objects and seeks to deliver its unique objects to the other member. The challenge is how each node can compute the set difference. To address this issue, we propose a lightweight but efficient method that only requires the pair of nodes to represent their objects using counting Bloom filters (CBFs) of size $O(d)$ and exchange them with each other, where $d$ denotes the total size of the set differences. A receiving node then subtracts the received CBF from its local one via the minus operation proposed in this paper. The resultant CBF approximately represents the union of the set differences, and thus each node can identify its own set difference by querying the resultant CBF. We also propose a novel estimator through which each node can accurately estimate not only the value of $d$ but also the size of its own set difference. This estimate can be used to optimize the parameter setting of the CBF to achieve fewer false positives and false negatives. Comprehensive analysis and evaluation demonstrate that our method is more efficient than prior BF-based methods, achieving the same accuracy with less communication cost. Moreover, our reconciliation method needs no prior context logs and is very useful in networking and distributed applications.
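The exchange-and-subtract idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `CountingBloomFilter`, the helpers `build_cbf` and `local_difference`, the SHA-256 double-hashing scheme, the signed counter-wise subtraction, and the "all counters non-zero" membership test are all assumptions made for illustration, and the paper's actual minus operation and parameter choices may differ.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter with m counters and k hash positions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item):
        # Assumed hashing scheme: derive k positions from one SHA-256 digest
        # using double hashing, h_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def minus(self, other):
        # Assumed form of the subtraction: counter-wise signed difference.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        result = CountingBloomFilter(self.m, self.k)
        result.counters = [a - b for a, b in zip(self.counters, other.counters)]
        return result

    def might_contain(self, item):
        # Objects common to both sets cancel out, so an object is reported
        # only if every one of its counters is still non-zero.
        return all(self.counters[p] != 0 for p in self._positions(item))


def build_cbf(items, m=64, k=3):
    """Hypothetical helper: represent a set of string objects as a CBF."""
    cbf = CountingBloomFilter(m, k)
    for item in items:
        cbf.add(item)
    return cbf


def local_difference(local_items, local_cbf, remote_cbf):
    """Return the local objects that appear to be missing at the remote node."""
    diff = local_cbf.minus(remote_cbf)
    return {x for x in local_items if diff.might_contain(x)}


# Each node builds a CBF, sends it to its peer, and queries its own objects
# against the subtracted filter to find what it should deliver.
node_a = {"a", "b", "c", "d"}
node_b = {"c", "d", "e"}
cbf_a, cbf_b = build_cbf(node_a), build_cbf(node_b)
print("A should send:", sorted(local_difference(node_a, cbf_a, cbf_b)))
print("B should send:", sorted(local_difference(node_b, cbf_b, cbf_a)))
```

Because the result is approximate, a shared object can occasionally be reported (a false positive), and in this sketch a unique object can be missed when counters from the two sides cancel (a false negative). That trade-off is why the filter size and number of hash functions need to be tuned to the estimated difference size $d$.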
INDEX TERMS
Peer to peer computing, Accuracy, Estimation, Synchronization, Context, Educational institutions, Approximation methods, set difference, Bloom filters, Set reconciliation
CITATION
Deke Guo, Mo Li, "Set Reconciliation via Counting Bloom Filters", IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 10, pp. 2367-2380, Oct. 2013, doi:10.1109/TKDE.2012.215