Issue No. 01, January 2010 (vol. 22)
pp. 120-133
Deke Guo, National University of Defense Technology, Changsha
Jie Wu, Florida Atlantic University, Boca Raton
Honghui Chen, National University of Defense Technology, Changsha
Ye Yuan, Northeastern University, Shenyang
Xueshan Luo, National University of Defense Technology, Changsha
A Bloom filter is an effective, space-efficient data structure for concisely representing a set and supporting approximate membership queries. Traditionally, the Bloom filter and its variants focus only on representing a static set and on decreasing the false positive probability to a sufficiently low level. By investigating mainstream applications based on the Bloom filter, we reveal that dynamic data sets are more common and more important than static sets. However, existing variants of the Bloom filter cannot support dynamic data sets well. To address this issue, we propose dynamic Bloom filters to represent dynamic sets as well as static sets, and we design the necessary item insertion, membership query, item deletion, and filter union algorithms. The dynamic Bloom filter keeps the false positive probability at a low level by expanding its capacity as the set cardinality increases. Through comprehensive mathematical analysis, we show that the dynamic Bloom filter uses less expected memory than the Bloom filter when representing dynamic sets with an upper bound on set cardinality. We also show that, when handling dynamic sets without an upper bound on set cardinality, the dynamic Bloom filter is more stable than the Bloom filter because it requires reconstruction far less often. Moreover, these analysis results hold for stand-alone applications as well as distributed applications.
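The four operations named above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. The dynamic filter is modeled as a growing list of same-shaped counting Bloom filters, each with m counters, k hash functions, and a per-filter capacity c. Counting filters are used so that deletion is possible. Index derivation uses SHA-256-based double hashing, which is our choice and not necessarily the authors'. Deletion is conservative: an item is removed only when exactly one component filter claims it. Union simply keeps copies of every component filter. All class and method names (`CountingBloomFilter`, `DynamicBloomFilter`, `add`, `contains`, `remove`, `union`) are hypothetical.

```python
import copy
import hashlib
from typing import List


def _positions(item: str, k: int, m: int) -> List[int]:
    """Derive k counter indices with double hashing: (h1 + i*h2) mod m (assumed scheme)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
    return [(h1 + i * h2) % m for i in range(k)]


class CountingBloomFilter:
    """One component filter (hypothetical): m counters, k hashes, at most c items."""

    def __init__(self, m: int, k: int, c: int) -> None:
        self.m, self.k, self.c = m, k, c
        self.counters = [0] * m
        self.n = 0  # items currently recorded in this filter

    def is_full(self) -> bool:
        return self.n >= self.c

    def add(self, item: str) -> None:
        for p in _positions(item, self.k, self.m):
            self.counters[p] += 1
        self.n += 1

    def contains(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in _positions(item, self.k, self.m))

    def remove(self, item: str) -> None:
        # Caller must ensure the item was really added here; otherwise counters
        # can be corrupted (the usual counting-filter caveat).
        for p in _positions(item, self.k, self.m):
            self.counters[p] -= 1
        self.n -= 1


class DynamicBloomFilter:
    """Hypothetical sketch: a list of same-shaped counting filters that grows with the set."""

    def __init__(self, m: int, k: int, c: int) -> None:
        self.m, self.k, self.c = m, k, c
        self.filters = [CountingBloomFilter(m, k, c)]

    def _active(self) -> CountingBloomFilter:
        # Insert into the first non-full filter; append a fresh one when all are
        # full, so no component filter ever holds more than c items.
        for f in self.filters:
            if not f.is_full():
                return f
        self.filters.append(CountingBloomFilter(self.m, self.k, self.c))
        return self.filters[-1]

    def add(self, item: str) -> None:
        self._active().add(item)

    def contains(self, item: str) -> bool:
        # Positive if any component filter reports the item.
        return any(f.contains(item) for f in self.filters)

    def remove(self, item: str) -> bool:
        hits = [f for f in self.filters if f.contains(item)]
        if len(hits) != 1:
            # Absent, or ambiguous across filters: deleting from the wrong
            # filter could cause false negatives for other items, so skip.
            return False
        hits[0].remove(item)
        return True

    def union(self, other: "DynamicBloomFilter") -> "DynamicBloomFilter":
        if (self.m, self.k, self.c) != (other.m, other.k, other.c):
            raise ValueError("union requires identical m, k and c")
        result = DynamicBloomFilter(self.m, self.k, self.c)
        # Simplest possible union: independent copies of every component filter.
        result.filters = [copy.deepcopy(f) for f in self.filters + other.filters]
        return result
```

For example, with c = 100, adding 250 distinct items leaves three component filters: two full ones and one holding 50 items. Every inserted item still tests positive, because counting filters produce no false negatives. A query's false positive risk grows with the number of component filters, since any one of them can misfire. This trade-off is exactly why capacity is expanded only when the active filter fills up.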
Bloom filters, dynamic Bloom filters, information representation.
Deke Guo, Jie Wu, Honghui Chen, Ye Yuan, Xueshan Luo, "The Dynamic Bloom Filters", IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 1, pp. 120-133, January 2010, doi:10.1109/TKDE.2009.57