Issue No.05 - May (2010 vol.22)

pp: 651-664

Yunhao Liu, Hong Kong University of Science and Technology, Hong Kong

Xiangyang Li , Hangzhou Dianzi University, Hangzhou and Illinois Institute of Technology, Chicago

Panlong Yang , P.L.A. University of Science and Technology, Nanjing

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.209

ABSTRACT

A Bloom filter is an effective, space-efficient data structure for concisely representing a data set and supporting approximate membership queries. Researchers have traditionally held that a Bloom filter may return a false positive but never returns a false negative under well-behaved operations. By investigating the mainstream variants, however, we observe that a Bloom filter does return false negatives in many scenarios. In this work, we show that the undetectable incorrect deletion of false positive items and the detectable incorrect deletion of multiaddress items are two general causes of false negatives in a Bloom filter. We then measure the potential and exposed false negatives theoretically and practically. Inspired by the fact that the potential false negatives are usually not fully exposed, we propose a novel Bloom filter scheme, which increases the ratio of bits set to a value larger than one without decreasing the ratio of bits set to zero. Mathematical analysis and comprehensive experiments show that this design can reduce the number of exposed false negatives as well as decrease the likelihood of false positives. To the best of our knowledge, this is the first work dealing with both the false positive and false negative problems of Bloom filters systematically while supporting the standard item insertion, query, and deletion operations.
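The first cause named above, deleting an item that only appears to be present, can be illustrated with a minimal sketch. This is not the scheme proposed in the paper; it is a plain counting Bloom filter, and the class name, the table size `m = 8`, and the two toy hash functions are all assumptions chosen so that the collision can be checked by hand.

```python
class CountingBloomFilter:
    """Minimal counting Bloom filter over non-negative integers (illustrative only)."""

    def __init__(self, m=8):
        self.m = m
        self.counters = [0] * m

    def _positions(self, x):
        # Two toy hash functions (an assumption, not from the paper),
        # picked so that collisions are easy to construct by hand.
        return (x % self.m, (x // self.m) % self.m)

    def insert(self, x):
        for p in self._positions(x):
            self.counters[p] += 1

    def query(self, x):
        return all(self.counters[p] > 0 for p in self._positions(x))

    def delete(self, x):
        # The only guard is a membership query, so a false positive passes it.
        if not self.query(x):
            return False
        for p in self._positions(x):
            self.counters[p] -= 1
        return True


cbf = CountingBloomFilter()
cbf.insert(10)                   # occupies counters 2 and 1
cbf.insert(28)                   # occupies counters 4 and 3

false_positive = cbf.query(26)   # 26 maps to counters 2 and 3, but was never inserted
deleted = cbf.delete(26)         # the deletion is accepted and cannot be detected as wrong

still_has_10 = cbf.query(10)     # counter 2 is now zero
still_has_28 = cbf.query(28)     # counter 3 is now zero
print(false_positive, deleted, still_has_10, still_has_28)
```

In this sketch a single incorrect deletion turns both genuinely inserted items into false negatives, because each of them shares one counter with the false positive item.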

INDEX TERMS

Bloom filter, false negative, multichoice counting Bloom filter.

CITATION

Yunhao Liu, Xiangyang Li, Panlong Yang, "False Negative Problem of Counting Bloom Filter",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 22, no. 5, pp. 651-664, May 2010, doi:10.1109/TKDE.2009.209