Issue No.04 - April (2012 vol.24)
pp: 692-706
Hanhua Chen , Huazhong University of Science and Technology, Wuhan
Hai Jin , Huazhong University of Science and Technology, Wuhan
Lei Chen , The Hong Kong University of Science and Technology, Hong Kong
Yunhao Liu , TNLIST, Tsinghua University, China and The Hong Kong University of Science and Technology, Hong Kong
Lionel M. Ni , Shanghai Jiao Tong University, Shanghai and The Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
Peer-to-peer multikeyword searching requires distributed intersection/union operations across wide area networks, which incur substantial traffic cost. Existing schemes commonly use Bloom filter (BF) encoding to reduce the traffic cost of these operations. In this paper, we address the problem of optimizing the settings of a BF. We show, through mathematical proof, that the optimal BF setting in terms of traffic cost is determined by the statistical information of the involved inverted lists, not by the minimized false positive rate as claimed by previous studies. Through numerical analysis, we demonstrate how to obtain the optimal settings. To evaluate the performance of this design, we conduct comprehensive simulations on the TREC WT10G test collection and the query logs of a major commercial web search engine. Results show that our design significantly reduces search traffic and latency compared with existing approaches.
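The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's model or derivation: it assumes a simplified two-node intersection in which one node ships an m-bit filter of its inverted list and the other returns every document ID that passes the filter. The function names, the 64-bit ID size, and the candidate range of filter sizes are all hypothetical. Only the false positive approximation f = (1 - e^(-kn/m))^k and the rate-minimizing k = (m/n) ln 2 are standard results.

```python
import math


def false_positive_rate(m_bits, n_items, k_hashes):
    """Standard approximation f = (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def fp_minimizing_k(m_bits, n_items):
    """Hash count that minimizes f for fixed m and n: (m/n) ln 2, at least 1."""
    return max(1, round(m_bits / n_items * math.log(2)))


def traffic_bits(m_bits, size_a, size_b, overlap, id_bits=64):
    """Hypothetical cost model (an assumption, not taken from the paper).

    Node A sends an m-bit filter of its list. Node B replies with the IDs
    that pass the filter: the true matches plus the expected false positives.
    """
    k = fp_minimizing_k(m_bits, size_a)
    f = false_positive_rate(m_bits, size_a, k)
    returned_ids = overlap + f * (size_b - overlap)
    return m_bits + id_bits * returned_ids


def best_filter_size(size_a, size_b, overlap, id_bits=64, max_bits_per_item=32):
    """Pick the filter size (in bits) with the lowest modeled traffic."""
    candidates = [size_a * b for b in range(1, max_bits_per_item + 1)]
    return min(
        candidates,
        key=lambda m: traffic_bits(m, size_a, size_b, overlap, id_bits),
    )


if __name__ == "__main__":
    # Same local list, two different remote list sizes.
    for size_b in (2_000, 1_000_000):
        m = best_filter_size(size_a=1_000, size_b=size_b, overlap=10)
        print(size_b, m, traffic_bits(m, 1_000, size_b, 10))
```

In this toy model the cheapest filter grows with the size of the remote list, and the largest filter considered (which has the lowest false positive rate) is not the cheapest one. That is the kind of dependence on inverted-list statistics the abstract refers to, shown here only under the stated assumptions.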
INDEX TERMS
Bloom filter, DHT, multikeyword search, P2P.
CITATION
Hanhua Chen, Hai Jin, Lei Chen, Yunhao Liu, Lionel M. Ni, "Optimizing Bloom Filter Settings in Peer-to-Peer Multikeyword Searching", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 4, pp. 692-706, April 2012, doi:10.1109/TKDE.2011.14