Issue No.07 - July (2010 vol.59)
pp: 969-980
Hanhua Chen, Huazhong University of Science and Technology, China
Jun Yan, Microsoft Research Asia
Hai Jin, Huazhong University of Science and Technology, China
Yunhao Liu, Hong Kong University of Science and Technology
Lionel M. Ni, Hong Kong University of Science and Technology
ABSTRACT
Previous multikeyword search in DHT-based P2P systems often relies on multiple single-keyword search operations and therefore suffers from unacceptable traffic cost and poor accuracy. Precomputing a term-set-based index can significantly reduce the cost, but the index size grows exponentially. Based on our observations that 1) queries are typically short and 2) users usually have limited interests, we propose a novel index pruning method called TSS. By publishing only the most relevant term sets from the documents on the peers, TSS provides search performance comparable to that of a centralized solution, while reducing the index size from exponential to O(n log n). We evaluate this design through comprehensive trace-driven simulations using the TREC WT10G data collection and the query log of a major commercial search engine.
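To make the index-size argument concrete, the following is a minimal sketch of term-set pruning in general. It is not the TSS algorithm itself, which the abstract does not spell out. The relevance scores, the cutoffs `top_m` and `max_size`, and the use of SHA-1 over the sorted terms as a DHT key are all assumptions chosen for illustration.

```python
import hashlib
from itertools import combinations


def term_set_key(terms):
    """Hypothetical DHT key: SHA-1 of the sorted terms, so term order does not matter."""
    canonical = " ".join(sorted(terms))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def pruned_term_sets(term_scores, top_m=3, max_size=2):
    """Keep the top_m highest-scoring terms, then list every term set of size 1..max_size.

    term_scores maps each term to an assumed relevance score (for example TF-IDF).
    Both cutoffs are illustrative parameters, not values taken from the paper.
    """
    # Break score ties alphabetically so the result is deterministic.
    top_terms = sorted(term_scores, key=lambda t: (-term_scores[t], t))[:top_m]
    sets = []
    for size in range(1, max_size + 1):
        sets.extend(combinations(sorted(top_terms), size))
    return sets


def full_term_set_count(num_terms):
    """Number of non-empty term sets when nothing is pruned: 2^n - 1."""
    return 2 ** num_terms - 1


# A toy document with made-up relevance scores.
doc = {"peer": 0.9, "search": 0.8, "index": 0.7, "the": 0.1, "of": 0.05}
published = pruned_term_sets(doc)
index = {term_set_key(ts): ts for ts in published}
```

In this toy case the unpruned index would hold `full_term_set_count(5)` entries for a five-term document, while the pruned version publishes only the sets built from the three highest-scoring terms. Capping the set size reflects the observation that queries are typically short; how TSS actually selects the "most relevant" term sets is described in the paper body, not here.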
INDEX TERMS
Peer-to-peer, multikeyword searching, ranking.
CITATION
Hanhua Chen, Jun Yan, Hai Jin, Yunhao Liu, Lionel M. Ni, "TSS: Efficient Term Set Search in Large Peer-to-Peer Textual Collections", IEEE Transactions on Computers, vol.59, no. 7, pp. 969-980, July 2010, doi:10.1109/TC.2010.81