Efficient Semantic-Based Content Search in P2P Network
July 2004 (vol. 16 no. 7)
pp. 813-826

Abstract: Most existing Peer-to-Peer (P2P) systems support only title-based searches and are limited in functionality compared to today's search engines. In this paper, we present the design of a distributed P2P information sharing system that supports semantic-based content searches for relevant documents. First, we propose a general and extensible framework for searching for similar documents in a P2P network. The framework is based on the novel concept of a Hierarchical Summary Structure. Second, based on the framework, we develop an efficient document searching system that summarizes and maintains all documents within the network at different levels of granularity. Finally, an experimental study is conducted on a real P2P prototype, and a large-scale network is further simulated. The results show the effectiveness, efficiency, and scalability of the proposed system.

Index Terms:
Content-based, similarity search, peer-to-peer, hierarchical summary, indexing.
Heng Tao Shen, Yanfeng Shu, Bei Yu, "Efficient Semantic-Based Content Search in P2P Network," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 7, pp. 813-826, July 2004, doi:10.1109/TKDE.2004.1318564