Enhancing Search Performance on Gnutella-Like P2P Systems
December 2006 (vol. 17 no. 12)
pp. 1482-1495

Abstract: The main challenges facing search techniques on Gnutella-like peer-to-peer networks are search efficiency and the quality of search results. In this paper, leveraging information retrieval (IR) algorithms such as the Vector Space Model (VSM) and relevance ranking algorithms, we present GES (Gnutella with Efficient Search) to improve search performance. The key idea is that GES uses a distributed topology adaptation algorithm, built on the notion of a node vector, to organize semantically relevant nodes into the same semantic groups. Given a query, GES employs an efficient search protocol to direct the query to the most relevant semantic groups for answers, thereby achieving high recall while probing only a small fraction of nodes. To the best of our knowledge, GES is the first to identify node vector size as an important factor in search performance and to show that the node vector size offers a good trade-off between search performance and bandwidth cost. Moreover, GES adopts automatic query expansion and local data clustering to improve search performance. We show that GES is efficient and even outperforms the centralized node clustering system SETS. For example, in the scenario where node capacity is heterogeneous, GES can achieve 73 percent recall when probing only 20 percent of the nodes, outperforming SETS by about 18 percent.
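
The abstract names three ingredients: a node vector of bounded size, VSM-style relevance between a query and a node, and recall as the quality measure. The Python sketch below illustrates these ideas only; it is not the GES protocol. It assumes raw term frequencies as weights, a node vector formed by summing a node's document vectors and keeping the heaviest terms, and cosine similarity for ranking. The function names (`node_vector`, `rank_nodes`, `recall`) and the toy node contents are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector for one document or query (assumed weighting)."""
    return Counter(text.lower().split())


def node_vector(documents, size):
    """Sum a node's document vectors and keep only the `size` heaviest terms.

    `size` stands in for the node vector size: a larger value describes the
    node more fully but costs more bandwidth to exchange.
    """
    total = Counter()
    for doc in documents:
        total.update(term_vector(doc))
    return dict(total.most_common(size))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_nodes(query, node_vectors):
    """Order node names from most to least relevant to the query."""
    q = term_vector(query)
    return sorted(node_vectors, key=lambda n: cosine(q, node_vectors[n]), reverse=True)


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy data, invented purely for illustration.
nodes = {
    "A": ["jazz music album", "jazz guitar"],
    "B": ["linux kernel patch", "kernel scheduler"],
    "C": ["music video", "linux music player"],
}
vectors = {name: node_vector(docs, size=3) for name, docs in nodes.items()}
ranking = rank_nodes("jazz music", vectors)
print(ranking)
```

In this toy setup a query is sent first to the nodes whose vectors score highest, which mirrors the idea of probing only the most relevant part of the network. Shrinking `size` makes the vectors cheaper to exchange but coarser, which is the kind of trade-off the abstract attributes to node vector size.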

Index Terms:
Peer-to-peer, topology adaptation, biased walk, semantic group, node vector, recall, information retrieval.
Yingwu Zhu, Yiming Hu, "Enhancing Search Performance on Gnutella-Like P2P Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 12, pp. 1482-1495, Dec. 2006, doi:10.1109/TPDS.2006.173