P2P Directories for Distributed Web Search: From Each According to His Ability, to Each According to His Needs
22nd International Conference on Data Engineering Workshops (ICDEW'06) (2006)
Apr. 3, 2006 to Apr. 7, 2006
Matthias Bender, Max-Planck-Institut für Informatik
Gerhard Weikum, Max-Planck-Institut für Informatik
Sebastian Michel, Max-Planck-Institut für Informatik
A compelling application of peer-to-peer (P2P) system technology would be distributed Web search, where each peer autonomously runs a search engine on a personalized local corpus (e.g., built from a thematically focused Web crawl) and peers collaborate by routing queries to remote peers that can contribute many or particularly good results for these specific queries. Such systems typically rely on a decentralized directory, e.g., built on top of a distributed hash table (DHT), that holds compact, aggregated statistical metadata about the peers, which is used to identify promising peers for a particular query. To support an a priori unlimited number of peers, it is crucial to keep the load on the distributed directory low. Moreover, each peer should ideally tailor its postings to the directory to reflect its particular strengths, such as rich information about specialized topics that few or no other peers cover. This paper addresses this problem by proposing strategies that let peers identify suitable subsets of the most beneficial statistical metadata. We argue that posting a carefully selected subset of metadata can achieve almost the same result quality as a complete metadata directory, because only the most relevant peers are eventually involved in the execution of a given query. Additionally, asking only relevant peers will result in higher precision, as the noise introduced by poor peers is reduced. We have implemented these strategies in our fully operational P2P Web search prototype Minerva, and present experimental results on real-world Web data that show the viability of the strategies and their gains in terms of high search result quality at low networking costs.
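The idea of posting only a selected subset of per-term statistics to a shared directory, and then routing a query to the peers that look most promising, can be illustrated with a minimal sketch. This is not Minerva's implementation: the `Directory` class, the `select_posts` function, the `budget` parameter, and the use of local document frequency as the selection and ranking score are all hypothetical stand-ins, and an in-memory dictionary takes the place of the DHT.

```python
from collections import defaultdict


class Directory:
    """In-memory stand-in for a DHT-based directory: term -> {peer_id: score}."""

    def __init__(self):
        self.posts = defaultdict(dict)

    def post(self, peer_id, term, score):
        # A peer publishes one aggregated statistic for one term.
        self.posts[term][peer_id] = score

    def route(self, query_terms, max_peers):
        # Sum the posted scores per peer over all query terms and
        # return the best-scoring peers (ties broken by peer id).
        totals = defaultdict(float)
        for term in query_terms:
            for peer_id, score in self.posts.get(term, {}).items():
                totals[peer_id] += score
        ranked = sorted(totals, key=lambda p: (-totals[p], p))
        return ranked[:max_peers]


def select_posts(term_doc_freqs, budget):
    """Hypothetical selection heuristic: keep the `budget` terms with the
    highest local document frequency, i.e. the peer's apparent strengths."""
    ranked = sorted(term_doc_freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:budget])


# Two peers with thematically focused corpora (made-up statistics).
peers = {
    "peer_a": {"jazz": 120, "guitar": 80, "weather": 2},
    "peer_b": {"weather": 90, "storm": 60, "jazz": 1},
}

directory = Directory()
for peer_id, stats in peers.items():
    # Each peer posts only its two strongest terms instead of everything.
    for term, df in select_posts(stats, budget=2).items():
        directory.post(peer_id, term, df)

selected = directory.route(["jazz", "guitar"], max_peers=1)
print(selected)
```

In this toy setup neither peer posts its weakest term, so the directory holds four entries instead of six, and a query on a topic is routed only to the peer that advertised it.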
Matthias Bender, Gerhard Weikum, Sebastian Michel, "P2P Directories for Distributed Web Search: From Each According to His Ability, to Each According to His Needs", 22nd International Conference on Data Engineering Workshops (ICDEW'06), p. 51, 2006, doi:10.1109/ICDEW.2006.110