Resource search or discovery is a fundamental issue in peer-to-peer (P2P) and grid studies. 1 Search objects, or resources, can be cycles, storage spaces, files, services, addresses, and so on. In general, systems are employing three categories of P2P-network architectures to improve search performance: centralized (such as Napster, http://www.napster.com/), decentralized but structured (such as Chord, http://www.pdos.lcs.mit.edu/chord), 2 and decentralized and unstructured (such as Gnutella, http://gnutella.wego.com/). Most popular P2P applications operate in the latter category, and grids are essentially P2P systems. 3 Resource search technology in these complex, fully unstructured and decentralized P2P networks presents a great challenge to the further usability of P2P networks and grids.
In most search methods cited in related literature, peers run uniform forwarding logic; we call these
homogeneous search (HMS) techniques. Although various efficient and scalable HMS techniques exist,
4
8 previous studies show that the performance of any single method isn't perfect on all sides.
9 Therefore, our research mixes those HMS techniques to increase performance; we call this
heterogeneous search (HES). In this article, we explore the feasibility of HES evaluated in the context of timekeeping P2P networks from our prior studies.
10 (See the sidebar "
Timekeeping P2P Networks " for more information.) Our simulation results show HES to have significant advantages over other methods.
The heterogeneous search principle
What's the most essential property of resource discovery in unstructured P2P networks? We believe it's randomicity. Before performing a search, we don't know where the targets are, whether they exist, or which technique is the most suitable for locating them. This implies that no search technique applies to all situations. Some methods show powerful discovery capabilities but are more costly; others are cost-efficient but show varying performance. A specific technique might show better performance in some peers and worse in others during a single retrieving instance, or it might be efficient in exact-matching retrieving and inefficient in wildcard or rich searches. Therefore, to avoid worsening or unstable search performance, it might be rational in these situations to combine multiple search techniques among peers in some way.
A peer's internal structure
Figure 1 illustrates the internal logical structure of a peer employing the HES technique.
Figure 1. Internal structure of a peer using heterogeneous search (HES). The process begins with a request message or query (Q). The Red-killer then filters out worthless queries. The Check & Answer parses Q with resources and forwards unfulfilled queries. The Module-selector chooses a suitable module to employ. Lastly, the Forwarder propagates Q to neighbors. (The solid and dotted lines indicate possible neighbors using different modules.) We use the term module for the implementation of any HMS technique and its forwarding algorithm in a peer. Q indicates an incoming request message from a neighboring peer or a new query originated by this peer. The Modules block is a pool of modules available in a peer. The Red-killer block checks out queries and deletes those that are redundant or worthless. The Check & Answer block parses Q with the Resources pool block to determine whether this peer has the required resources. When a peer forwards Q, the Module-selector block determines which module will be employed from the Modules block. The Forwarder block executes the hit module to propagate Q.
Module switch
Our HES technique depends on intelligently switching existing search techniques or corresponding forwarding algorithms whose function is completed in the Module-selector block. For example, suppose
M = {m
1, m
2,

m
k } denotes the modules set available in the Modules block of the peer G. Logically, the Module-selector block is a switch function
s. After a new request comes in, this function
s determines which module will be used for forwarding the coming request. However, the function
s switches a module probabilistically rather than deterministically. Because of this, we call
s a
probability switch function. When the block Module-selector receives query Q in the peer G, a random number
x in [1,
k] is generated by
s that corresponds to the module superscript in
M. This block then calls the Forwarder block with the parameter
xto finish the forwarding task.
Clearly, the key is how to construct every function s to generate the desirable random number stream in all peers, which should maximize overall search performance. We must map out different forms for this function construction based on different situations and objectives. For the purposes of this article, however, we use a simple probability-switch function called a Linear Probability Switch (LPS), which works as follows:
The sample space of random variable x is the set of all the modules' superscripts in M. We assume each preference probability of the peer G's module is available, that is
LSV = [ p 1, p 2, ¼ p k ] is the preference probability of the i th module, and
LSV the linear switch vector of the peer G. So, the probability switch s can be achieved by generating a random number sequence that follows the LSV in this case.
Now, we can determine that the overall search performance of HES might produce varying characteristics with different LSV combinations in peers. The natural question: Is HES better than HMS?
We found that we can model the search activities with HES in unstructured P2P networks as a noncooperative, or n-player, game. Game theory describes mathematical models of conflicting and cooperative interactions between rational, utility-maximizing decision makers. In this HES game model, each peer is a player that holds some resources, such as files or cycles, that other peers might try to efficiently discover. At the same time, every peer needs to effectively locate the required resources as well. Obviously, every peer wants to improve its search performance, but the only way to do this is to help each other—that is, a peer can't improve its search performance by changing only its own LSV.
We found that when peers try to improve performances by changing their LSVs independently, and all peers in the network take
Nash equilibrium 11 about LSV vectors, those peers using HES perform better than those using any HMS. How do you obtain such a Nash equilibrium? One way might be to review the answering process of past online queries.
Unlike evaluation for generated static topologies and predefined probability distribution for duplicate resources,
4 , 9 we take an approach that's closely related to our time-synchronization research. Peers in many practical P2P overlay networks are highly transient, often create concurring queries, and usually come and go frequently, causing available resources distributions to change in very complicated ways. Hence, it might not be sufficient to study the performances of search techniques over static topology graphs, since the dynamics of unstructured P2P networks is rarely considered there. Instead, we study some more complex retrieving scenarios specially derived from our timekeeping project.
Queries for evaluation
We chose our queries for evaluation from the Timekeeping Overlay Network formation phase—the core stage for solving the autonomous configuration issue in the Network Time Protocol (see the sidebar "
Timekeeping P2P Networks " for more information). A TON is like an ad hoc network during the construction phase. Before a new lonely node joins, it must first find out from the TON's existing nodes the desirable places—that is, the nodes' IP addresses—to plug into. In terms of search technology, these IP addresses are resources or objects that peers retrieve. In our case, when a new node joins the TON successfully, it adds a new resource for subsequent lonely nodes at the same time, since it might become a new object for subsequent new nodes. In other words, the resources amount is dynamically changing during the search period. Besides, a search instance of any peer is believed successful if it finds at least one such address. In our tests, we used a maximum of three IP addresses for each new node for construction. Consequently, if the new node hadn't succeeded in joining the TON, it would have died when the search failed.
What's the determination condition when locating these resources? Usually, each node in the TON has a special variable stratum, according to the NTP. The lower the stratum, the more accurate the node. Usually, several nodes directly synchronized to a standard time base, such as GPS with stratum 1, are called root nodes. Any successive child node increases its stratum value by 1 compared to its parent, and the stratum's upper bound is usually set to 16. Therefore, from this viewpoint, the maximum amount of resource categories in the TON is 16, and each node has only one resource in the TON. For simplicity, the determination condition in our queries is based only on the parameter stratum expectation (SE), which is the expected stratum value a node must find in the TON. For instance, a database server might require an SE of 2, because its clock precision is very important, but an ordinary workstation might require an SE of 12. In practical situations, the probability distribution of variable SE depends on the usage patterns of joining nodes, but here we assume it follows the Gaussian distribution from 2 to 15. If we assume the number of new nodes is N, then our queries for evaluating HES are defined as
Find out IP addresses to join the TON for N new nodes concurrently, whose determination condition is that the stratum of any node exactly equals SE .
How does a new node start its query, or what are the direct neighbors to propagate its query? A search originated from any new peer requires at least one way to set up an initial connection with the TON. We simulated practical scenarios in our evaluation. There are several options for this issue, including media broadcasting and friends introducing. In media broadcasting, nodes nearby the TON might "see" the requests that the new node broadcasts and then answer them. Obviously, media broadcasting's advantage is that it doesn't need any additional information; the disadvantage is unreliability. So, it's possible that some nodes won't be able to join the TON by media broadcasting only. In friends introducing, a lonely node sets up the initial connection through friends, which are some well-known peers in the TON. In our evaluation, peers employed media broadcasting. If it failed, they used friends introducing, but we offered only one available friend node. In our case, we assumed for a new node the number of neighbors that could see the query message varies randomly from 0 to 32, each randomly selected from the TON.
Search techniques tested and metrics
To create our HES, we implemented three basic HMS techniques: flooding (FLD), random-walkers (RDW), and an informed-search technique called
intelligent forward (IFD). (See the sidebar "
Typical Search Techniques " for more information on these and other search methods.) In our RDW, the TTL-based method determines the query failure, and the RDW randomly selects two neighbors—that is, two walkers—to forward requests if possible. The IFD module uses the neighbor-nodes stratum information available in each peer. In the IFD technique, requests in any peer will be passed to neighbors only if the difference between a neighbor stratum and SE is less than the difference between the stratum of the current peer and SE. We assume these three modules are available in every peer, and all peers take the same
LSV. In addition, we combined the iterative deepening technique with all tests.
We've also constructed TON networks under the same conditions but with different search techniques and then observed how many new nodes succeeded in joining the TON and the corresponding cost. We count the metrics on the statistics after N nodes try to join. We use the following metrics:
Success rate per node (SR): rate of the number of nodes that successfully joined the TON to N.
Cost ratio per node (CR): ratio of the number of messages processed per node to the number that use the FLD technique.
To help our comparison, we took the search cost of those using the flooding technique as the baseline of the metric CR.
Simulation
We use the simulation platform Parsec to implement our experiments.
12 Parsec is a C-based simulation language that the Parallel Computing Laboratory at UCLA developed for sequential and parallel execution of discrete-event simulation models. We employed Parsec because of its power to accurately simulate complex dynamic behaviors such as the influence of link latency and peer-capability differences during resources search in an unstructured P2P network.
We used a three-step experiment:
1. Create three root nodes and about 200 nodes as the initial topology of the TON, which is randomly constructed to simulate any possible topology during the initial period.
2. Concurrently join 8,000 new nodes into the TON with more than four different search techniques (three HMS techniques and our HES technique, respectively).
3. Collect data and calculate the SR and CR for the metrics in steps 1 and 2.
Figures 2 and
3 compare performances among four search techniques when combined with the iterative deepening technique (see the sidebar "
Typical Search Techniques " for more information on these search methods). The HES curves take
LSV =[0.1,0.6,0.3] over FLD, RDW and IFD respectively. In the case of
radius D equals 0—that is, no forwarding actions are trigged—repeated experiments show that the metric
SR is about 0.40.
Figure 2. The success rate of four search techniques to radius D. We combined each of the methods with the iterative deepening technique at a uniform expanding pace of 1 starting from 1.
Figure 3. The cost ratio of four search techniques to radius D. We combined each of the methods with the iterative deepening technique at a uniform expanding pace of 1 starting from 1. Figures 2 and
3 indicate that HES's SR metric is very close to FLD, but its CR metric is significantly lower. From this, we can imply several things. First, as we expected, the HES technique has the power to improve search performance by mixing several HMS techniques. Second, although the performance using IFD is similar to HES, its efficiency greatly depends on the available resources-location information—in this case, the neighbor stratum. Based on the LSV in the previous paragraph, we found that compared to IFD only about one-third of the nodes in the HES curve use neighbor stratum information, but HES shows similar performance. On the other hand, IFD performance relies on 100 percent of the resources-location information being available. In many instances, a node might not possess such exact resources location information, such as stratum in the TON. Therefore, this indicates that the HES technique might improve search performance without heavily depending on indices as many informed search techniques do.
Figure 4 compares the performances under different LSV combinations. We can see that tuning the LSV significantly affects HES performance. We can also achieve desirable search performance without using resources-location information (neighbor stratum) when the LSV combination is [0.2,0.8,0]. Although Figure 4 indicates that the random-walkers technique is most suitable in this case, it's not true in many other situations. For instance, when we want to seek faster search response and high success rate at the same time by decreasing radius
D (here the
D takes 10 hops in the figure), the random-walkers technique will definitely perform badly on the search success rate metric. Therefore, the most outstanding advantage of our HES is that it provides the flexibility to trade-off the search success rate and the cost and to avoid the drawbacks of using a single search technique; no single HMS works this way.
Figure 4. HES performances under different LSVs with D = 1. All combine with the iterative deepening technique at a uniform expanding pace of 1 starting from 1. Our simulations show the benefits of HES over HMS in unstructured P2P networks. In our future work, we plan to explore the technique in other types of P2P networks.
The National Natural Science Foundation of China supported this research (grant CNSF 69773040 and 60203021). The authors are grateful to the editors and anonymous reviewers for their helpful comments.
References
- [1] I. Foster and A. Imanitchi,, "On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing," Proc. Peer-To-Peer Systems II: 2nd Int'l Workshop (IPTPS '03), LNCS 2735, Springer-Verlag, 2003, pp. 118
128. - [2] S. Ratnasamy, et al., "A Scalable Content-Addressable Network," Proc. ACM Special Interest Group on Data Communications (SIGCOMM 2001), ACM Press, 2001, pp. 161
172. - [3] D. Talia and P. Trunfio,, "Toward a Synergy between P2P and Grids," http://csdl.computer.org/comp/mags/ic/2003/04/w4096abs.htm IEEE Internet Computing, vol. 7, no. 4, 2003, pp. 94
96. - [4] C. Lv, et al., "Search and Replication in Unstructured Peer-to-Peer Networks," Proc. 16th Int'l Conf. Supercomputing, ACM Press, 2002, pp. 84
95. - [5] L.A. Adamic, et al., "Search in Power Law Networks," Physical Rev. E , 2001, vol. 64, no. 4, pp. 46135
46143. - [6] D.A. Menasce,, "Scalable P2P Search," http://csdl.computer.org/comp/mags/ic/2003/02/w2083abs.htm IEEE Internet Computing, vol. 7, no. 2, 2003, pp. 83
87. - [7] P. Marius and S. Aruna,, "Cost-Effective Broadcast for Fully Decentralized Peer-to-Peer Networks," Computer Comm., vol. 26, no. 11, 2003, pp. 1159
1167. - [8] Y. Baryshnikov, et al., "Flood Search under the California Split Rule," Operations Research Letters, vol. 32, no. 3, 2004, pp. 199
206. - [9] D. Tsoumakos and N. Roussopoulos,, "A Comparison of Peer-to-Peer Search Methods," Proc. 6th Int'l Workshop Web and Databases, (WebDB 2003), 2003, pp. 61
66. - [10] B. Xiu-guo, et al., "A Self-Organizing Timekeeping Network," J. China Institute of Comm., vol. 25, no. 1, 2004, pp. 150
157. - [11] J. Nash,, "Noncooperative Games," Math. Ann., vol. 54, no. 2, 1951, pp. 286
295. - [12] R. Bagrodia,, et al., "Parsec: A Parallel Simulation Environment for Complex Systems," http://csdl.computer.org/comp/mags/co/1998/10/rx077abs.htm Computer, vol. 31, no. 10, 1998, pp. 77
85.
Xiuguo Bao is a PhD student at the Computer Network and Information Security Technology Research Center, Harbin Institute of Technology in China. His research interests include peer-to-peer computing, timekeeping for massive networks, and network security. He received his MS in computer science from Yanshan University in China. Contact him at 3 Yuming Ave., Beijing, 100029, China; bxg@pact518.hit.edu.cn.
Binxing Fang is a professor in the Computer Network and Information Security Technology Research Center, Harbin Institute of Technology in China. His research interests include computer networks, network security, and parallel and distributed computing. He received his PhD at the Harbin Institute of Technology. Contact him at 3 Yuming Ave., Beijing, 100029, China; bxfang@pact518.hit.edu.cn.
Mingzhen Hu is a professor at the Computer Network and Information Security Technology Research Center, Harbin Institute of Technology in China. His research interests include computer architecture and parallel and distributed computing. He received his MS at Harbin Institute of Technology in China. Contact him at 3 Yuming Ave., Beijing, 100029, China; mzhu@hit.edu.cn.
Binbin Xu is a masters student at the College of Computer Science at the Beijing University of Technology in China. His research interests include peer-to-peer computing and timekeeping for massive networks. Contact him at 3 Yuming Ave., Beijing, 100029, China; xiaosi_xu@163.com.