Issue No. 05 - May (2007 vol. 18)
<p><b>Abstract</b>—Distributed peer-to-peer systems rely on voluntary participation of peers to effectively manage a storage pool. In such systems, data is generally replicated for performance and availability. If the storage associated with replication is not monitored and provisioned, the underlying benefits may not be realized. Resource constraints, performance scalability, and availability present diverse considerations. Availability and performance scalability, in terms of response time, are improved by aggressive replication, whereas resource constraints limit total storage in the network. Identification and elimination of redundant data pose fundamental problems for such systems. In this paper, we present a novel and efficient solution that addresses availability and scalability with respect to management of redundant data. Specifically, we address the problem of duplicate elimination in the context of systems connected over an <i>unstructured</i> peer-to-peer network in which there is no a priori binding between an object and its location. We propose two randomized protocols to solve this problem in a scalable and decentralized fashion that does not compromise the availability requirements of the application. Performance results using both large-scale simulations and a prototype built on PlanetLab demonstrate that our protocols provide high probabilistic guarantees while incurring minimal administrative overheads.</p>
Peer-to-peer, unstructured networks, duplicate elimination, randomized algorithms.
R. A. Ferreira, A. Grama, M. K. Ramanathan and S. Jagannathan, "Randomized Protocols for Duplicate Elimination in Peer-to-Peer Storage Systems," in IEEE Transactions on Parallel & Distributed Systems, vol. 18, no. 5, pp. 686-696, 2007.