The Small World of File Sharing
July 2011 (vol. 22 no. 7)
pp. 1120-1134
Adriana Iamnitchi, University of South Florida, Tampa
Matei Ripeanu, University of British Columbia, Vancouver
Elizeu Santos-Neto, University of British Columbia, Vancouver
Ian Foster, University of Chicago, Chicago
Web caches, content distribution networks, peer-to-peer file-sharing networks, distributed file systems, and data grids all involve a community of users who access shared data. In each case, overall system performance can be improved significantly by first identifying and then exploiting the structure of the community's data access patterns. We propose a novel perspective for analyzing data access workloads that considers the implicit relationships that form among users based on the data they access. We propose a new structure, the interest-sharing graph, that captures common user interests in data, and we justify its utility with studies on four data-sharing systems: a high-energy physics collaboration, the Web, the Kazaa peer-to-peer network, and a BitTorrent file-sharing community. We find small-world patterns in the interest-sharing graphs of all four communities. We investigate analytically and experimentally some of the potential causes of this pattern and conclude that user preferences play a major role. The significance of small-world patterns is twofold: they provide rigorous support for intuition, and they suggest the potential to exploit these naturally emerging patterns. As a proof of concept, we design and evaluate an information dissemination system that exploits the small-world interest-sharing graphs by building an interest-aware network overlay. We show that this approach leads to improved information dissemination performance.
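
As an illustration only, the following Python sketch shows one way an interest-sharing graph could be built from an access log and checked for the two properties usually associated with small-world graphs (high clustering and short average path length). The abstract does not give the exact edge rule, so the sketch assumes that two users are connected when they have accessed at least `min_shared` common items; the function names, the `min_shared` parameter, and the toy log are hypothetical and are not taken from the article.

```python
from collections import defaultdict, deque
from itertools import combinations


def interest_sharing_graph(accesses, min_shared=1):
    """Link two users if they accessed at least `min_shared` common items.

    `accesses` is an iterable of (user, item) pairs. The edge rule is an
    assumption made for this sketch, not the article's exact definition.
    """
    items_by_user = defaultdict(set)
    for user, item in accesses:
        items_by_user[user].add(item)
    graph = {user: set() for user in items_by_user}
    for u, v in combinations(items_by_user, 2):
        if len(items_by_user[u] & items_by_user[v]) >= min_shared:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph):
    """Average local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def average_path_length(graph):
    """Mean shortest-path length over connected node pairs, using BFS."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy access log (hypothetical data): users a, b, c share f1; c and d share f2.
log = [("a", "f1"), ("b", "f1"), ("c", "f1"), ("c", "f2"), ("d", "f2")]
g = interest_sharing_graph(log)
cc = clustering_coefficient(g)
apl = average_path_length(g)
```

A graph is usually called small-world when its clustering coefficient is much larger than that of a random graph with the same number of nodes and edges while its average path length stays comparable; that comparison against a random baseline is left out of this sketch.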

Index Terms:
File sharing, workload characterization, small-world graphs, self-organization, peer-to-peer systems.
Adriana Iamnitchi, Matei Ripeanu, Elizeu Santos-Neto, Ian Foster, "The Small World of File Sharing," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 7, pp. 1120-1134, July 2011, doi:10.1109/TPDS.2010.170