Building a Large and Efficient Hybrid Peer-to-Peer Internet Caching System
June 2004 (vol. 16 no. 6)
pp. 754-769

Abstract: Proxy hit ratios tend to decrease as the demand for and supply of Web content become more diverse. Through case studies, we quantitatively confirm this trend and observe significant document duplication between a proxy and the caches of its client browsers. One reason behind this trend is that the client/server Web caching model does not support direct resource sharing among clients, which leaves the Web content and the network bandwidth among clients relatively underutilized. To address these limits and improve Web caching performance, we have extensively enhanced and deployed our browsers-aware framework, a peer-to-peer Web caching management scheme. The browsers and their proxy share content with one another, which exploits the neglected but rich data locality in browsers. The scheme also reduces document duplication between the proxy and the browsers' caches, so that the Web content and network bandwidth among clients are used effectively. The objective of our scheme is to improve the scalability of proxy-based caching, both in the number of connected clients and in the diversity of Web documents. In this paper, we show that building such a caching system, with consideration given to sharing content among clients, minimizing document duplication, and achieving data integrity and communication anonymity, is not only feasible but also highly effective.
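
The sharing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `BrowserAwareProxy`, the `fetch_origin` callback, the in-memory dictionaries, and the lookup order (requesting browser, then proxy, then a peer browser, then the origin server) are all assumptions made for illustration. Data integrity and anonymity, which the paper also addresses, are left out.

```python
from typing import Callable, Dict, Set, Tuple


class BrowserAwareProxy:
    """Toy model (hypothetical): a proxy that also indexes its clients' browser caches."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self.fetch_origin = fetch_origin  # assumed callback that contacts the origin server
        self.proxy_cache: Dict[str, bytes] = {}  # documents held by the proxy itself
        self.browser_caches: Dict[str, Dict[str, bytes]] = {}  # client -> its browser cache
        self.browser_index: Dict[str, Set[str]] = {}  # url -> clients known to hold it

    def _store_in_browser(self, client: str, url: str, body: bytes) -> None:
        # Record the document in the client's cache and in the proxy's index.
        self.browser_caches.setdefault(client, {})[url] = body
        self.browser_index.setdefault(url, set()).add(client)

    def evict_from_proxy(self, url: str) -> None:
        # Drop the proxy's copy; browser copies stay reachable through the index.
        self.proxy_cache.pop(url, None)

    def request(self, client: str, url: str) -> Tuple[bytes, str]:
        """Return (document, source); source is 'browser', 'proxy', 'peer', or 'origin'."""
        local = self.browser_caches.get(client, {})
        if url in local:
            return local[url], "browser"
        if url in self.proxy_cache:
            body, source = self.proxy_cache[url], "proxy"
        else:
            # Proxy miss: consult the index of browser caches before going to the origin.
            peer = next(iter(self.browser_index.get(url, ())), None)
            if peer is not None:
                body, source = self.browser_caches[peer][url], "peer"
            else:
                body, source = self.fetch_origin(url), "origin"
                self.proxy_cache[url] = body
        self._store_in_browser(client, url, body)
        return body, source
```

In this sketch, a request that misses in the proxy can still be served from another client's browser cache, which is the kind of client-to-client sharing that the plain client/server caching model lacks.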

Index Terms:
Internet systems, peer-to-peer systems, proxy caching, browser caching, data integrity, communication anonymity.
Li Xiao, Xiaodong Zhang, Artur Andrzejak, Songqing Chen, "Building a Large and Efficient Hybrid Peer-to-Peer Internet Caching System," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 6, pp. 754-769, June 2004, doi:10.1109/TKDE.2004.1