A Data Mining Algorithm for Generalized Web Prefetching
September/October 2003 (vol. 15 no. 5)
pp. 1155-1169

Abstract—Predictive Web prefetching refers to the mechanism of deducing a client's forthcoming page accesses from its past accesses. In this paper, we present a new context for interpreting Web prefetching algorithms as Markov predictors. We identify the factors that affect the performance of Web prefetching algorithms. We propose a new algorithm, called WM_o, which is based on data mining and is proven to be a generalization of existing ones. It was designed to address their specific limitations, and its characteristics account for all of the above factors. It compares favorably with previously proposed algorithms. Furthermore, the algorithm efficiently handles the increased number of candidates. We present a detailed performance evaluation of WM_o with synthetic and real data. The experimental results show that WM_o can provide significant improvements over previously proposed Web prefetching algorithms.
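To make the "prefetching algorithms as Markov predictors" view concrete, here is a minimal sketch of a generic k-th order Markov predictor trained on client access sessions. It is not the WM_o algorithm, whose candidate generation and pruning are not described here; the class name `MarkovPrefetcher` and the parameters `order` and `min_confidence` are hypothetical choices made for illustration. The predictor conditions on the last `order` pages and suggests for prefetching every page whose estimated conditional probability reaches the threshold.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class MarkovPrefetcher:
    """Hypothetical k-th order Markov predictor for next-page prefetching (illustrative only)."""

    def __init__(self, order: int = 1, min_confidence: float = 0.5):
        self.order = order                    # number of preceding pages used as context (assumed >= 1)
        self.min_confidence = min_confidence  # assumed threshold on P(next page | context)
        # Maps a context (tuple of the last `order` pages) to counts of the page that followed it.
        self.transitions: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, sessions: Sequence[Sequence[str]]) -> None:
        """Count, for every context of length `order`, which page was accessed next."""
        for session in sessions:
            for i in range(self.order, len(session)):
                context = tuple(session[i - self.order:i])
                self.transitions[context][session[i]] += 1

    def predict(self, recent: Sequence[str]) -> List[str]:
        """Return pages to prefetch, most probable first, given the client's recent accesses."""
        if len(recent) < self.order:
            return []
        context = tuple(recent[-self.order:])
        counts = self.transitions.get(context)  # .get() does not create an empty entry
        if not counts:
            return []
        total = sum(counts.values())
        # Keep every successor whose relative frequency reaches the confidence threshold.
        return [page for page, c in counts.most_common() if c / total >= self.min_confidence]


if __name__ == "__main__":
    sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
    first_order = MarkovPrefetcher(order=1, min_confidence=0.5)
    first_order.train(sessions)
    print(first_order.predict(["A", "B"]))  # ['C']  (C follows B in 3 of 4 cases)
```

Raising `order` lets the predictor capture longer dependencies at the cost of sparser statistics, and lowering `min_confidence` prefetches more pages at the cost of extra traffic; these are the kinds of trade-offs the abstract refers to as factors affecting prefetching performance.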


Index Terms:
Prefetching, prediction, Web mining, association rules, data mining.
Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos, "A Data Mining Algorithm for Generalized Web Prefetching," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 5, pp. 1155-1169, Sept.-Oct. 2003, doi:10.1109/TKDE.2003.1232270