A Keyword-Based Semantic Prefetching Approach in Internet News Services
May 2004 (vol. 16 no. 5)
pp. 601-611

Abstract: Prefetching is an important technique for reducing average Web access latency. Existing prefetching methods are based mostly on URL graphs: they use the graph structure of HTTP links to determine the possible paths through a hypertext system. Although URL graph-based approaches are effective at prefetching frequently accessed documents, few of them can prefetch URLs that are rarely visited. This paper presents a keyword-based semantic prefetching approach to overcome this limitation. It predicts future requests based on the semantic preferences expressed by previously retrieved Web documents. We apply this technique to Internet news services and implement a client-side personalized prefetching system, NewsAgent. The system exploits semantic preferences by analyzing keywords in the URL anchor text of previously accessed documents in different news categories, and it employs a neural network model over the keyword set to predict future requests. The system features a self-learning capability and adapts well to changes in a client's surfing interests. To keep prefetching conservative, NewsAgent does not exploit keyword synonymy. However, it alleviates the impact of keyword polysemy by taking server-provided categorical information into account in decision-making and hence captures more semantic knowledge than term-document literal matching methods. Experimental results from daily browsing of the ABC News, CNN, and MSNBC news sites over a period of three months show that prefetching achieves a hit ratio of up to 60 percent.
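
The idea of predicting requests from anchor-text keywords, with a separate learner per news category, can be illustrated with a minimal sketch. This is not the NewsAgent implementation: the class name `KeywordPerceptron`, the stopword list, the threshold, and the learning rate are all hypothetical choices made for illustration, and a single perceptron per category is only one simple instance of a neural network over a keyword set.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the abstract does not say how keywords are filtered.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "at", "is"}


def keywords(anchor_text):
    """Lowercase the anchor text and keep alphabetic, non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", anchor_text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class KeywordPerceptron:
    """One perceptron per news category; inputs are keyword presence flags."""

    def __init__(self, threshold=0.5, rate=0.2):
        self.threshold = threshold  # assumed decision threshold
        self.rate = rate            # assumed learning rate
        # weights[category][keyword] -> learned preference weight
        self.weights = defaultdict(dict)

    def score(self, category, anchor_text):
        """Sum the weights of the distinct keywords present in the anchor text."""
        w = self.weights[category]
        return sum(w.get(k, 0.0) for k in set(keywords(anchor_text)))

    def should_prefetch(self, category, anchor_text):
        """Prefetch the link when its keyword score reaches the threshold."""
        return self.score(category, anchor_text) >= self.threshold

    def learn(self, category, anchor_text, clicked):
        """Perceptron rule: adjust weights only when the prediction was wrong."""
        error = int(clicked) - int(self.should_prefetch(category, anchor_text))
        if error:
            w = self.weights[category]
            for k in set(keywords(anchor_text)):
                w[k] = w.get(k, 0.0) + self.rate * error


model = KeywordPerceptron()
model.learn("sports", "Lakers win NBA finals", clicked=True)
print(model.should_prefetch("sports", "Lakers NBA finals recap"))
print(model.should_prefetch("business", "Lakers NBA finals recap"))
```

Keeping a separate weight table per category is how this sketch mirrors the use of server-provided categorical information: the same keyword can carry a different weight in a different category, which limits the effect of polysemy. Updating weights only on mispredictions gives a simple form of self-learning that follows changes in the reader's interests.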


Index Terms:
NewsAgent, neural networks, personalized news service, prefetching, semantic locality.
Cheng-Zhong Xu, Tamer I. Ibrahim, "A Keyword-Based Semantic Prefetching Approach in Internet News Services," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 5, pp. 601-611, May 2004, doi:10.1109/TKDE.2004.1277820