Effectively Finding Relevant Web Pages from Linkage Information
July/August 2003 (vol. 15 no. 4)
pp. 940-951
Jingyu Hou
Yanchun Zhang, IEEE Computer Society

Abstract: This paper presents two hyperlink analysis-based algorithms for finding pages relevant to a given Web page (URL). The first algorithm is derived from an extended cocitation analysis of Web pages; it is intuitive and easy to implement. The second uses linear algebra to reveal deeper relationships among Web pages and to identify relevant pages more precisely and effectively. Experimental results show the feasibility and effectiveness of both algorithms. The algorithms could be used in various Web applications, such as enhancing Web search, and the ideas and techniques in this work may be helpful to other Web-related research.
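
As a rough illustration of the cocitation idea mentioned in the abstract, the sketch below ranks candidate pages by how many parent pages they share with a given target page. It is plain cocitation counting, not the extended algorithm of the paper, whose details are not given here. The function name `cocitation_ranking`, the toy link graph, and the page labels are all hypothetical.

```python
from collections import Counter


def cocitation_ranking(links, target):
    """Rank pages by how many parent pages they share with `target`.

    `links` maps each page to the set of pages it links to.
    This is plain cocitation counting (an assumption for illustration),
    not the extended cocitation algorithm described in the paper.
    """
    # Parents of the target: pages that contain a link to it.
    parents = [page for page, outs in links.items() if target in outs]

    # Every other page linked from a parent is cocited with the target.
    counts = Counter()
    for parent in parents:
        for sibling in links[parent]:
            if sibling != target:
                counts[sibling] += 1

    # Highest cocitation count first; ties broken by page name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy link graph: page -> pages it links to.
links = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"a", "d"},
    "p4": {"b", "d"},
}
ranking = cocitation_ranking(links, "a")
```

In this toy graph, page `b` shares two parents with `a`, so it ranks above `c` and `d`, which share one each. The second, linear algebra-based approach would need the singular value decomposition named in the index terms and is not sketched here.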

Index Terms:
World Wide Web, Web search, information retrieval, hyperlink analysis, singular value decomposition (SVD).
Citation:
Jingyu Hou, Yanchun Zhang, "Effectively Finding Relevant Web Pages from Linkage Information," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 940-951, July-Aug. 2003, doi:10.1109/TKDE.2003.1209010