2016 International Conference on Cyberworlds (CW) (2016)
Chongqing, China
Sept. 28, 2016 to Sept. 30, 2016
ISBN: 978-1-5090-2303-5
pp: 147-150
HITS (Hyperlink-Induced Topic Search) is a classical link analysis algorithm used in Web structure mining (WSM). The algorithm takes the structural information of links into account but ignores the correlation between pages and topics. As a result, "topic drift" (a deviation between the search results and the query topic) can appear in some cases. To address this problem, this paper presents an improved algorithm that takes into account both Web content similarity and link analysis. Our experiment shows that the improved algorithm enhances the relevance of search results and limits the occurrence of topic drift to some degree.
Algorithm design and analysis, Web pages, Symmetric matrices, Computers, Crawlers, Computational efficiency, Correlation
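
The abstract does not give the improved algorithm's formulas, so the following is only a minimal sketch of the general idea it describes: a HITS iteration in which each link is weighted by the content similarity between the target page and the query topic. The cosine-similarity weighting, the function names `cosine` and `weighted_hits`, and the term-weight dictionaries are all assumptions made for illustration, not the method of the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def _normalize(scores):
    """Scale scores to unit L2 norm; leave them unchanged if all are zero."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()} if norm else scores


def weighted_hits(links, vectors, query, iterations=50):
    """Similarity-weighted HITS (illustrative assumption, not the paper's method).

    links:   dict mapping a page to the list of pages it links to
    vectors: dict mapping a page to its term-weight dict
    query:   term-weight dict describing the topic
    Returns (authority, hub) score dicts.
    """
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    # Assumed edge weight: relevance of the link target to the query topic.
    rel = {p: cosine(vectors.get(p, {}), query) for p in pages}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores over incoming weighted links.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src] * rel[dst]
        auth = _normalize(new_auth)
        # Hub update: sum of authority scores over outgoing weighted links.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += auth[dst] * rel[dst]
        hub = _normalize(new_hub)
    return auth, hub
```

In plain HITS every link carries weight 1, so a heavily linked but off-topic page can collect a high authority score. With the assumed weighting, links into pages that share no terms with the query contribute nothing, which is one simple way to limit topic drift.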

