First IEEE International Conference on Data Mining (ICDM'01)
Automatic Topic Identification Using Webpage Clustering
San Jose, California
November 29-December 02
ISBN: 0-7695-1119-8
Grouping webpages into distinct topics is one way to organize the large amount of information retrieved from the web. In this paper, we report that an unsupervised clustering method, based on a similarity metric that incorporates textual information, hyperlink structure, and co-citation relations, can automatically and effectively identify relevant topics, as shown in experiments on several retrieved sets of webpages. The clustering method is a state-of-the-art spectral graph partitioning method based on the normalized cut criterion, which was first developed for image segmentation.
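The normalized cut criterion named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a symmetric, nonnegative similarity matrix W has already been built (how text, hyperlink, and co-citation scores are combined into W is not shown here), and it splits the pages into two groups by the sign of the second-smallest generalized eigenvector of (D - W) y = lambda D y, where D is the diagonal degree matrix. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a graph in two using the relaxed normalized cut criterion.

    W is assumed to be a symmetric, nonnegative similarity matrix in which
    every node has positive total similarity (degree).
    Returns a boolean array marking one side of the partition.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees
    if np.any(d <= 0):
        raise ValueError("every node needs positive total similarity")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L_sym)
    # Map the second-smallest eigenvector back to the generalized problem
    y = d_inv_sqrt * eigvecs[:, 1]
    # Threshold at zero (an assumption; other split points are possible)
    return y >= 0

# Toy similarity matrix (made-up values): pages 0-2 and pages 3-5 form
# two tightly linked groups joined by one weak similarity.
W = np.array([
    [0.0,  1.0, 1.0, 0.01, 0.0, 0.0],
    [1.0,  0.0, 1.0, 0.0,  0.0, 0.0],
    [1.0,  1.0, 0.0, 0.0,  0.0, 0.0],
    [0.01, 0.0, 0.0, 0.0,  1.0, 1.0],
    [0.0,  0.0, 0.0, 1.0,  0.0, 1.0],
    [0.0,  0.0, 0.0, 1.0,  1.0, 0.0],
])
labels = normalized_cut_bipartition(W)
```

Which group is marked True depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. More than two topics could be obtained by applying the split recursively to each group, which is one common approach and is assumed here rather than taken from the paper.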
Citation:
Xiaofeng He, Chris H.Q. Ding, Hongyuan Zha, Horst D. Simon, "Automatic Topic Identification Using Webpage Clustering," icdm, p. 195, First IEEE International Conference on Data Mining (ICDM'01), 2001