International Conference on Computational Intelligence for Modelling Control and Automation and International Conference on Intelligent Agents Web Technologies and International Commerce (CIMCA'06)
Web Document Clustering with Multi-view Information Bottleneck
Sydney Australia
November 28-December 01
ISBN: 0-7695-2731-0
Yan Gao, Central South University
Shiwen Gu, Central South University
Liming Xia, Central South University
Yaoping Fei, Central South University
Clustering is an important way to organize the large amount of information on the Web. In this paper, we study how to incorporate multiple types of information about a web document, such as content, anchor text, and URL, to improve clustering performance. We propose a novel algorithm, multi-view information bottleneck (MVIB), to cluster web documents with multi-type features. In this algorithm, a compatibility constraint that maximizes the agreement between clustering hypotheses on different views is imposed on the individual views when clustering instances. Based on this constraint, we obtain a set of clustering hypotheses that reveal much information about the correct one, and the final hypothesis can be deduced from them. We study the performance of MVIB under different view settings. Experiments on two real datasets indicate that MVIB with a 3-view setting based on content, anchor text, and URL improves cluster quality more effectively.
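The abstract does not state the exact form of the constraint on agreement between views. As a rough illustration of what "agreement between clustering hypotheses" can mean, the sketch below scores two cluster labelings of the same documents with mutual information. This is an assumed stand-in, not the MVIB objective itself; the function name and the toy labelings are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same documents."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same documents")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # Each term is p(a, b) * log(p(a, b) / (p(a) * p(b))), written with counts.
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical cluster assignments for four documents under three views.
content_view = [0, 0, 1, 1]
anchor_view = [1, 1, 0, 0]  # same partition as content_view, labels renamed
url_view = [0, 1, 0, 1]     # statistically independent of content_view

agree_content_anchor = mutual_information(content_view, anchor_view)
agree_content_url = mutual_information(content_view, url_view)
```

Mutual information ignores how clusters are named, so the content and anchor views above count as fully agreeing even though their labels are swapped, while the URL view shares no information with the content view.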
Citation:
Yan Gao, Shiwen Gu, Liming Xia, Yaoping Fei, "Web Document Clustering with Multi-view Information Bottleneck," cimca, pp.148, International Conference on Computational Intelligence for Modelling Control and Automation and International Conference on Intelligent Agents Web Technologies and International Commerce (CIMCA'06), 2006