2003 IEEE/WIC International Conference on Web Intelligence (WI'03)
Incremental Document Clustering Using Cluster Similarity Histograms
Halifax, Canada
October 13-October 17
ISBN: 0-7695-1932-6
Khaled M. Hammouda, University of Waterloo
Mohamed S. Kamel, University of Waterloo
Clustering of large collections of text documents is a key process in providing a higher level of knowledge about the underlying inherent classification of the documents. Web documents, in particular, are of great interest since managing, accessing, searching, and browsing large repositories of web content requires efficient organization. Incremental clustering algorithms are always preferred to traditional clustering techniques, since they can be applied in a dynamic environment such as the Web. An incremental document clustering algorithm is introduced in this paper, which relies only on pair-wise document similarity information. Clusters are represented using a Cluster Similarity Histogram, a concise statistical representation of the distribution of similarities within each cluster, which provides a measure of cohesiveness. The measure guides the incremental clustering process. Complexity analysis and experimental results are discussed and show that the algorithm requires less computational time than standard methods while achieving a comparable or better clustering quality.
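The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that cohesiveness is measured as the fraction of pair-wise similarities in a cluster at or above a threshold (called `histogram_ratio` here), and that a document joins a cluster only if this ratio stays above a minimum and does not drop too far. The names `jaccard`, `histogram_ratio`, `incremental_cluster`, and the parameters `threshold`, `min_ratio`, and `max_drop` are all hypothetical, and the word-set Jaccard similarity is a stand-in for whatever pair-wise similarity measure is actually used.

```python
def jaccard(a, b):
    """Stand-in pair-wise similarity: Jaccard overlap of word sets (assumption)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def histogram_ratio(sims, threshold):
    """Fraction of pair-wise similarities at or above `threshold`.

    An empty list (a single-document cluster) counts as fully cohesive.
    """
    if not sims:
        return 1.0
    return sum(s >= threshold for s in sims) / len(sims)


def incremental_cluster(docs, similarity=jaccard,
                        threshold=0.5, min_ratio=0.5, max_drop=0.1):
    """Assign each incoming document to the first cluster it does not degrade.

    Each cluster keeps only its members and the list of pair-wise
    similarities among them; no centroid or global state is needed.
    """
    clusters = []  # each cluster: {"members": [...], "sims": [...]}
    for doc in docs:
        placed = False
        for cluster in clusters:
            # Similarities between the new document and current members.
            new_sims = [similarity(doc, m) for m in cluster["members"]]
            old_ratio = histogram_ratio(cluster["sims"], threshold)
            new_ratio = histogram_ratio(cluster["sims"] + new_sims, threshold)
            # Accept only if the cluster stays cohesive enough.
            if new_ratio >= min_ratio and new_ratio >= old_ratio - max_drop:
                cluster["members"].append(doc)
                cluster["sims"].extend(new_sims)
                placed = True
                break  # hard assignment: a simplifying assumption
        if not placed:
            clusters.append({"members": [doc], "sims": []})
    return clusters
```

Because each cluster stores only its similarity list, a new document can be processed as it arrives without re-clustering earlier ones, which is the property that makes this style of algorithm suitable for a dynamic collection.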
Citation:
Khaled M. Hammouda and Mohamed S. Kamel, "Incremental Document Clustering Using Cluster Similarity Histograms," 2003 IEEE/WIC International Conference on Web Intelligence (WI'03), p. 597, 2003.