An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data
August 2007 (vol. 19 no. 8)
pp. 1026-1041
This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents of different topics are categorized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is a data sparsity problem faced in clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. The experiments on both synthetic and real data have shown that the new algorithm can generate better clustering results than other subspace clustering algorithms. The new algorithm is also scalable to large data sets.
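The abstract describes the extra weighting step only in words, so the following is a minimal sketch rather than the authors' implementation. It assumes that the entropy term is scaled by a positive parameter (called `gamma` here) and that each cluster's weights sum to one, in which case minimizing the within-cluster dispersion plus the weight entropy term gives a softmax-style update over the per-dimension dispersions. The function name `entropy_weighted_kmeans`, the parameter `gamma`, the random initialization, and the fixed iteration count are all illustrative assumptions.

```python
import numpy as np


def entropy_weighted_kmeans(X, k, gamma=1.0, n_iter=20, seed=0):
    """Sketch of a k-means loop with per-cluster, per-dimension weights.

    Assumption (not stated in the abstract): weights are updated as
    w[l, i] proportional to exp(-D[l, i] / gamma), where D[l, i] is the
    dispersion of cluster l along dimension i.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)

    # Start from k distinct data points and uniform dimension weights.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Step 1: assign each object to the cluster with the smallest
        # weighted squared distance. sq has shape (n, k, d).
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        labels = (sq * weights[None, :, :]).sum(axis=2).argmin(axis=1)

        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the old center and weights for empty clusters

            # Step 2: update the cluster center as in ordinary k-means.
            centers[l] = members.mean(axis=0)

            # Step 3 (the additional step): recompute dimension weights
            # from the per-dimension dispersion of the cluster.
            disp = ((members - centers[l]) ** 2).sum(axis=0)
            logits = -disp / gamma
            logits -= logits.max()  # shift for numerical stability
            w = np.exp(logits)
            weights[l] = w / w.sum()

    return labels, centers, weights
```

Under these assumptions, a dimension along which a cluster is compact receives a large weight and a dimension along which it is spread out receives a small one, so each row of the returned `weights` array indicates the subspace that characterizes that cluster. Smaller values of `gamma` concentrate the weight on fewer dimensions. The `(n, k, d)` intermediate array is used for brevity and would need to be replaced by a sparse or chunked computation for large, high-dimensional data.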
Index Terms:
k-means clustering, variable weighting, subspace clustering, text clustering, high-dimensional data.
Citation:
Liping Jing, Michael K. Ng, Joshua Zhexue Huang, "An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1026-1041, Aug. 2007, doi:10.1109/TKDE.2007.1048