Automated Variable Weighting in k-Means Type Clustering
May 2005 (vol. 27, no. 5)
pp. 657-668
This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced into the k-means clustering process to iteratively update the variable weights based on the current partition of the data, and a formula for the weight calculation is proposed. A convergence theorem for the new clustering process is given. The variable weights produced by the algorithm measure the importance of the variables in clustering and can be used for variable selection in data mining applications, where large and complex real data are often involved. Experimental results on both synthetic and real data show that the new algorithm outperformed the standard k-means type algorithms in recovering clusters in data.
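The abstract describes the added step but does not state the weight formula. The pure-Python sketch below illustrates the idea under an assumption: distances are weighted as the sum over variables of w_j^β (x_j - z_j)^2, and after each partition update the weights are recomputed as w_j = 1 / Σ_t (D_j / D_t)^(1/(β-1)), summed over variables with D_t > 0, where D_j is the within-cluster sum of squared deviations on variable j and β > 1 (w_j = 0 when D_j = 0). This is the form commonly associated with this method, but treat it as an assumption rather than a transcription of the paper. The function name `wkmeans`, its `init` and `seed` parameters, and the equal starting weights are hypothetical choices made for this illustration.

```python
import random


def wkmeans(X, k, beta=2.0, max_iter=100, init=None, seed=0):
    """Sketch of k-means with an extra variable-weight update step.

    X is a list of equal-length numeric rows, k the number of clusters and
    beta (> 1) the weight exponent. `init` (hypothetical) optionally gives
    k starting centers; otherwise k rows are sampled with a seeded RNG.
    Returns (labels, centers, weights).
    """
    if beta <= 1:
        raise ValueError("this sketch only handles beta > 1")
    n, m = len(X), len(X[0])
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(X, k)
    centers = [list(c) for c in start]
    weights = [1.0 / m] * m  # equal starting weights that sum to 1
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each row to the nearest center under the weighted
        # squared distance sum_j w_j**beta * (x_j - z_j)**2.
        new_labels = [
            min(range(k), key=lambda l: sum(
                weights[j] ** beta * (x[j] - centers[l][j]) ** 2
                for j in range(m)))
            for x in X
        ]
        # Step 2: each center becomes the mean of its members (a per-variable
        # weight does not move the minimizer); empty clusters keep their center.
        for l in range(k):
            members = [X[i] for i in range(n) if new_labels[i] == l]
            if members:
                centers[l] = [sum(col) / len(members) for col in zip(*members)]
        # Step 3: within-cluster dispersion D_j of each variable.
        D = [sum((X[i][j] - centers[new_labels[i]][j]) ** 2 for i in range(n))
             for j in range(m)]
        # Step 4 (assumed formula): w_j = 0 if D_j == 0, otherwise
        # w_j = 1 / sum over t with D_t > 0 of (D_j / D_t) ** (1 / (beta - 1)).
        positive = [d for d in D if d > 0]
        if positive:
            weights = [0.0 if d == 0 else
                       1.0 / sum((d / t) ** (1.0 / (beta - 1)) for t in positive)
                       for d in D]
        # Stop once the partition no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers, weights


if __name__ == "__main__":
    # Variable 0 separates two groups; variable 1 is spread the same way in both.
    X = [[0.0, 0.0], [0.5, 3.0], [1.0, 1.5], [0.2, 2.0],
         [10.0, 0.5], [10.5, 2.5], [11.0, 1.0], [10.3, 3.0]]
    labels, centers, weights = wkmeans(X, 2, init=[X[0], X[4]])
    print(labels)                          # [0, 0, 0, 0, 1, 1, 1, 1]
    print([round(w, 2) for w in weights])  # [0.89, 0.11]
```

On this toy data the sketch recovers the two groups and puts most of the weight on the separating variable, which is the sense in which the weights can rank variables for selection. Like standard k-means, the result depends on the starting centers, which is why the example fixes them.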

Index Terms:
Clustering, data mining, mining methods and algorithms, feature evaluation and selection.
Citation:
Joshua Zhexue Huang, Michael K. Ng, Hongqiang Rong, Zichen Li, "Automated Variable Weighting in k-Means Type Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 657-668, May 2005, doi:10.1109/TPAMI.2005.95