Automated Variable Weighting in k-Means Type Clustering
May 2005 (vol. 27 no. 5)
pp. 657-668
This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data, and a formula for weight calculation is proposed. The convergence theorem of the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used for variable selection in data mining applications, where large and complex real data are often involved. Experimental results on both synthetic and real data show that the new algorithm outperformed the standard k-means type algorithms in recovering clusters in data.
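The process described in the abstract, a k-means loop with one extra step that recomputes variable weights from the current partition, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the weight formula, so the rule used here (weights inversely related to each variable's within-cluster dispersion, controlled by an exponent `beta` > 1) is an assumption, and the names `weighted_kmeans`, `beta`, and `init` are hypothetical.

```python
import numpy as np

def weighted_kmeans(X, k, beta=2.0, n_iter=50, init=None, seed=0):
    """Sketch: k-means with an added variable-weight update step.

    ASSUMPTION: the weight rule below is illustrative; the abstract
    does not state the formula used in the paper. Requires beta > 1.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    if init is None:
        centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    weights = np.full(m, 1.0 / m)   # start with equal weights
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Step 1: assign each object to the nearest center under the
        # weighted squared Euclidean distance.
        sq = (X[:, None, :] - centers[None, :, :]) ** 2   # shape (n, k, m)
        new_labels = (sq * weights ** beta).sum(axis=2).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                    # partition unchanged: stop
        labels = new_labels
        # Step 2: move each center to the mean of its members.
        for l in range(k):
            members = X[labels == l]
            if len(members):
                centers[l] = members.mean(axis=0)
        # Step 3 (the added step): recompute variable weights from the
        # per-variable within-cluster dispersion of the current partition.
        D = ((X - centers[labels]) ** 2).sum(axis=0)
        D = np.maximum(D, 1e-12)     # guard against division by zero
        inv = D ** (-1.0 / (beta - 1.0))
        weights = inv / inv.sum()    # weights sum to 1
    return labels, centers, weights

# Toy data (assumed for illustration): variable 0 separates two groups,
# variable 1 is uniform noise.
rng = np.random.default_rng(1)
informative = np.concatenate([rng.normal(0.0, 0.1, 50),
                              rng.normal(10.0, 0.1, 50)])
noise = rng.uniform(-1.0, 1.0, 100)
X = np.column_stack([informative, noise])

# Initial centers are taken from one object in each group; like ordinary
# k-means, the sketch is sensitive to initialization.
labels, centers, weights = weighted_kmeans(X, k=2, init=X[[0, 50]])
print(weights)
```

On data like this, the variable that separates the groups should receive the larger weight, which is the sense in which the weights can guide variable selection.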

Index Terms:
Clustering, data mining, mining methods and algorithms, feature evaluation and selection.
Citation:
Joshua Zhexue Huang, Michael K. Ng, Hongqiang Rong, Zichen Li, "Automated Variable Weighting in k-Means Type Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 657-668, May 2005, doi:10.1109/TPAMI.2005.95