Issue No.04 - April (2013 vol.25)
pp: 932-944
Xiaojun Chen, Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China
Xiaofei Xu, Dept. of Comput. Sci. & Eng., Harbin Inst. of Technol., Harbin, China
J. Z. Huang, Coll. of Comput. Sci. & Software, Shenzhen Univ., Shenzhen, China
Yunming Ye, Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen, China
ABSTRACT
This paper proposes TW-k-means, an automated two-level variable weighting clustering algorithm for multiview data that simultaneously computes weights for views and for individual variables. Each view is assigned a view weight that reflects the compactness of the view, and each variable within a view is assigned a variable weight that reflects the importance of the variable. Both view weights and variable weights are used in the distance function to determine the clusters of objects. Two additional steps are added to the iterative k-means clustering process to compute the view weights and the variable weights automatically. We used two real-life data sets to investigate the properties of the two types of weights in TW-k-means and to examine how its weights differ from those of the individual variable weighting method. The experiments revealed the convergence property of the view weights in TW-k-means. We compared TW-k-means with five clustering algorithms on three real-life data sets, and the results showed that TW-k-means significantly outperformed the other five algorithms on four evaluation indices.
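To make the role of the two weight levels concrete, the following is a minimal Python sketch of a distance function in which each squared per-variable difference is scaled by both the weight of its view and the weight of the variable itself, followed by the usual nearest-center assignment. The function names (`two_level_distance`, `assign`), the use of squared Euclidean differences, and the `view_of` index list are illustrative assumptions; the abstract does not give the exact distance formula or the weight update rules, so the weights are simply taken as fixed inputs here.

```python
from typing import List, Sequence


def two_level_distance(
    x: Sequence[float],
    center: Sequence[float],
    view_of: Sequence[int],
    view_w: Sequence[float],
    var_w: Sequence[float],
) -> float:
    """Weighted squared distance between an object and a cluster center.

    Assumed form (not taken from the abstract):
        sum_j view_w[view_of[j]] * var_w[j] * (x[j] - center[j]) ** 2
    where view_of[j] is the index of the view that variable j belongs to.
    """
    return sum(
        view_w[view_of[j]] * var_w[j] * (x[j] - center[j]) ** 2
        for j in range(len(x))
    )


def assign(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
    view_of: Sequence[int],
    view_w: Sequence[float],
    var_w: Sequence[float],
) -> List[int]:
    """Assign each object to the cluster whose center is nearest
    under the two-level weighted distance."""
    return [
        min(
            range(len(centers)),
            key=lambda l: two_level_distance(p, centers[l], view_of, view_w, var_w),
        )
        for p in points
    ]


# Hypothetical toy data: variables 0 and 1 form view 0, variable 2 forms view 1.
view_of = [0, 0, 1]
view_w = [0.5, 0.5]      # one weight per view
var_w = [0.5, 0.5, 1.0]  # one weight per variable

d = two_level_distance([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], view_of, view_w, var_w)
labels = assign(
    [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]],
    [[1.0, 1.0, 1.0], [9.0, 9.0, 9.0]],
    view_of,
    view_w,
    var_w,
)
```

In a full iteration, this assignment step would alternate with a center update and with the two additional steps that recompute the view weights and the variable weights; those update rules are not stated in the abstract and are left out of the sketch.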
INDEX TERMS
Clustering algorithms, Partitioning algorithms, Computational modeling, Clustering methods, Web pages, Data models, Algorithm design and analysis, variable weighting, Data mining, clustering, multiview learning, $k$-means
CITATION
Xiaojun Chen, Xiaofei Xu, J. Z. Huang, Yunming Ye, "TW-k-means: Automated two-level variable weighting clustering algorithm for multiview data", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 4, pp. 932-944, April 2013, doi:10.1109/TKDE.2011.262