Issue No.10 - October (2010 vol.22)
pp: 1401-1414
Liang Wang , The University of Melbourne, Melbourne
Xin Geng , Southeast University, Nanjing
James Bezdek , The University of Melbourne, Melbourne
Christopher Leckie , The University of Melbourne, Melbourne
Kotagiri Ramamohanarao , The University of Melbourne, Melbourne
ABSTRACT
Visual methods have been widely studied and used in data cluster analysis. Given a pairwise dissimilarity matrix $D$ of a set of $n$ objects, visual methods such as the VAT (Visual Assessment of cluster Tendency) algorithm generally represent $D$ as an $n \times n$ image $I(\tilde{D})$ in which the objects are reordered to reveal hidden cluster structure as dark blocks along the diagonal of the image. A major limitation of such methods is their inability to highlight cluster structure when $D$ contains highly complex clusters. This paper addresses this limitation by proposing a Spectral VAT algorithm, in which $D$ is mapped to $D'$ in a graph embedding space and then reordered to $\tilde{D}'$ using the VAT algorithm. A strategy for automatically determining the number of clusters in $I(\tilde{D}')$ is then proposed, as well as a visual method for forming clusters from $I(\tilde{D}')$ based on the difference between diagonal and off-diagonal blocks. A sampling-based extension is also proposed to enable visual cluster analysis of large data sets. Extensive experiments on several synthetic and real-world data sets validate the proposed algorithms.
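The two core steps described in the abstract, mapping $D$ to $D'$ in a graph embedding space and then applying the VAT reordering, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `vat_order` and `spectral_dissimilarity` are hypothetical. The graph embedding is assumed to be an Ng-Jordan-Weiss-style normalized spectral embedding built from a Gaussian affinity with a user-chosen scale `sigma` and embedding dimension `k`, so the paper's actual affinity construction and parameter choices may differ. VAT is implemented as the usual Prim-like ordering: start at an object involved in the largest dissimilarity, then repeatedly append the unvisited object nearest to the already-ordered set.

```python
import numpy as np


def vat_order(D):
    """Prim-like VAT ordering of a symmetric dissimilarity matrix D (n x n)."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    i = int(i)
    order = [i]
    remaining = set(range(n)) - {i}
    # dist_to_set[j]: smallest dissimilarity from j to any already-ordered object.
    dist_to_set = D[i].astype(float)
    while remaining:
        rem = sorted(remaining)
        # Append the unvisited object closest to the ordered set.
        j = rem[int(np.argmin(dist_to_set[rem]))]
        order.append(j)
        remaining.remove(j)
        dist_to_set = np.minimum(dist_to_set, D[j])
    return order


def spectral_dissimilarity(D, k, sigma=1.0):
    """Map D to D' via an assumed Ng-Jordan-Weiss-style spectral embedding."""
    # Gaussian affinity from dissimilarities; sigma is an assumed, user-chosen scale.
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization Deg^{-1/2} W Deg^{-1/2}, where Deg holds the
    # row sums of W (assumes every row sum is positive).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, vecs = np.linalg.eigh(M)
    V = vecs[:, -k:]
    # Project each embedded point onto the unit sphere.
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    # D'[i, j] = Euclidean distance between embedded points i and j.
    diff = V[:, None, :] - V[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    # Two tight 1-D groups, deliberately interleaved in the input order.
    x = np.array([0.0, 10.0, 0.2, 10.3, 0.1, 10.1, 0.3, 10.2])
    D = np.abs(x[:, None] - x[None, :])
    Dp = spectral_dissimilarity(D, k=2, sigma=1.0)
    order = vat_order(Dp)
    print("VAT order:", order)
    # Reordered matrix, the numeric analogue of the image I(D~') (small = dark).
    print(np.round(Dp[np.ix_(order, order)], 2))
```

Displaying `Dp[np.ix_(order, order)]` as a grayscale image with small values drawn dark gives the kind of view the abstract calls $I(\tilde{D}')$. Because the embedding places each tight group near a single point on the unit sphere, the VAT ordering visits one group completely before moving to the next, which is what produces the dark diagonal blocks.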
INDEX TERMS
Clustering, VAT, cluster tendency, spectral embedding, out-of-sample extension.
CITATION
Liang Wang, Xin Geng, James Bezdek, Christopher Leckie, Kotagiri Ramamohanarao, "Enhanced Visual Analysis for Cluster Tendency Assessment and Data Partitioning", IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 10, pp. 1401-1414, October 2010, doi:10.1109/TKDE.2009.192