Issue No.05 - May (2012 vol.24)
pp: 813-822
Timothy C. Havens, Michigan State University, East Lansing
James C. Bezdek, University of Missouri, Columbia
ABSTRACT
The VAT algorithm is a visual method for determining the possible number of clusters in, or the cluster tendency of, a set of objects. The improved VAT (iVAT) algorithm uses a graph-theoretic distance transform to improve the effectiveness of the VAT algorithm for “tough” cases where VAT fails to accurately show the cluster tendency. In this paper, we present an efficient formulation of the iVAT algorithm that reduces its computational complexity from O(N^3) to O(N^2). We also prove a direct relationship between the VAT image and the iVAT image produced by our efficient formulation. We conclude with three examples displaying clustering tendencies in three of the Karypis data sets that illustrate the improvement offered by the iVAT transformation. We also provide a comparison of iVAT images to those produced by the Reverse Cuthill-McKee (RCM) algorithm; our examples suggest that iVAT is superior to the RCM method of display.
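
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the O(N^2) computation might be organized. It assumes that VAT orders objects in the sequence Prim's minimum spanning tree algorithm would add them (starting from an endpoint of the largest dissimilarity), and that the iVAT transform is the minimax path distance, that is, the smallest achievable value of the largest edge along any path between two objects. The function names `vat_order`, `reorder`, and `ivat_from_vat` are hypothetical and are not taken from the paper.

```python
def vat_order(D):
    """Return a VAT-style ordering of a symmetric dissimilarity matrix D
    (a list of lists), using a Prim-like sweep. Assumed procedure."""
    n = len(D)
    # Start from one endpoint of the largest dissimilarity in D.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    placed = [False] * n
    placed[start] = True
    # dist[j] = smallest dissimilarity from j to any already-ordered object.
    dist = list(D[start])
    for _ in range(n - 1):
        nxt = min((j for j in range(n) if not placed[j]), key=lambda j: dist[j])
        order.append(nxt)
        placed[nxt] = True
        for j in range(n):
            if not placed[j] and D[nxt][j] < dist[j]:
                dist[j] = D[nxt][j]
    return order


def reorder(D, order):
    """Permute rows and columns of D according to order."""
    return [[D[i][j] for j in order] for i in order]


def ivat_from_vat(Dv):
    """Given a VAT-reordered matrix Dv, fill in minimax path distances
    row by row. Each row costs O(N), so the whole pass is O(N^2)."""
    n = len(Dv)
    Dp = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        # Nearest earlier object: the tree edge that attached r.
        j = min(range(r), key=lambda k: Dv[r][k])
        Dp[r][j] = Dv[r][j]
        for c in range(r):
            if c != j:
                # Path r -> j -> c: its largest edge is the larger of the
                # new tree edge and the known minimax distance from j to c.
                Dp[r][c] = max(Dv[r][j], Dp[j][c])
        for c in range(r):
            Dp[c][r] = Dp[r][c]  # keep the matrix symmetric
    return Dp


# Toy data (hypothetical): two well-separated groups on a line, shuffled.
x = [10, 0, 11, 2, 1]
D = [[abs(a - b) for b in x] for a in x]
order = vat_order(D)
Dv = reorder(D, order)
Dp = ivat_from_vat(Dv)
```

On this toy input the reordered points fall into two contiguous groups, and the transformed matrix `Dp` has uniformly small values inside each diagonal block and a single large value between blocks, which is the kind of contrast an iVAT image is meant to display. Displaying `Dv` and `Dp` as grayscale images is left out to keep the sketch free of dependencies.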
INDEX TERMS
Clustering, cluster tendency, visualization, VAT.
CITATION
Timothy C. Havens, James C. Bezdek, "An Efficient Formulation of the Improved Visual Assessment of Cluster Tendency (iVAT) Algorithm", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 5, pp. 813-822, May 2012, doi:10.1109/TKDE.2011.33