Ujjwal Maulik, Sanghamitra Bandyopadhyay, "Performance Evaluation of Some Clustering Algorithms and Validity Indices," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1650-1654, December 2002.
ISSN: 0162-8828. DOI: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2002.1114856. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.
Index Terms: Unsupervised classification, Euclidean distance, K-Means algorithm, single linkage algorithm, validity index, simulated annealing.
Abstract: In this article, we evaluate the performance of three clustering algorithms (hard K-Means, single linkage, and a simulated annealing (SA) based technique) in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index $\mathcal{I}$. Based on a relation between the index $\mathcal{I}$ and Dunn's index, a lower bound on the value of the former is theoretically estimated in order to obtain a unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically determining the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets, with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used to partition the data into that number of clusters.
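The general procedure the abstract describes (cluster the data for each candidate number of clusters, score each partition with a validity index, keep the best-scoring K) can be illustrated with a small sketch. This is not the authors' implementation: it uses only plain K-Means and the Davies-Bouldin index (lower is better), the farthest-first seeding is an assumption made here to keep the run deterministic, the toy data set is hypothetical, and K ranges over 2 to 5 rather than 2 to 10 to keep the example small. It requires Python 3.8 or later for `math.dist`.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, iters=100):
    """Plain Lloyd-style hard K-Means; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, labels

def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (S_i + S_j) / d_ij, where S is the within-cluster scatter and d_ij the
    distance between centroids. Lower values indicate a better partition."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(math.dist(p, centroids[c]) for p in members) / len(members)
                       if members else 0.0)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical toy data: three tight, well-separated groups of four points.
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy)
          for cx, cy in centers
          for dx in (-0.5, 0.5)
          for dy in (-0.5, 0.5)]

# Score each candidate K and keep the one with the smallest index value.
scores = {}
for k in range(2, 6):
    centroids, labels = kmeans(points, k)
    scores[k] = davies_bouldin(points, centroids, labels)
best_k = min(scores, key=scores.get)
print("Davies-Bouldin scores:", scores)
print("selected number of clusters:", best_k)
```

For indices that are maximized rather than minimized (Dunn's index and the Calinski-Harabasz index, for example), the final selection would use `max` instead of `min`.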

