IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, Issue No. 8 (August): Abstract
A New Cluster Isolation Criterion Based on Dissimilarity Increments
August 2003 (vol. 25 no. 8)
pp. 944-958

Abstract—This paper addresses the problem of cluster defining criteria by proposing a model-based characterization of interpattern relationships. Taking a dissimilarity matrix between patterns as the basic measure for extracting group structure, dissimilarity increments between neighboring patterns within a cluster are analyzed. Empirical evidence suggests modeling the statistical distribution of these increments by an exponential density; we propose to use this statistical model, which characterizes context, to derive a new cluster isolation criterion. The integration of this criterion in a hierarchical agglomerative clustering framework produces a partitioning of the data, while exhibiting data interrelationships in terms of a dendrogram-type graph. The analysis of the criterion is undertaken through a set of examples, showing the versatility of the method in identifying clusters with arbitrary shape and size; the number of clusters is found intrinsically, without requiring ad hoc specification of design parameters or a computationally demanding optimization procedure.
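To make the idea concrete, the sketch below computes dissimilarity increments along an ordered chain of patterns and flags an improbably large increment as a cluster isolation. It is a minimal illustration, not the paper's exact criterion: the traversal order, the increment definition over a fixed chain, and the threshold parameter `alpha` are simplifying assumptions. The exponential model motivates the test: if increments have mean mu, then under an exponential density P(X > alpha * mu) = exp(-alpha), so alpha = 3 rejects merges whose increment has tail probability below roughly 5 percent.

```python
import numpy as np

def dissimilarity_increments(D, order):
    """Increments |d(t+1) - d(t)| between consecutive dissimilarities
    along a visiting order of patterns.

    D is a full dissimilarity matrix; `order` is a sequence of pattern
    indices (e.g., a nearest-neighbor chain). This fixed-chain traversal
    is a simplification for illustration.
    """
    d = [D[order[t], order[t + 1]] for t in range(len(order) - 1)]
    return np.abs(np.diff(d))

def isolation_test(increments, new_increment, alpha=3.0):
    """Flag an isolation: under an exponential model with mean equal to
    the observed mean of the within-cluster increments, reject a merge
    whose increment exceeds alpha times that mean (illustrative rule;
    `alpha` is an assumed parameter, not taken from the paper)."""
    if len(increments) == 0:
        return False
    return new_increment > alpha * np.mean(increments)

# Toy example: two tight groups on a line, separated by a gap.
x = np.array([0.0, 1.0, 2.1, 3.0, 10.0, 11.2, 12.0])
D = np.abs(x[:, None] - x[None, :])
order = np.arange(len(x))            # patterns already in neighbor order
inc = dissimilarity_increments(D, order)   # [0.1, 0.2, 6.1, 5.8, 0.4]

# Increments within the first group are small; the jump 3.0 -> 10.0
# produces an increment far above their mean, so the merge is rejected.
print(isolation_test(inc[:2], inc[2]))     # -> True
```

In the full method, this test is applied during hierarchical agglomerative merging, so clusters of arbitrary shape are separated wherever the increment statistics change abruptly.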

[1] R. Duda, P. Hart, and D. Stork, Pattern Classification. New York: John Wiley & Sons, 2001.
[2] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, N.J.: Prentice Hall, 1988.
[3] G. McLachlan and K. Basford, Mixture Models: Inference and Application to Clustering. New York: Marcel Dekker, 1988.
[4] S. Roberts, D. Husmeier, I. Rezek, and W. Penny, "Bayesian Approaches to Gaussian Mixture Modeling," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 11, Nov. 1998.
[5] M. Figueiredo, J. Leitão, and A.K. Jain, "On Fitting Mixture Models," Energy Minimization Methods in Computer Vision and Pattern Recognition, E. Hancock and M. Pelillo, eds., pp. 54-69, Springer-Verlag, 1999.
[6] J.D. Banfield and A.E. Raftery, "Model-Based Gaussian and Non-Gaussian Clustering," Biometrics, vol. 49, pp. 803-821, Sept. 1993.
[7] J. Buhmann and M. Held, "Unsupervised Learning without Overfitting: Empirical Risk Approximation as an Induction Principle for Reliable Clustering," Proc. Int'l Conf. Advances in Pattern Recognition, S. Singh, ed., pp. 167-176, 1999.
[8] B. Mirkin, "Concept Learning and Feature Selection Based on Square-Error Clustering," Machine Learning, vol. 35, pp. 25-39, 1999.
[9] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
[10] H. Tenmoto, M. Kudo, and M. Shimbo, "MDL-Based Selection of the Number of Components in Mixture Models for Pattern Recognition," Advances in Pattern Recognition, A. Amin, D. Dori, P. Pudil, and H. Freeman, eds., vol. 1451, pp. 831-836, 1998.
[11] H. Bischof and A. Leonardis, "Vector Quantization and Minimum Description Length," Proc. Int'l Conf. Advances in Pattern Recognition, S. Singh, ed., pp. 355-364, 1999.
[12] N.R. Pal and J.C. Bezdek, "On Cluster Validity for the Fuzzy c-Means Model," IEEE Trans. Fuzzy Systems, vol. 1, pp. 370-379, 1995.
[13] P.-Y. Yin, "Algorithms for Straight Line Fitting Using k-Means," Pattern Recognition Letters, vol. 19, pp. 31-41, 1998.
[14] D. Stanford and A.E. Raftery, "Principal Curve Clustering with Noise," technical report, Univ. of Washington, http://www.stat.washington.edu/raftery, 1997.
[15] H. Frigui and R. Krishnapuram, "A Robust Competitive Clustering Algorithm with Applications in Computer Vision," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 450-465, May 1999.
[16] Y. Man and I. Gath, "Detection and Separation of Ring-Shaped Clusters Using Fuzzy Clusters," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 8, pp. 855-861, Aug. 1994.
[17] B. Fischer, T. Zoller, and J. Buhmann, "Path Based Pairwise Data Clustering with Application to Texture Segmentation," Energy Minimization Methods in Computer Vision and Pattern Recognition, M. Figueiredo, J. Zerubia, and A.K. Jain, eds., vol. 2134, pp. 235-266, 2001.
[18] E.J. Pauwels and G. Frederix, "Finding Regions of Interest for Content-Extraction," Proc. IS&T/SPIE Conf. Storage and Retrieval for Image and Video Databases VII, vol. 3656, pp. 501-510, Jan. 1999.
[19] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.
[20] E. Gokcay and J.C. Principe, "Information Theoretic Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 158-171, Feb. 2002.
[21] C. Zahn, "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters," IEEE Trans. Computers, vol. 20, no. 1, pp. 68-86, Jan. 1971.
[22] Y. El-Sonbaty and M.A. Ismail, "On-Line Hierarchical Clustering," Pattern Recognition Letters, pp. 1285-1291, 1998.
[23] M. Chavent, "A Monothetic Clustering Method," Pattern Recognition Letters, vol. 19, pp. 989-996, 1998.
[24] A.L. Fred and J. Leitão, "A Comparative Study of String Dissimilarity Measures in Structural Clustering," Proc. Int'l Conf. Advances in Pattern Recognition, S. Singh, ed., pp. 385-394, 1998.
[25] S. Guha, R. Rastogi, and K. Shim, "CURE: An Efficient Clustering Algorithm for Large Databases," Proc. ACM SIGMOD, pp. 73-84, June 1998.
[26] R. Dubes and A.K. Jain, "Validity Studies in Clustering Methodologies," Pattern Recognition, vol. 11, pp. 235-254, 1979.
[27] T.A. Bailey and R. Dubes, "Cluster Validity Profiles," Pattern Recognition, vol. 15, no. 2, pp. 61-83, 1982.
[28] E.W. Tyree and J.A. Long, "The Use of Linked Line Segments for Cluster Representation and Data Reduction," Pattern Recognition Letters, vol. 20, pp. 21-29, 1999.
[29] Y. Cheng, "Mean Shift, Mode Seeking, and Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, Aug. 1995.
[30] D. Comaniciu and P. Meer, "Distribution Free Decomposition of Multivariate Data," Pattern Analysis and Applications, vol. 2, pp. 22-30, 1999.
[31] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling," Computer, pp. 68-75, Aug. 1999.
[32] P. Bajcsy and N. Ahuja, "Location- and Density-Based Hierarchical Clustering Using Similarity Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 9, pp. 1011-1015, Sept. 1998.
[33] A.L. Fred and J. Leitão, "Clustering under a Hypothesis of Smooth Dissimilarity Increments," Proc. 15th Int'l Conf. Pattern Recognition, vol. 2, pp. 190-194, 2000.
[34] C.J. Merz and P.M. Murphy, "UCI Repository of Machine Learning Databases," Dept. of Information and Computer Science, Univ. of California, Irvine, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1996.
[35] E.S. Ristad and P.N. Yianilos, "Learning String-Edit Distance," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 5, pp. 522-531, May 1998.
[36] D. Sankoff and J. Kruskal, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, reprint, with a foreword by J. Nerbonne. Stanford, Calif.: CSLI Publications, (1983) 1999.
[37] B.J. Oommen and R.S.K. Loke, "Pattern Recognition of Strings Containing Traditional and Generalized Transposition Errors," Proc. Int'l Conf. Systems, Man, and Cybernetics, pp. 1154-1159, 1995.
[38] A. Marzal and E. Vidal, "Computation of Normalized Edit Distance and Applications," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 926-932, 1993.
[39] A. Fred, "Clustering Based on Dissimilarity First Derivatives," Proc. Second Int'l Workshop Pattern Recognition in Information Systems, J. Iñesta and L. Micó, eds., pp. 257-266, 2002.
[40] R. Kothari and D. Pitts, "On Finding the Number of Clusters," Pattern Recognition Letters, vol. 20, pp. 405-416, 1999.
[41] S.V. Chakravarthy and J. Ghosh, "Scale-Based Clustering Using the Radial Basis Function Network," IEEE Trans. Neural Networks, vol. 7, pp. 1250-1261, 1996.

Index Terms:
Clustering, hierarchical methods, context-based clustering, cluster isolation criteria, dissimilarity increments, model-based clustering.
Citation:
Ana L.N. Fred, José M.N. Leitão, "A New Cluster Isolation Criterion Based on Dissimilarity Increments," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, pp. 944-958, Aug. 2003, doi:10.1109/TPAMI.2003.1217600