IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 1, January 2008



pp. 76-88

ABSTRACT

We introduce a novel dissimilarity measure into a probabilistic clustering task in order to properly measure the dissimilarity among multiple clusters when each cluster is characterized by a subpopulation in a mixture model. We call this measure the redundancy-based dissimilarity among probability distributions. From the viewpoints of both source coding and statistical hypothesis testing, we shed light on several theoretical reasons why the redundancy-based dissimilarity among probability distributions is a reasonable measure of dissimilarity among clusters. We also elucidate a principle common to the redundancy-based dissimilarity measures and Ward’s method in terms of hierarchical clustering criteria. Moreover, we prove several related theorems that are significant for clustering tasks. In the experiments, the properties of the redundancy-based dissimilarity measure are examined in comparison with several other measures.
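From the source-coding viewpoint the abstract alludes to, the redundancy incurred by encoding a mixture source with a single code, rather than one code per subpopulation, reduces in the discrete case to the entropy of the weighted mixture minus the weighted sum of the subpopulation entropies (a weighted Jensen-Shannon-type divergence). The sketch below illustrates that quantity only; the function name is invented here, and the identification with the weighted Jensen-Shannon divergence is an assumption for illustration, not the paper's exact definition.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0.0]  # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log(p)))

def redundancy_dissimilarity(dists, weights):
    """Redundancy-style dissimilarity among discrete distributions:
    entropy of the weighted mixture minus the weighted sum of the
    subpopulation entropies. It is zero iff all distributions
    coincide, and grows as the subpopulations become easier to
    distinguish."""
    dists = np.asarray(dists, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mixture = weights @ dists  # P = sum_i w_i * P_i
    return entropy(mixture) - sum(w * entropy(p) for w, p in zip(weights, dists))
```

Used as a hierarchical clustering criterion, an agglomerative algorithm could repeatedly merge the pair of clusters for which this dissimilarity (weighted by cluster mass) is smallest, which parallels the minimum-variance merging of Ward's method in the Gaussian setting.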

INDEX TERMS

clustering, mixture model, dissimilarity measure, information theory, Ward’s method

CITATION

Kazunori Iwata, "A Redundancy-Based Measure of Dissimilarity among Probability Distributions for Hierarchical Clustering Criteria," *IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 30, no. 1, pp. 76-88, January 2008, doi:10.1109/TPAMI.2007.1160.
