Issue No. 01, January 2008 (vol. 30)
pp. 160-173
Over the past few years, there has been renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition of a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution to the problem of cluster label alignment, where, unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the reference clustering is selected by maximizing the information content as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters in the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within each cluster. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.
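The abstract names the generalized Jensen-Shannon divergence but does not state its formula. As a rough illustration only, the sketch below uses the standard definition, JS(p_1, ..., p_k) = H(sum_i w_i p_i) - sum_i w_i H(p_i), with weights w_i that sum to one. The function names `entropy` and `generalized_js` are hypothetical, and this is not the paper's agglomerative algorithm, only the quantity such an algorithm might evaluate when deciding which distributions to merge.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def generalized_js(dists: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> float:
    """Generalized Jensen-Shannon divergence of several distributions.

    Assumes each entry of `dists` is a probability vector of the same
    length and that `weights` are non-negative and sum to one.
    """
    if len(dists) != len(weights):
        raise ValueError("need one weight per distribution")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    k = len(dists[0])
    # Weighted mixture of the input distributions.
    mixture = [sum(w * d[j] for w, d in zip(weights, dists)) for j in range(k)]
    # Entropy of the mixture minus the weighted mean of the entropies.
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))
```

Under this definition the divergence is zero when all distributions coincide and grows as they separate, so a merge that adds little divergence loses little information.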
Cluster Analysis, Consensus Clustering, Ensemble Methods, Voting
Hanan G. Ayad, "Cumulative Voting Consensus Method for Partitions with Variable Number of Clusters", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 30, no. 1, pp. 160-173, January 2008, doi:10.1109/TPAMI.2007.1138