Issue No.12 - Dec. (2012 vol.34)
pp: 2315-2326
C. Carpineto, Fondazione Ugo Bordoni, Rome, Italy
G. Romano, Fondazione Ugo Bordoni, Rome, Italy
We introduce a probabilistic version of the well-known Rand Index (RI) for measuring the similarity between two partitions, called Probabilistic Rand Index (PRI), in which agreements and disagreements at the object-pair level are weighted according to the probability of their occurring by chance. We then cast consensus clustering as an optimization problem of the PRI value between a target partition and a set of given partitions, experimenting with a simple and very efficient stochastic optimization algorithm. Remarkable performance gains over input partitions as well as over existing related methods are demonstrated through a range of applications, including a new use of consensus clustering to improve subtopic retrieval.
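For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the standard (unweighted) Rand Index and of consensus clustering posed as a search for a partition that maximizes the mean index against a set of input partitions. It is not the authors' method: the abstract does not give the PRI weighting formula or the details of their stochastic optimizer, so the sketch uses the plain Rand Index and a naive seeded hill climb. The function names, the label-list representation (integer labels in the range 0 to n-1), and the parameters `n_iter` and `seed` are all assumptions made for illustration.

```python
import random
from itertools import combinations


def rand_index(a, b):
    """Plain Rand Index: fraction of object pairs on which partitions a and b
    agree (both place the pair together, or both place it apart)."""
    if len(a) != len(b):
        raise ValueError("partitions must cover the same objects")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0  # zero or one object: trivially identical partitions
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_rand_index(candidate, partitions):
    """Objective for the sketch: average Rand Index of the candidate
    against every input partition."""
    return sum(rand_index(candidate, p) for p in partitions) / len(partitions)


def consensus_hill_climb(partitions, n_iter=2000, seed=0):
    """Hypothetical optimizer (an assumption, not the paper's algorithm):
    start from the first input partition, repeatedly move one random object
    to a random label, and keep the move only if the objective improves."""
    rng = random.Random(seed)
    n = len(partitions[0])
    best = list(partitions[0])
    best_score = mean_rand_index(best, partitions)
    for _ in range(n_iter):
        i = rng.randrange(n)
        new_label = rng.randrange(n)  # at most n clusters for n objects
        if new_label == best[i]:
            continue
        trial = best[:]
        trial[i] = new_label
        score = mean_rand_index(trial, partitions)
        if score > best_score:
            best, best_score = trial, score
    return best, best_score


if __name__ == "__main__":
    inputs = [
        [0, 0, 1, 1, 2, 2],
        [0, 0, 1, 1, 1, 1],
        [0, 0, 0, 1, 2, 2],
    ]
    consensus, score = consensus_hill_climb(inputs)
    print(consensus, round(score, 3))
```

A probabilistic variant in the spirit of the abstract would replace the uniform count in `rand_index` with per-pair weights reflecting how likely each agreement or disagreement is to occur by chance; the form of those weights is defined in the paper itself and is not reproduced here.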
stochastic processes, information retrieval, optimisation, pattern clustering, probability, performance gain, consensus clustering, probabilistic Rand index, subtopic retrieval, similarity measurement, object-pair level agreement, object-pair level disagreement, occurrence probability, optimization problem, PRI value, stochastic optimization algorithm, Indexes, Clustering algorithms, Probabilistic logic, Partitioning algorithms, Search problems, Optimized production technology, Information retrieval, subtopic retrieval, Consensus clustering, Rand index, probabilistic Rand index, search results clustering
C. Carpineto, G. Romano, "Consensus Clustering Based on a New Probabilistic Rand Index with Application to Subtopic Retrieval", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.34, no. 12, pp. 2315-2326, Dec. 2012, doi:10.1109/TPAMI.2012.80