Issue No.12 - December (2010 vol.22)
pp: 1649-1663
Gianluigi Greco, University of Calabria, Rende
Antonella Guzzo, University of Calabria, Rende
Luigi Pontieri, ICAR-CNR, Rende
ABSTRACT
The high-order coclustering problem, i.e., the problem of simultaneously clustering several heterogeneous types of domains, has become an active research area in the last few years because of its notable impact on several application scenarios. This problem is generally addressed by optimizing a weighted combination of functions measuring the quality of coclustering over each pair of domains, where weights are chosen based on the supposed reliability/relevance of their correlation. However, in practice little knowledge is likely to be available for setting these weights in a definite and precise manner. More importantly, it might even be conceptually unclear whether to prefer one weighting scheme over others in those cases where functions encode contrasting goals, so that improving the quality for one pair of domains leads to a deterioration for other pairs. The aim of this paper is precisely to shed light on the impact of weighting schemes on techniques based on linear combinations of pairwise objective functions, and to define an approach that overcomes the above problems by looking for an agreement (intuitively, a kind of compromise) among the various domains, thereby removing the need to define an appropriate weighting scheme. Two algorithms performing coclustering on "star-structured" domains, based on linear combinations and agreements, respectively, have been designed within an information-theoretic framework. Results from thorough experimentation on both synthetic and real data are discussed in order to assess the effectiveness of the approaches and to gain more insight into their actual behavior.
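The weighted-combination idea can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes that each pairwise quality function is the loss in mutual information caused by coclustering a contingency table (a common choice in information-theoretic coclustering), and all function names and the toy tables are hypothetical. A central domain is related to two other domains whose contingency tables suggest contrasting clusterings of the central one, so the weighted objective changes completely depending on the weights.

```python
import numpy as np

def mutual_information(table):
    """Mutual information (in nats) of the joint distribution given by a 2-D table."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    px = p.sum(axis=1, keepdims=True)   # row marginal, shape (n, 1)
    py = p.sum(axis=0, keepdims=True)   # column marginal, shape (1, m)
    mask = p > 0                        # skip zero cells (0 * log 0 = 0)
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def aggregate(table, row_labels, col_labels):
    """Collapse a table onto (row cluster) x (column cluster) cells."""
    table = np.asarray(table, dtype=float)
    rows = np.asarray(row_labels)
    cols = np.asarray(col_labels)
    out = np.zeros((rows.max() + 1, cols.max() + 1))
    np.add.at(out, (rows[:, None], cols[None, :]), table)
    return out

def information_loss(table, row_labels, col_labels):
    """Mutual information lost by replacing objects with their clusters."""
    return mutual_information(table) - mutual_information(
        aggregate(table, row_labels, col_labels))

def weighted_objective(tables, center_labels, side_labels, weights):
    """Weighted sum of pairwise losses; the central clustering is shared."""
    return sum(w * information_loss(t, center_labels, s)
               for t, s, w in zip(tables, side_labels, weights))

# Toy star-structured data (hypothetical): rows are the central domain.
t1 = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]  # favors rows {0,1},{2,3}
t2 = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]  # favors rows {0,2},{1,3}
side = [[0, 0, 1, 1], [0, 0, 1, 1]]
center = [0, 0, 1, 1]

only_first = weighted_objective([t1, t2], center, side, [1.0, 0.0])
only_second = weighted_objective([t1, t2], center, side, [0.0, 1.0])
print(only_first, only_second)
```

With this central clustering, the first table loses no information while the second loses all of it, so the value of the combined objective is determined entirely by the weights. This is the kind of contrasting-goals situation the abstract describes.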
INDEX TERMS
Data mining, coclustering, contingency table analysis.
CITATION
Gianluigi Greco, Antonella Guzzo, Luigi Pontieri, "Coclustering Multiple Heterogeneous Domains: Linear Combinations and Agreements", IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 12, pp. 1649-1663, December 2010, doi:10.1109/TKDE.2009.207