Issue No.06 - June (2012 vol.24)
pp: 1025-1035
Lijun Zhang , Zhejiang Provincial Key Lab. of Service Robot, Zhejiang Univ., Hangzhou, China
Unlike traditional one-sided clustering techniques, coclustering exploits the duality between samples and features to partition both simultaneously. Most existing coclustering algorithms focus on modeling the relationship between samples and features, while ignoring the intersample and interfeature relationships. In this paper, we propose a novel coclustering algorithm named Locally Discriminative Coclustering (LDCC) that explores the sample-feature relationship together with the intersample and interfeature relationships. Specifically, the sample-feature relationship is modeled by a bipartite graph between samples and features, and local linear regression is applied to discover the intrinsic discriminative structures of both the sample space and the feature space. For each local patch in the sample and feature spaces, a local linear function is estimated to predict the labels of the points in that patch. The intersample and interfeature relationships are then captured by minimizing the fitting errors of all the local linear functions. In this way, LDCC groups strongly associated samples and features together while respecting the local structures of both the sample and feature spaces. Experimental results on several benchmark data sets demonstrate the effectiveness of the proposed method.
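The local-regression idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes each local patch is a point's k nearest neighbors and that each local linear function is fitted by ridge regression with an intercept. The function name `local_regression_error_matrix` and the parameters `k` and `lam` are hypothetical. Under these assumptions, the summed fitting error of all local functions for a label vector y can be written as the quadratic form y^T L y, and the sketch builds L for the sample space only.

```python
import numpy as np

def local_regression_error_matrix(X, k=5, lam=1.0):
    """Build L so that y @ L @ y is the total local ridge fitting error.

    X   : (n, d) array, one sample per row.
    k   : patch size (assumed: k nearest neighbors of each point).
    lam : ridge penalty on the local weights (assumed regularizer).
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Centering matrix: removes the intercept from each local fit.
    H = np.eye(k) - np.ones((k, k)) / k
    L = np.zeros((n, n))
    for i in range(n):
        # Indices of the k nearest points (normally including i itself).
        idx = np.argsort(d2[i])[:k]
        Xc = H @ X[idx]                      # centered patch, shape (k, d)
        A = Xc.T @ Xc + lam * np.eye(d)      # regularized local scatter
        # Closed-form minimum of ||Xw + b - y||^2 + lam ||w||^2 is y^T Li y.
        Li = H - Xc @ np.linalg.solve(A, Xc.T)
        L[np.ix_(idx, idx)] += Li            # accumulate into the global matrix
    return L

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4))
    L = local_regression_error_matrix(X, k=6, lam=0.5)
    y = rng.normal(size=30)
    print("total local fitting error:", y @ L @ y)
```

Applying the same function to the transposed data matrix would give the analogous matrix for the feature space. How the two matrices are combined with the bipartite graph term is specific to the paper and is not reproduced here.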
regression analysis, graph theory, pattern clustering, LDCC groups, locally discriminative coclustering, one-sided clustering techniques, interfeature relationships, intersample relationships, sample-feature relationship, bipartite graph, local linear regression, intrinsic discriminative structures, sample space, feature space, local patch, local linear function, fitting errors, Clustering algorithms, Matrix decomposition, Partitioning algorithms, Linear regression, Silicon, Mathematical model, Coclustering, clustering
Lijun Zhang, "Locally Discriminative Coclustering", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 6, pp. 1025-1035, June 2012, doi:10.1109/TKDE.2011.71