Issue No.01 - January (2008 vol.20)
pp: 83-98
ABSTRACT
Clustering is a popular technique for analyzing microarray datasets with n genes and m experimental conditions. Biologists have a real need to identify co-regulated gene clusters, which include both positively and negatively regulated gene clusters. Existing pattern-based and tendency-based clustering approaches cannot be applied directly to find such clusters, because they are designed to find only positively regulated gene clusters. In this paper, to cluster co-regulated genes, we propose a coding scheme that places two genes in the same cluster if they have the same code, where two genes with the same code can be either positively or negatively regulated. Based on this coding scheme, we propose a new algorithm, with new pruning techniques, for finding maximal subspace co-regulated gene clusters. A maximal subspace co-regulated gene cluster groups a set of genes on a condition sequence such that the cluster is not contained in any other subspace co-regulated gene cluster. Extensive experimental studies show that our approach finds maximal subspace co-regulated gene clusters effectively and efficiently. In addition, our approach outperforms existing approaches for finding positively regulated gene clusters.
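The abstract does not define the coding scheme itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a gene's code is the sequence of up/down changes between consecutive conditions, and that a profile and its mirror image are mapped to one canonical code so that positively and negatively regulated genes land in the same group. The function names, the canonicalization rule, and the sample profiles are all hypothetical.

```python
from collections import defaultdict


def tendency_code(expression):
    """Return a canonical up/down code for one expression profile.

    Assumption for illustration only: this is NOT the paper's coding scheme.
    """
    # Sign of the change between consecutive conditions: +1 up, -1 down, 0 flat.
    steps = tuple((b > a) - (b < a) for a, b in zip(expression, expression[1:]))
    # The mirror image describes a gene that moves in the opposite direction.
    flipped = tuple(-s for s in steps)
    # Pick one representative of the pair so both orientations share a code.
    return max(steps, flipped)


def group_by_code(profiles):
    """Group gene names whose profiles share the same canonical code."""
    groups = defaultdict(list)
    for gene, expression in profiles.items():
        groups[tendency_code(expression)].append(gene)
    return dict(groups)


# Hypothetical expression values over four conditions.
profiles = {
    "g1": [1.0, 2.0, 1.5, 3.0],  # up, down, up
    "g2": [4.0, 3.0, 3.5, 1.0],  # down, up, down (mirror image of g1)
    "g3": [1.0, 0.5, 2.0, 2.5],  # down, up, up
}
clusters = group_by_code(profiles)
```

In this sketch, g1 and g2 receive the same code even though they move in opposite directions, while g3 falls into a separate group. The sketch covers only the grouping step over all conditions; it does not attempt the subspace search, the maximality check, or the pruning techniques mentioned in the abstract.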
INDEX TERMS
Data mining, Clustering, classification, and association rules
CITATION
Jeffrey Xu Yu, Guoren Wang, Yuhai Zhao, Bin Wang, Ge Yu, "Maximal Subspace Coregulated Gene Clustering", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 1, pp. 83-98, January 2008, doi:10.1109/TKDE.2007.190670