Issue No. 04, April 2011 (vol. 23)
pp. 568-584
Wen-Hui Yang, Sun Yat-Sen University, Guangzhou
Dao-Qing Dai, Sun Yat-Sen University, Guangzhou
Hong Yan, City University of Hong Kong, Hong Kong
ABSTRACT
Extracting biologically relevant information from DNA microarrays is an important task for drug development and testing, function annotation, and cancer diagnosis. Various clustering methods have been proposed for the analysis of gene expression data, but conventional clustering algorithms often fail to produce a satisfactory solution on large and heterogeneous collections of such data. Biclustering has been proposed as an alternative to standard clustering techniques for identifying local structures in gene expression data sets. These patterns may provide clues about the main biological processes associated with different physiological states. In this paper, we first introduce a pattern that is more general than existing bicluster patterns: the correlated bicluster, which has an intuitive biological interpretation. We then propose a novel transform technique based on singular value decomposition, so that the problem of identifying correlated biclusters in a gene expression matrix is transformed into two global clustering problems. The Mixed-Clustering algorithm and the Lift algorithm are devised to produce $\delta$-corBiclusters efficiently. Biclusters obtained with our method from gene expression data sets of multiple human organs and of the yeast Saccharomyces cerevisiae have clear biological meaning.
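The abstract does not spell out the transform, so the following is only a rough Python sketch of the general idea it names: take a singular value decomposition of the expression matrix, then cluster the gene-side and the condition-side singular vectors separately. It is not the Mixed-Clustering or Lift algorithm. The helper names (`two_means_1d`, `svd_bicluster`), the synthetic data, the use of only the leading singular vector pair, and the simple 1-D 2-means split are all assumptions made for illustration.

```python
import numpy as np

def two_means_1d(values, iters=50):
    """Hypothetical helper: split 1-D values into a low and a high group
    with a tiny 2-means; returns a boolean mask of the high group."""
    lo, hi = values.min(), values.max()
    for _ in range(iters):
        high = np.abs(values - hi) < np.abs(values - lo)
        if high.all() or not high.any():
            break
        new_lo, new_hi = values[~high].mean(), values[high].mean()
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    return high

def svd_bicluster(X):
    """Illustrative only: cluster genes on the leading left singular vector
    and conditions on the leading right singular vector, independently."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    gene_mask = two_means_1d(np.abs(U[:, 0]))    # clustering problem 1 (genes)
    cond_mask = two_means_1d(np.abs(Vt[0, :]))   # clustering problem 2 (conditions)
    return np.flatnonzero(gene_mask), np.flatnonzero(cond_mask)

# Synthetic data (an assumption): 60 genes x 20 conditions of weak noise,
# with one planted block whose rows are scaled copies of a common profile,
# so the rows are strongly correlated across the selected conditions.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.1, size=(60, 20))
gene_idx = np.arange(10, 25)
cond_idx = np.arange(5, 12)
a = rng.uniform(2.0, 3.0, size=gene_idx.size)   # per-gene scaling factors
b = rng.uniform(2.0, 3.0, size=cond_idx.size)   # shared condition profile
X[np.ix_(gene_idx, cond_idx)] += np.outer(a, b)

found_genes, found_conds = svd_bicluster(X)
print("genes:", found_genes)
print("conditions:", found_conds)
```

In this toy setting the planted block dominates the leading singular vectors, so two independent one-dimensional clusterings are enough to locate it. Real expression data would typically contain several overlapping patterns and would need more than one singular vector pair and a more careful clustering step.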
INDEX TERMS
Biclustering, pattern classification, gene expression data, singular-value decomposition, data mining, biology computing.
CITATION
Wen-Hui Yang, Dao-Qing Dai, Hong Yan, "Finding Correlated Biclusters from Gene Expression Data", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 4, pp. 568-584, April 2011, doi:10.1109/TKDE.2010.150