Issue No. 4, October-December 2008 (vol. 5)
pp. 583-593
ABSTRACT
Many different methods exist for pattern detection in gene expression data. In contrast to classical clustering methods, biclustering can group a set of genes together with a set of conditions (replicates, sets of patients, or drug compounds). However, since the problem is NP-hard, most algorithms use heuristic search functions and may therefore converge to local maxima. By using the results of biclustering on discrete data as the starting point for a local search function on continuous data, our algorithm avoids the problem of heuristic initialization. Similar to OPSM, our algorithm aims to detect biclusters whose rows and columns can be ordered such that row values increase across the bicluster's columns and vice versa. Results have been generated on the yeast genome (Saccharomyces cerevisiae), a human cancer dataset, and random data. On the yeast genome, 89% of the 100 largest non-overlapping biclusters were enriched with Gene Ontology annotations. A comparison with OPSM and ISA demonstrated better efficiency when using gene and condition orders. We present results on random and real datasets that show the ability of our algorithm to capture statistically significant and biologically relevant biclusters.
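The ordering property described in the abstract can be illustrated with a minimal sketch. It only checks whether a candidate bicluster admits a column order under which every row is increasing, and a row order under which every column is increasing. It is not the authors' search algorithm: the function name `is_order_preserving`, the toy matrix, and the requirement of strict increase without any noise tolerance are assumptions made for illustration.

```python
def is_order_preserving(matrix, rows, cols):
    """Check the order property on the submatrix given by rows x cols.

    Hypothetical helper, not taken from the paper. Assumes strict
    increase and no tolerance for noise.
    """
    sub = [[matrix[r][c] for c in cols] for r in rows]
    if not sub or not sub[0]:
        return False  # an empty selection is not treated as a bicluster

    # Candidate column order: sort columns by their value in the first row.
    col_order = sorted(range(len(cols)), key=lambda j: sub[0][j])
    # Candidate row order: sort rows by their value in the first column.
    row_order = sorted(range(len(rows)), key=lambda i: sub[i][0])

    # Every row must increase strictly along the candidate column order.
    rows_increase = all(
        row[a] < row[b]
        for row in sub
        for a, b in zip(col_order, col_order[1:])
    )
    # Every column must increase strictly along the candidate row order.
    cols_increase = all(
        sub[a][j] < sub[b][j]
        for j in range(len(cols))
        for a, b in zip(row_order, row_order[1:])
    )
    return rows_increase and cols_increase


# Toy expression matrix: 3 genes (rows) x 4 conditions (columns).
data = [
    [1.0, 3.0, 2.0, 9.0],
    [2.0, 5.0, 4.0, 0.5],
    [4.0, 8.0, 6.0, 7.0],
]

print(is_order_preserving(data, [0, 1, 2], [0, 1, 2]))
print(is_order_preserving(data, [0, 1, 2], [0, 1, 2, 3]))
```

Because the increase is strict, sorting by the first row (or first column) yields the only order that could satisfy the property, so a single pass over the remaining rows and columns suffices. On real expression data a tolerance for measurement noise would likely be needed.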
INDEX TERMS
Bioinformatics (genome or protein) databases, Data and knowledge visualization, Data mining, Machine learning, Graph and tree search strategies
CITATION
Yann Christinat, Bernd Wachmann, Lei Zhang, "Gene Expression Data Analysis Using a Novel Approach to Biclustering Combining Discrete and Continuous Data", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 4, pp. 583-593, October-December 2008, doi:10.1109/TCBB.2007.70251