Biclustering of Expression Data with Evolutionary Computation
May 2006 (vol. 18 no. 5)
pp. 590-602
Microarray techniques are driving the development of sophisticated algorithms that extract novel and biomedically useful knowledge. In this work, we address the biclustering of gene expression data with evolutionary computation. Our approach is based on evolutionary algorithms, which have shown excellent performance on complex problems, and it searches for biclusters following a sequential covering strategy. The goal is to find biclusters of maximum size whose mean squared residue is lower than a given threshold $\delta$. In addition, we pay special attention to finding high-quality biclusters with large variation, i.e., with a relatively high row variance, and with a low level of overlap among biclusters. The quality of the biclusters found by our evolutionary approach is discussed, and the results are compared to those reported by Cheng and Church and by Yang et al. In general, our approach, named SEBI, performs very well at finding patterns in gene expression data.
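
The two quality measures named in the abstract can be illustrated with a short sketch. It assumes the usual Cheng and Church definition of the mean squared residue (the mean of the squared values a_ij - a_iJ - a_Ij + a_IJ over the bicluster) and defines row variance as the mean squared deviation of each entry from its row mean. The abstract does not spell out either formula, and the function names and the toy matrix are hypothetical, not part of SEBI.

```python
from typing import List, Sequence


def _submatrix(matrix: Sequence[Sequence[float]],
               rows: Sequence[int],
               cols: Sequence[int]) -> List[List[float]]:
    """Extract the bicluster defined by the selected rows and columns."""
    return [[matrix[i][j] for j in cols] for i in rows]


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Mean squared residue of a bicluster (assumed Cheng and Church form)."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    total_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + total_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


def row_variance(matrix: Sequence[Sequence[float]],
                 rows: Sequence[int],
                 cols: Sequence[int]) -> float:
    """Mean squared deviation of each entry from its own row mean."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(sub), len(sub[0])
    total = 0.0
    for r in sub:
        mean = sum(r) / n_cols
        total += sum((x - mean) ** 2 for x in r)
    return total / (n_rows * n_cols)


# Toy data (hypothetical): rows differ only by an additive shift, so the
# residue is (numerically) zero while the row variance is clearly positive.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
]
delta = 0.1  # illustrative threshold, not a value from the paper
msr = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
var = row_variance(expression, [0, 1, 2], [0, 1, 2])
print(msr < delta, var)
```

Under these assumed definitions, a bicluster whose rows follow the same trend up to a shift has a residue near zero, and a high row variance separates it from trivially flat biclusters, which also have low residue.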

[1] A. Ben-Dor, R. Shamir, and Z. Yakhini, “Clustering Gene Expression Patterns,” J. Computational Biology, vol. 6, nos. 3-4, pp. 281-297, 1999.
[2] H. Wang, W. Wang, J. Yang, and P.S. Yu, “Clustering by Pattern Similarity in Large Data Sets,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 394-405, 2002.
[3] J.A. Hartigan, “Direct Clustering of a Data Matrix,” J. Am. Statistical Assoc., vol. 67, no. 337, pp. 123-129, 1972.
[4] Y. Cheng and G.M. Church, “Biclustering of Expression Data,” Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology, pp. 93-103, 2000.
[5] G. Getz, E. Levine, and E. Domany, “Coupled Two-Way Clustering Analysis of Gene Microarray Data,” Proc. Nat'l Academy of Sciences USA, pp. 12,079-12,084, 2000.
[6] L. Lazzeroni and A. Owen, “Plaid Models for Gene Expression Data,” technical report, Stanford Univ., 2000.
[7] J. Yang, W. Wang, H. Wang, and P.S. Yu, “$\delta$-Clusters: Capturing Subspace Correlation in a Large Data Set,” Proc. 18th IEEE Conf. Data Eng., pp. 517-528, 2002.
[8] J. Yang, W. Wang, H. Wang, and P.S. Yu, “Enhanced Biclustering on Expression Data,” Proc. Third IEEE Conf. Bioinformatics and Bioeng., pp. 321-327, 2003.
[9] A. Tanay, R. Sharan, and R. Shamir, “Discovering Statistically Significant Biclusters in Gene Expression Data,” Bioinformatics, vol. 19, (Sup. 2), pp. 196-205, 2002.
[10] H. Wang, W. Wang, J. Yang, and P.S. Yu, “Clustering by Pattern Similarity in Large Data Sets,” Proc. ACM SIGMOD Conf., 2002.
[11] J. Liu and W. Wang, “Op-Cluster: Clustering by Tendency in High Dimensional Space,” Proc. Third IEEE Int'l Conf. Data Mining, pp. 187-194, 2003.
[12] A. Ben-Dor, B. Chor, R. Karp, and Z. Yakhini, “Discovering Local Structure in Gene Expression Data: The Order-Preserving SubMatrix Problem,” Proc. Sixth Ann. Int'l Conf. Computational Biology, pp. 49-57, 2002.
[13] J. Hipp, U. Güntzer, and G. Nakhaeizadeh, “Algorithms for Association Rule Mining - A General Survey and Comparison,” SIGKDD Explorations Newsletter, vol. 2, no. 1, pp. 58-64, 2000.
[14] J. Pei, X. Zhang, M. Cho, H. Wang, and P.S. Yu, “Maple: A Fast Algorithm for Maximal Pattern-Based Clustering,” Proc. Third IEEE Int'l Conf. Data Mining, pp. 259-266, 2003.
[15] J. Orlin, “Containment in Graph Theory: Covering Graphs with Cliques,” Nederl. Akad. Wetensch. Indag. Math., vol. 39, pp. 211-218, 1977.
[16] A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing. Springer-Verlag, 2003.
[17] S. Bleuler, A. Prelić, and E. Zitzler, “An EA Framework for Biclustering of Gene Expression Data,” Congress on Evolutionary Computation (CEC-2004), pp. 166-173, 2004.
[18] S. Bleuler and E. Zitzler, “Order Preserving Clustering over Multiple Time Course Experiments,” Proc. EvoWorkshops 2005, pp. 33-43, 2005.
[19] T. Bäck, D.B. Fogel, and Z. Michalewicz, Evolutionary Computation 1: Basic Algorithms and Operators. Inst. of Physics Publishing, 2000.
[20] X. Yao, Evolutionary Computation: A Gentle Introduction, chapter 2. Kluwer Academic Publishers, pp. 27-53, 2002.
[21] D.E. Goldberg and R. Lingle, “Alleles, Loci, and the Travelling Salesman Problem,” Proc. First Int'l Conf. Genetic Algorithms, pp. 154-159, 1985.
[22] P.J. Bentley and D.W. Corne, Creative Evolutionary Systems. Morgan Kaufmann Publishers Inc., 2001.
[23] T. Yamada and R. Nakano, “A Genetic Algorithm Applicable to Large-Scale Job-Shop Problems,” Parallel Problem Solving from Nature, R. Männer and B. Manderick, eds. vol. 2, Amsterdam: Elsevier Science Publishers, B.V., 1992.
[24] D. Corne, P. Ross, and H.-L. Fang, “Fast Practical Evolutionary Timetabling,” Proc. Evolutionary Computing AISB Workshop, pp. 251-263, 1994.
[25] D.K. Gehlhaar, G.M. Verkhivker, P.A. Rejto, C.J. Sherman, D.B. Fogel, L.J. Fogel, and S.T. Freer, “Molecular Recognition of the Inhibitor AG-1343 by HIV-1 Protease: Conformationally Flexible Docking by Evolutionary Programming,” Chemistry and Biology, vol. 2, no. 5, pp. 317-324, 1995.
[26] G.F. Spencer, “Automatic Generation of Programs for Crawling and Walking,” Proc. Fifth Int'l Conf. Genetic Algorithms (ICGA '93), p. 654, 1993.
[27] D.B. Fogel, “Evolving Behaviors in the Iterated Prisoner's Dilemma,” Evolutionary Computation, vol. 1, no. 1, pp. 77-97, 1993.
[28] F. Divina and E. Marchiori, “Evolutionary Concept Learning,” Proc. Genetic and Evolutionary Computation Conf., pp. 343-350, July 2002.
[29] J.S. Aguilar-Ruiz, J. Riquelme, and M. Toro, “An Evolutionary Approach to Estimating Software Development Projects,” Information and Software Technology, vol. 43, no. 14, pp. 875-882, 2001.
[30] J.S. Aguilar-Ruiz, J. Riquelme, and C.D. Valle, “Evolutionary Learning of Hierarchical Decision Rules,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 33, no. 2, pp. 324-331, 2003.
[31] F. Divina and E. Marchiori, “Knowledge-Based Evolutionary Search for Inductive Concept Learning,” Knowledge Incorporation in Evolutionary Computation, Y. Jin, ed. ch. Part 3, Springer-Verlag, pp. 237-254, 2004.
[32] R. Cho, M. Campbell, E. Winzeler, L. Steinmetz, A. Conway, L. Wodicka, T. Wolfsberg, A. Gabrielian, D. Landsman, D. Lockhart, and R. Davis, “A Genome-Wide Transcriptional Analysis of the Mitotic Cell Cycle,” Molecular Cell, vol. 2, pp. 65-73, 1998.
[33] A.A. Alizadeh, M.B. Eisen, R.E. Davis, C. Ma, I.S. Lossos, A. Rosenwald, J.C. Boldrick, H. Sabet, T. Tran, X. Yu, J.I. Powell, L. Yang, G.E. Marti, T. Moore, J. Hudson, L. Lu, D.B. Lewis, R. Tibshirani, G. Sherlock, W.C. Chan, T.C. Greiner, D.D. Weisenburger, J.O. Armitage, R. Warnke, R. Levy, W. Wilson, M.R. Grever, J.C. Byrd, D. Botstein, P.O. Brown, and L.M. Staudt, “Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling,” Nature, vol. 403, pp. 503-511, 2000.
[34] S. Tavazoie, J. Hughes, M. Campbell, R. Cho, and G. Church, “Systematic Determination of Genetic Network Architecture,” Nature Genetics, vol. 22, pp. 281-285, 1999.
[35] J. Yang, H. Wang, W. Wang, and P. Yu, “Enhanced Biclustering on Expression Data,” Proc. Third IEEE Symp. BioInformatics and BioEng. (BIBE '03), pp. 321-327, 2003.

Index Terms:
Biclustering, gene expression data, evolutionary computation.
Federico Divina, Jesús S. Aguilar-Ruiz, "Biclustering of Expression Data with Evolutionary Computation," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 5, pp. 590-602, May 2006, doi:10.1109/TKDE.2006.74