Discovering Coherent Biclusters from Gene Expression Data Using Zero-Suppressed Binary Decision Diagrams
October-December 2005 (vol. 2 no. 4)
pp. 339-354

Abstract—The biclustering method can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse in gene expression measurement. This is because the biclustering approach, in contrast to the conventional clustering techniques, focuses on finding a subset of the genes and a subset of the experimental conditions that together exhibit coherent behavior. However, the biclustering problem is inherently intractable, and it is often computationally costly to find biclusters with high levels of coherence. In this work, we propose a novel biclustering algorithm that exploits the zero-suppressed binary decision diagrams (ZBDDs) data structure to cope with the computational challenges. Our method can find all biclusters that satisfy specific input conditions, and it is scalable to practical gene expression data. We also present experimental results confirming the effectiveness of our approach.

Index Terms:
Clustering, life and medical sciences, bioinformatics (genome or protein) databases, logic design.
Sungroh Yoon, Christine Nardini, Luca Benini, Giovanni De Micheli, "Discovering Coherent Biclusters from Gene Expression Data Using Zero-Suppressed Binary Decision Diagrams," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 4, pp. 339-354, Oct.-Dec. 2005, doi:10.1109/TCBB.2005.55