Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks
March/April 2012 (vol. 9 no. 2)
pp. 487-498
M. Hopfensitz, Res. Group of Bioinf. & Syst. Biol., Ulm Univ., Ulm, Germany
C. Mussel, Res. Group of Bioinf. & Syst. Biol., Ulm Univ., Ulm, Germany
C. Wawra, Res. Group of Bioinf. & Syst. Biol., Ulm Univ., Ulm, Germany
M. Maucher, Res. Group of Bioinf. & Syst. Biol., Ulm Univ., Ulm, Germany
M. Kuhl, Inst. of Biochem. & Mol. Biol., Ulm Univ., Ulm, Germany
H. Neumann, Inst. of Neural Inf. Process., Ulm Univ., Ulm, Germany
H. A. Kestler, Res. Group of Bioinf. & Syst. Biol., Ulm Univ., Ulm, Germany
Network inference algorithms can assist life scientists in unraveling gene-regulatory systems on a molecular level. In recent years, great attention has been drawn to the reconstruction of Boolean networks from time series. These time series need to be binarized, as such networks model genes as binary variables (either “expressed” or “not expressed”). Common binarization methods often cluster measurements or separate them according to statistical or information-theoretic characteristics, and may require many data points to determine a robust threshold. Yet time series measurements frequently comprise only a small number of samples. To overcome this limitation, we propose a binarization that incorporates measurements at multiple resolutions. We introduce two such binarization approaches, which determine thresholds based on limited numbers of samples and additionally provide a measure of threshold validity. Thus, network reconstruction and further analysis can be restricted to genes with meaningful thresholds, which reduces the complexity of network inference. The performance of our binarization algorithms was evaluated in network reconstruction experiments using artificial data as well as real-world yeast expression time series. The new approaches yield considerably improved correct network identification rates compared to other binarization techniques by effectively reducing the number of candidate networks.
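
To make the notion of threshold-based binarization concrete, the following is a minimal sketch of the kind of common clustering-based baseline the abstract contrasts against. It is not the multiscale method proposed in the article. The function names `two_cluster_threshold` and `binarize`, and the example values, are hypothetical and chosen only for illustration; the sketch assumes a single gene's measurements are given as a plain list of numbers.

```python
from typing import List, Optional, Sequence, Tuple


def two_cluster_threshold(values: Sequence[float]) -> float:
    """Split the sorted values into a low and a high group so that the
    within-group sum of squared deviations is minimal (an exact 1-D
    two-cluster split), and return the midpoint between the groups."""
    xs = sorted(values)
    if len(xs) < 2 or xs[0] == xs[-1]:
        raise ValueError("need at least two distinct measurements")

    def sse(group: Sequence[float]) -> float:
        mean = sum(group) / len(group)
        return sum((x - mean) ** 2 for x in group)

    best_cost = float("inf")
    best_threshold = xs[0]
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # identical neighbours leave no gap for a threshold
        cost = sse(xs[:i]) + sse(xs[i:])
        if cost < best_cost:
            best_cost = cost
            best_threshold = (xs[i - 1] + xs[i]) / 2
    return best_threshold


def binarize(
    values: Sequence[float], threshold: Optional[float] = None
) -> Tuple[List[int], float]:
    """Map each measurement to 1 ("expressed") or 0 ("not expressed")."""
    if threshold is None:
        threshold = two_cluster_threshold(values)
    return [1 if v > threshold else 0 for v in values], threshold


if __name__ == "__main__":
    # Hypothetical short time series for one gene (illustrative values only).
    series = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95, 0.2]
    bits, thr = binarize(series)
    print(bits, thr)
```

With so few samples, a threshold of this kind can shift noticeably when a single measurement changes, which is the small-sample limitation the abstract describes as motivation for the multiscale approaches and their threshold validity measure.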

Index Terms:
time series, binary sequences, biology computing, Boolean functions, genetics, inference mechanisms, microorganisms, molecular biophysics, yeast expression time series, multiscale binarization, gene expression data, Boolean networks, network inference algorithms, gene-regulatory systems, binary variables, binarization, network inference, time series analysis, time measurement, approximation error, gene expression, complexity theory, bioinformatics, computational biology, reconstruction, gene-regulatory networks
M. Hopfensitz, C. Mussel, C. Wawra, M. Maucher, M. Kuhl, H. Neumann, H. A. Kestler, "Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 487-498, March-April 2012, doi:10.1109/TCBB.2011.62