A Trade-Off between Sample Complexity and Computational Complexity in Learning Boolean Networks from Time-Series Data
January-March 2010 (vol. 7 no. 1)
pp. 118-125
Theodore J. Perkins, McGill University, Montreal
Michael T. Hallett, McGill University, Montreal
A key problem in molecular biology is to infer regulatory relationships between genes from expression data. This paper studies a simplified model of such inference problems in which one or more Boolean variables, modeling, for example, the expression levels of genes, each depend deterministically on a small but unknown subset of a large number of Boolean input variables. Our model assumes that the expression data comprises a time series, in which successive samples may be correlated. We provide bounds on the expected amount of data needed to infer the correct relationships between output and input variables. These bounds improve and generalize previous results for Boolean network inference and continuous-time switching network inference. Although the computational problem is intractable in general, we describe a fixed-parameter tractable algorithm that is guaranteed to provide at least a partial solution to the problem. Most interestingly, both the sample complexity and computational complexity of the problem depend on the strength of correlations between successive samples in the time series but in opposing ways. Uncorrelated samples minimize the total number of samples needed while maximizing computational complexity; a strong correlation between successive samples has the opposite effect. This observation has implications for the design of experiments for measuring gene expression.
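The inference setting described in the abstract can be illustrated with a minimal sketch. It is a plain brute-force consistency check, not the fixed-parameter tractable algorithm of the paper. The function names, and the choice to model correlation by flipping each input bit independently with probability `flip_prob` between successive samples, are assumptions made only for this illustration.

```python
import itertools
import random


def simulate_time_series(n_inputs, n_samples, flip_prob, target, rng):
    """Generate a Boolean time series and the outputs of `target`.

    Assumption for illustration: each input bit flips independently with
    probability `flip_prob` between successive samples. A value of 0.5 gives
    uncorrelated samples; smaller values give strongly correlated samples.
    """
    x = [rng.randint(0, 1) for _ in range(n_inputs)]
    series = []
    for _ in range(n_samples):
        series.append((tuple(x), target(x)))
        # Flip each bit with probability flip_prob to get the next sample.
        x = [b ^ int(rng.random() < flip_prob) for b in x]
    return series


def is_consistent(series, subset):
    """Return True if the output is a function of the inputs in `subset`.

    A subset is inconsistent when two samples agree on every input in the
    subset but disagree on the output.
    """
    seen = {}
    for x, y in series:
        key = tuple(x[i] for i in subset)
        if seen.setdefault(key, y) != y:
            return False
    return True


def consistent_subsets(series, n_inputs, k):
    """Enumerate all size-k input subsets that are consistent with the data."""
    return [
        subset
        for subset in itertools.combinations(range(n_inputs), k)
        if is_consistent(series, subset)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical target: the output depends only on inputs 0 and 2.
    target = lambda x: x[0] ^ x[2]
    for flip_prob in (0.5, 0.05):
        series = simulate_time_series(10, 40, flip_prob, target, rng)
        candidates = consistent_subsets(series, 10, 2)
        print(flip_prob, len(candidates))
```

The true input subset is always among the candidates because the output depends on it deterministically. The data identify the relationship once it is the only candidate left, so the number of samples needed to reach that point is the quantity the sample complexity bounds describe. The enumeration over all size-k subsets is the source of the computational cost.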

Index Terms:
Machine learning, parameter learning, time-series analysis, feature extraction or construction, clustering, classification, association rules.
Citation:
Theodore J. Perkins, Michael T. Hallett, "A Trade-Off between Sample Complexity and Computational Complexity in Learning Boolean Networks from Time-Series Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 118-125, January-March 2010, doi:10.1109/TCBB.2008.38