Issue No.12 - December (2010 vol.21)
pp: 1721-1733
Jaroslaw Zola, Iowa State University, Ames
Maneesha Aluru, Iowa State University, Ames
Abhinav Sarje, Iowa State University, Ames
Srinivas Aluru, Iowa State University, Ames and Indian Institute of Technology Bombay, India
ABSTRACT
Constructing genome-wide gene regulatory networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, none of them is parallel, and they do not scale to the whole genome level or incorporate the largest data sets, particularly with rigorous statistical techniques. In this paper, we present a parallel method that integrates mutual information, the data processing inequality, and statistical testing to detect significant dependencies between genes, and that efficiently exploits the parallelism inherent in such computations. We present a new method to carry out permutation testing for assessing the statistical significance of interactions, while reducing its computational complexity by a factor of Θ(n^2), where n is the number of genes. Using both synthetic and known regulatory networks, we show that our method produces networks of quality similar to ARACNe, a widely used mutual-information-based method. We further explore the use of accelerators for gene network construction by presenting a parallelization on a cluster of IBM Cell blades. We exploit parallelization across multiple Cells, multiple cores within each Cell, and vector units within the cores to develop a high-performance implementation that effectively addresses the scaling problem. We report the first inference of a plant whole genome network by constructing a 15,222 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in 30 minutes on a 2,048-CPU IBM Blue Gene/L, and in 2 hours and 25 minutes on an 8-node Cell blade cluster.
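The three ingredients named in the abstract (pairwise mutual information, a permutation-based significance threshold, and pruning with the data processing inequality) can be illustrated with a small serial sketch. This is not the authors' implementation: the histogram estimator, the pooled null distribution drawn from randomly chosen gene pairs, and every function and parameter name below are assumptions made for illustration only, and nothing here is parallel.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in MI estimate (nats) from a 2D histogram. A simple stand-in
    estimator, assumed here for illustration."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_matrix(expr, bins=8):
    """Symmetric matrix of pairwise MI; expr has shape (genes, experiments)."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    return mi

def permutation_threshold(expr, rng, n_perm=200, alpha=0.01, bins=8):
    """MI threshold from a null distribution built by shuffling one profile
    of randomly chosen gene pairs (a pooled null; an assumption)."""
    n = expr.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        i, j = rng.choice(n, size=2, replace=False)
        null[k] = mutual_information(expr[i], rng.permutation(expr[j]), bins)
    return float(np.quantile(null, 1.0 - alpha))

def apply_dpi(mi, adj):
    """Data processing inequality: in every fully connected gene triple,
    drop the edge with the smallest MI as a likely indirect interaction."""
    n = mi.shape[0]
    keep = adj.copy()
    for i in range(n):
        for j in range(i + 1, n):
            if not adj[i, j]:
                continue
            for k in range(n):
                if k in (i, j) or not (adj[i, k] and adj[j, k]):
                    continue
                if mi[i, j] < min(mi[i, k], mi[j, k]):
                    keep[i, j] = keep[j, i] = False
                    break
    return keep

def infer_network(expr, rng, n_perm=200, alpha=0.01, bins=8):
    """Boolean adjacency matrix: significant MI edges that survive DPI."""
    mi = mi_matrix(expr, bins)
    adj = mi > permutation_threshold(expr, rng, n_perm, alpha, bins)
    np.fill_diagonal(adj, False)
    return apply_dpi(mi, adj)
```

For a chain of dependencies such as x → y → z, the direct pairs (x, y) and (y, z) carry more mutual information than the indirect pair (x, z), so the pruning step is expected to remove the x-z edge while keeping the other two. The pairwise loop is quadratic in the number of genes, which is the part of the computation the abstract describes distributing across processors.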
INDEX TERMS
Parallel algorithms, biology and genetics.
CITATION
Jaroslaw Zola, Maneesha Aluru, Abhinav Sarje, Srinivas Aluru, "Parallel Information-Theory-Based Construction of Genome-Wide Gene Regulatory Networks," IEEE Transactions on Parallel & Distributed Systems, vol. 21, no. 12, pp. 1721-1733, December 2010, doi:10.1109/TPDS.2010.59