Issue No. 12 - December (2010 vol. 21)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.59
Jaroslaw Zola, Iowa State University, Ames
Maneesha Aluru, Iowa State University, Ames
Abhinav Sarje, Iowa State University, Ames
Srinivas Aluru, Iowa State University, Ames, and Indian Institute of Technology Bombay, India
Constructing genome-wide gene regulatory networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, none of them is parallel, and they do not scale to the whole genome level or incorporate the largest data sets, particularly with rigorous statistical techniques. In this paper, we present a parallel method integrating mutual information, data processing inequality, and statistical testing to detect significant dependencies between genes, and efficiently exploit parallelism inherent in such computations. We present a new method to carry out permutation testing for assessing statistical significance of interactions, while reducing its computational complexity by a factor of Θ(n^2), where n is the number of genes. Using both synthetic and known regulatory networks, we show that our method produces networks of quality similar to ARACNe, a widely used mutual-information-based method. We further explore the use of accelerators for gene network construction by presenting a parallelization on a cluster of IBM Cell blades. We exploit parallelization across multiple Cells, multiple cores within each Cell, and vector units within the cores to develop a high-performance implementation that effectively addresses the scaling problem. We report the first inference of a plant whole genome network by constructing a 15,222 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in 30 minutes on a 2,048-CPU IBM Blue Gene/L, and in 2 hours and 25 minutes on an 8-node Cell blade cluster.
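The pipeline named in the abstract (pairwise mutual information, a permutation-based significance threshold, and pruning by the data processing inequality) can be illustrated with a small serial sketch. This is not the authors' implementation. The histogram estimator, the rank transformation used to share one null distribution across all gene pairs, and every function and parameter name below are assumptions chosen for illustration.

```python
import numpy as np

def rank_transform(X):
    """Replace each gene's expression row by its ranks 0..m-1 (assumed preprocessing)."""
    return np.argsort(np.argsort(X, axis=1), axis=1)

def mutual_information(x, y, bins):
    """Histogram estimate of MI (in nats) between two rank vectors of length m."""
    m = len(x)
    joint, _, _ = np.histogram2d(x, y, bins=bins, range=[[0, m], [0, m]])
    pxy = joint / m
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def null_threshold(m, bins, n_perm, alpha, rng):
    """One shared null distribution: after rank transformation every gene is a
    permutation of 0..m-1, so permuting any pair is equivalent to comparing the
    identity with a random permutation (illustrative assumption)."""
    base = np.arange(m)
    null = np.array([mutual_information(base, rng.permutation(m), bins)
                     for _ in range(n_perm)])
    return float(np.quantile(null, 1.0 - alpha))

def infer_network(X, bins=8, n_perm=200, alpha=0.01, seed=0):
    """Return (adjacency, MI matrix, threshold) for a genes-by-experiments matrix X."""
    rng = np.random.default_rng(seed)
    R = rank_transform(np.asarray(X, dtype=float))
    n, m = R.shape
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(R[i], R[j], bins)
    thr = null_threshold(m, bins, n_perm, alpha, rng)
    adj = mi > thr                       # keep only statistically significant pairs
    np.fill_diagonal(adj, False)
    keep = adj.copy()
    # Data processing inequality: in every fully connected triple, drop the
    # edge with the smallest MI, treating it as an indirect interaction.
    for i in range(n):
        for j in range(i + 1, n):
            if not adj[i, j]:
                continue
            for k in range(n):
                if k in (i, j) or not (adj[i, k] and adj[j, k]):
                    continue
                if mi[i, j] < min(mi[i, k], mi[j, k]):
                    keep[i, j] = keep[j, i] = False
                    break
    return keep, mi, thr
```

Calling `infer_network(X)` on a small genes-by-experiments array returns a boolean adjacency matrix. The double loop over gene pairs is the part that is independent across pairs, which is the kind of parallelism the abstract refers to.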
Parallel algorithms, biology and genetics.
Jaroslaw Zola, Maneesha Aluru, Abhinav Sarje, Srinivas Aluru, "Parallel Information-Theory-Based Construction of Genome-Wide Gene Regulatory Networks", IEEE Transactions on Parallel & Distributed Systems, vol. 21, no. 12, pp. 1721-1733, December 2010, doi:10.1109/TPDS.2010.59