Issue No.04 - July/August (2011 vol.8)
pp: 929-942
Sanghamitra Bandyopadhyay , Indian Statistical Institute, Kolkata
Malay Bhattacharyya , Indian Statistical Institute, Kolkata
Two genes are said to be coexpressed if their expression levels have a similar spatial or temporal pattern. Since microarray-based gene profiling began, computational modeling of coexpression has become a major focus. As a result, several similarity/distance measures have evolved over time to quantify coexpression similarity/dissimilarity between gene pairs. Of these, the correlation coefficient has been established as a suitable quantifier of pairwise coexpression. In general, the correlation coefficient captures linear dependence well, but not nonlinear dependence. Despite this drawback, it outperforms many other existing measures in modeling dependency in biological data. In this paper, for the first time, we point out a significant weakness of the existing similarity/distance measures, including the standard correlation coefficient, in modeling pairwise coexpression of genes. We introduce a novel measure, called BioSim, which assumes values between -1 and +1 corresponding to negative and positive dependency, and 0 for independence. The computation of BioSim is based on the aggregation of the stepwise relative angular deviation of the expression vectors considered. The proposed measure is analytically suitable for modeling coexpression because it accounts for expression similarity, expression deviation, and relative dependence. We demonstrate that the proposed measure captures the degree of coexpression between a pair of genes better than several existing ones. The efficacy of the measure is statistically analyzed by integrating it with several module-finding algorithms based on coexpression values and then applying it to synthetic and biological data. The annotation results of the coexpressed genes, as obtained from gene ontology, establish the significance of the introduced measure.
By further extending the BioSim measure, we show that one can effectively identify the variability in expression patterns over multiple phenotypes. We have also extended BioSim to identify pairwise differential expression patterns and coexpression dynamics. The significance of these studies is shown through analysis of several real-life data sets. Because the measure is computed over stepwise time points, it is also effective at identifying partially coexpressed genes. On the whole, we put forward a complete framework for coexpression analysis based on the BioSim measure.
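The abstract's remark that the correlation coefficient captures linear but not nonlinear dependence can be illustrated with a small sketch. The code below is not BioSim, whose formula is not given here; it only computes the standard Pearson coefficient on two toy profiles. The function name `pearson` and the sample vectors are illustrative assumptions, not taken from the paper.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length sequences.

    Illustrative helper (hypothetical name); this is NOT the BioSim measure.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy "expression profiles" over seven time points (made-up values).
t = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in t]   # depends linearly on t
quadratic = [v * v for v in t]    # depends on t, but not linearly

r_linear = pearson(t, linear)        # close to +1
r_quadratic = pearson(t, quadratic)  # close to 0 despite full dependence
print(r_linear, r_quadratic)
```

In this toy case the quadratic profile is completely determined by `t`, yet its correlation with `t` vanishes because the positive and negative halves cancel, which is the kind of limitation the abstract alludes to.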
Coexpression, gene similarity, correlation, gene ontology, differential expression.
Sanghamitra Bandyopadhyay, Malay Bhattacharyya, "A Biologically Inspired Measure for Coexpression Analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 929-942, July/August 2011, doi:10.1109/TCBB.2010.106