Issue No. 04 - July/August (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.106
Sanghamitra Bandyopadhyay, Indian Statistical Institute, Kolkata
Malay Bhattacharyya, Indian Statistical Institute, Kolkata
Two genes are said to be coexpressed if their expression levels follow a similar spatial or temporal pattern. Ever since gene expression profiling with microarrays began, computational modeling of coexpression has been a major focus. As a result, several similarity and distance measures have evolved over time to quantify the coexpression similarity or dissimilarity between gene pairs. Of these, the correlation coefficient has been established as a suitable quantifier of pairwise coexpression. In general, the correlation coefficient captures linear dependence well, but not nonlinear dependence. In spite of this drawback, it outperforms many other existing measures in modeling dependency in biological data. In this paper, for the first time, we point out a significant weakness of the existing similarity/distance measures, including the standard correlation coefficient, in modeling pairwise coexpression of genes. We introduce a novel measure, called BioSim, which takes values between -1 and +1, corresponding to negative and positive dependence, and 0 for independence. The computation of BioSim is based on aggregating the stepwise relative angular deviation of the expression vectors under consideration. The proposed measure is analytically suitable for modeling coexpression because it accounts for expression similarity, expression deviation, and relative dependence. We demonstrate that the proposed measure captures the degree of coexpression between a pair of genes better than several existing measures. The efficacy of the measure is analyzed statistically by integrating it with several module-finding algorithms that rely on coexpression values and applying them to synthetic and biological data. The annotation results for the coexpressed genes, obtained from gene ontology, establish the significance of the introduced measure.
By further extending the BioSim measure, we show that one can effectively identify the variability in expression patterns over multiple phenotypes. We have also extended BioSim to identify pairwise differential expression patterns and coexpression dynamics. The significance of these studies is shown through analysis of several real-life data sets. Because the measure is computed over stepwise time points, it is also effective at identifying partially coexpressed genes. On the whole, we put forward a complete framework for coexpression analysis based on the BioSim measure.
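The abstract's remark that the correlation coefficient captures linear but not nonlinear dependence can be illustrated with a minimal sketch. This is not the BioSim computation, whose formula is not given here; it is only the standard Pearson correlation applied to hypothetical expression profiles (`gene_a`, `gene_linear`, `gene_inverse`, and `gene_quadratic` are made-up names and values, not data from the paper).

```python
import math

def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical expression profiles over 21 evenly spaced time points.
t = [i / 10 for i in range(-10, 11)]
gene_a = t
gene_linear = [2.0 * v + 3.0 for v in t]   # linearly dependent on gene_a
gene_inverse = [-v for v in t]             # negatively dependent on gene_a
gene_quadratic = [v ** 2 for v in t]       # fully determined by gene_a, but nonlinearly

r_linear = pearson(gene_a, gene_linear)
r_inverse = pearson(gene_a, gene_inverse)
r_quadratic = pearson(gene_a, gene_quadratic)

print(f"linear:    {r_linear:+.3f}")
print(f"inverse:   {r_inverse:+.3f}")
print(f"quadratic: {r_quadratic:+.3f}")
```

In this toy setting the linear and inverse profiles score close to +1 and -1, while the quadratic profile scores close to 0 even though it is completely determined by `gene_a`, which is the kind of dependence a purely linear measure misses.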
Coexpression, gene similarity, correlation, gene ontology, differential expression.
Sanghamitra Bandyopadhyay, Malay Bhattacharyya, "A Biologically Inspired Measure for Coexpression Analysis", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 929-942, July/August 2011, doi:10.1109/TCBB.2010.106