Correlation between Gene Expression and GO Semantic Similarity
October-December 2005 (vol. 2 no. 4)
pp. 330-338

Abstract: This research analyzes aspects of the relationship between gene expression, gene function, and gene annotation. Many recent studies implicitly assume that gene products that are biologically and functionally related maintain this similarity both in their expression profiles and in their Gene Ontology (GO) annotation. We analyze how accurate this assumption proves to be using real, publicly available data. We also aim to validate a measure of semantic similarity for GO annotation. We use the Pearson correlation coefficient and its absolute value as measures of similarity between expression profiles of gene products. We explore a number of semantic similarity measures (Resnik, Jiang, and Lin) and compute the similarity between gene products annotated using the GO. Finally, we compute correlation coefficients to compare gene expression similarity against GO semantic similarity. Our results suggest that the Resnik similarity measure outperforms the others and seems better suited for use in the Gene Ontology. We also find that there seems to be a correlation between semantic similarity in the GO annotation and gene expression for the three GO ontologies. We show that this correlation is negligible up to a certain semantic similarity value; then, for higher similarity values, the relationship becomes almost linear. These results can be used to augment the knowledge provided by clustering algorithms and in the development of bioinformatic tools for finding and characterizing gene products.

Index Terms:
Expression analysis, gene ontology, semantic similarity.
José L. Sevilla, Víctor Segura, Adam Podhorski, Elizabeth Guruceaga, José M. Mato, Luis A. Martínez-Cruz, Fernando J. Corrales, Angel Rubio, "Correlation between Gene Expression and GO Semantic Similarity," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 4, pp. 330-338, Oct.-Dec. 2005, doi:10.1109/TCBB.2005.50