Fuzzy Measures on the Gene Ontology for Gene Product Similarity
July-September 2006 (vol. 3 no. 3)
pp. 263-274
One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to use similarity measures based on the terms found in the GO or another taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for variations in annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for biologists searching for functional information on gene products. Initial testing on a group of 194 sequences representing three protein families shows that the FMS and Choquet similarities correlate more strongly with BLAST sequence similarities than traditional similarity measures such as the pairwise average or pairwise maximum do.
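To make the baselines and the Choquet integral mentioned above more concrete, the following is a minimal sketch under stated assumptions. It is not the authors' FMS or their exact Choquet similarity. It assumes a precomputed term-by-term similarity matrix `sim` (values in [0, 1], obtained elsewhere), a hypothetical "best match" score per term of one gene product, and hypothetical per-annotation reliability values used as densities of a Sugeno λ-fuzzy measure. All function and variable names (`pairwise_average`, `pairwise_maximum`, `solve_lambda`, `choquet`, `best_match`, `reliability`) are illustrative inventions.

```python
from typing import Sequence

# Baselines computed over a term-by-term similarity matrix sim[i][j] between
# GO term i of one gene product and GO term j of the other (assumed given).
def pairwise_average(sim: Sequence[Sequence[float]]) -> float:
    values = [v for row in sim for v in row]
    return sum(values) / len(values)


def pairwise_maximum(sim: Sequence[Sequence[float]]) -> float:
    return max(v for row in sim for v in row)


def solve_lambda(densities: Sequence[float], iterations: int = 200) -> float:
    """Find the Sugeno lambda with prod(1 + lam * g_i) = 1 + lam, lam > -1."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # additive case: the measure is a probability measure

    def f(lam: float) -> float:
        product = 1.0
        for g in densities:
            product *= 1.0 + lam * g
        return product - (1.0 + lam)

    if total > 1.0:
        # The nonzero root lies in (-1, 0); f is positive to its left.
        lo, hi, left_positive = -1.0, 0.0, True
    else:
        # The nonzero root lies in (0, inf); f is negative to its left.
        lo, hi, left_positive = 0.0, 1.0, False
        while f(hi) <= 0.0:
            hi *= 2.0  # grow the bracket until f changes sign
    for _ in range(iterations):  # plain bisection
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == left_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choquet(h: Sequence[float], densities: Sequence[float]) -> float:
    """Discrete Choquet integral of h w.r.t. the lambda-measure from densities."""
    if len(h) != len(densities):
        raise ValueError("h and densities must have the same length")
    lam = solve_lambda(densities)
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    result, g_prev = 0.0, 0.0
    for k, i in enumerate(order):
        # Measure of the set holding the k+1 elements with the largest h,
        # built incrementally with the lambda-rule.
        g_cur = densities[i] + g_prev + lam * densities[i] * g_prev
        h_next = h[order[k + 1]] if k + 1 < len(order) else 0.0
        result += (h[i] - h_next) * g_cur
        g_prev = g_cur
    return result


# Hypothetical example: 3 GO terms for gene product A, 2 for gene product B.
sim = [[0.9, 0.1],
       [0.2, 0.4],
       [0.0, 0.3]]
best_match = [max(row) for row in sim]  # best partner in B for each term of A
reliability = [0.4, 0.3, 0.2]           # hypothetical annotation densities
print(pairwise_average(sim), pairwise_maximum(sim), choquet(best_match, reliability))
```

Because the λ-measure is monotone and assigns measure 1 to the whole set, the Choquet value always lies between the smallest and largest best-match score. When the densities sum to 1, λ = 0 and the integral reduces to the ordinary weighted mean Σ g_i h_i.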

[1] P.W. Lord, R.D. Stevens, A. Brass, and C.A. Goble, “Semantic Similarity Measure as a Tool for Exploring the Gene Ontology,” Proc. Pacific Symp. Biocomputing, pp. 601-612, 2003.
[2] S. Raychaudhuri and R.B. Altman, “A Literature-Based Method for Assessing the Functional Coherence of a Gene Group,” Bioinformatics, vol. 19, no. 3, pp. 396-401, Feb. 2003.
[3] Fuzzy Measures and Integrals: Theory and Applications, M. Grabisch, et al., eds. Springer-Verlag, 2000.
[4] M. Sugeno, “Fuzzy Measures and Fuzzy Integrals— A Survey,” Fuzzy Automata and Decision Processes, pp. 89-102, 1977.
[5] J. Keller, P. Gader, and A.K. Hocaoglu, “Fuzzy Integrals in Image Processing and Recognition,” Fuzzy Measures and Integrals: Theory and Applications, pp. 435-466, 2000.
[6] P.W. Lord, R.D. Stevens, A. Brass, and C.A. Goble, “Investigating Semantic Similarity Measures across the Gene Ontology: The Relation between Sequence and Annotation,” Bioinformatics, vol. 19, no. 10, pp. 1275-1283, 2003.
[7] N. Speer, C. Spieth, and A. Zell, “A Memetic Clustering Algorithm for the Functional Partition of Genes Based on the Gene Ontology,” Proc. 2004 IEEE Symp. Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2004), Oct. 2004.
[8] J.J. Jiang and D.W. Conrath, “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy,” Proc. Int'l Conf. Research on Computational Linguistics X, 1997.
[9] S.L. Cao, L. Qin, W. He, Y. Zhong, Y. Zhu, and Y. Li, “Semantic Search Among Heterogeneous Databases Based on Gene Ontology,” Acta Biochimica et Biophysica Sinica, vol. 36, no. 5, pp. 365-370, 2004.
[10] P. Resnik, “Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language,” J. Artificial Intelligence Research (JAIR), vol. 11, pp. 95-130, 1999.
[11] P. Ganesan, H. Garcia-Molina, and J. Widom, “Exploiting Hierarchical Domain Structure to Compute Similarity,” ACM Trans. Information Systems, vol. 21, no. 1, pp. 64-93, Jan. 2003.
[12] J. Ontrup, T. Nattkemper, O. Gerstung, and H. Ritter, “A MeSH Term Based Distance Measure for Document Retrieval and Labeling Assistance,” Proc. 25th Ann. Int'l Conf. IEEE Eng. in Medicine and Biology Society (EMBC 2003), Sept. 2003.
[13] C.D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 2001.
[14] R. Kosala and H. Blockeel, “Web Mining Research: A Survey,” ACM SIGKDD Explorations Newsletter, vol. 2, no. 1, June 2000.
[15] K.F. Aoki, A. Yamaguchi, Y. Okuno, T. Akutsu, N. Ueda, M. Kanehisa, and H. Mamitsuka, “Efficient Tree-Matching Methods for Accurate Carbohydrate Database Queries,” Genome Informatics, vol. 14, pp. 134-143, 2003.
[16] A. Torsello, D. Hidovic, and M. Pelillo, “Four Metrics for Efficiently Comparing Attributed Trees,” Proc. 17th Int'l Conf. Pattern Recognition, vol. 2, pp. 467-470, 2004.
[17] V.C. Bhavsar, H. Boley, and L. Yang, “A Weighted-Tree Similarity Algorithm for Multi-Agent Systems in E-Business Environments,” Proc. 2003 Workshop Business Agents and the Semantic Web, pp. 53-72, June 2003.
[18] G. Sampson, R. Haigh, and E. Atwell, “Natural Language Analysis by Stochastic Optimization: A Progress Report on Project APRIL,” J. Experimental and Theoretical Artificial Intelligence, vol. 1, pp. 271-287, Apr. 1989.
[19] J. Zhong, H. Zhu, J. Li, and Y. Yu, “Conceptual Graph Matching for Semantic Search,” Proc. 10th Int'l Conf. Conceptual Structures, pp. 92-196, 2002.
[20] J. Wang, T.H. Bo, I. Jonassen, O. Myklebost, and E. Hovig, “Tumor Classification and Marker Gene Prediction by Feature Selection and Fuzzy C-Means Clustering Using Microarray Data,” BMC Bioinformatics, vol. 4, no. 1, p. 60, Dec. 2003.
[21] T. Ando, M. Suguro, T. Hanai, T. Kobayashi, H. Honda, and M. Seto, “Fuzzy Neural Network Applied to Gene Expression Profiling for Predicting the Prognosis of Diffuse Large B-Cell Lymphoma,” Japan J. Cancer Research, vol. 93, no. 11, pp. 1207-1212, Nov. 2002.
[22] H. Ressom, R. Reynolds, and R.S. Varghese, “Increasing the Efficiency of Fuzzy Logic-Based Gene Expression Data Analysis,” Physiological Genomics, vol. 13, no. 2, pp. 107-117, Apr. 2003.
[23] C. Perez-Iratxeta, P. Bork, and M.A. Andrade, “Association of Genes to Genetically Inherited Diseases Using Data Mining,” Nature Genetics, vol. 31, pp. 316-319, July 2002.
[24] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.
[25] A.J. Enright, S. Van Dongen, and C.A. Ouzounis, “An Efficient Algorithm for Large-Scale Detection of Protein Families,” Nucleic Acids Research, vol. 30, no. 7, 2002.
[26] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, “Basic Local Alignment Search Tool,” J. Molecular Biology, vol. 215, no. 3, pp. 403-410, 1990.
[27] J. Keller, M. Popescu, and J.A. Mitchell, “Taxonomy-Based Soft Similarity Measures in Bioinformatics,” Proc. IEEE Int'l Conf. Fuzzy Systems, pp. 23-30, July 2004.
[28] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications. New York: Academic Press, 1980.
[29] E.M. Marcotte, M. Pellegrini, H.L. Ng, D.W. Rice, T.O. Yeates, and D. Eisenberg, “Detecting Protein Function and Protein-Protein Interactions from Genome Sequences,” Science, vol. 285, pp. 751-753, 1999.
[30] J. Myllyharju and K.I. Kivirikko, “Collagens, Modifying Enzymes and Their Mutations in Humans, Flies and Worms,” Trends in Genetics, vol. 20, no. 1, pp. 33-43, 2004.
[31] J.-M. Claverie, “Computational Methods for the Identification of Differential and Coordinated Gene Expression,” Human Molecular Genetics, vol. 8, no. 10, pp. 1821-1832, 1999.

Index Terms:
Similarity measure, fuzzy measure, Choquet integral, Gene Ontology.
Mihail Popescu, James M. Keller, Joyce A. Mitchell, "Fuzzy Measures on the Gene Ontology for Gene Product Similarity," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 3, pp. 263-274, July-Sept. 2006, doi:10.1109/TCBB.2006.37