Issue No. 01 - January-March (2010 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.29
Purvesh Khatri, Wayne State University, Detroit
Sorin Drăghici, Wayne State University, Detroit
Arina Done, Wayne State University, Detroit
Bogdan Done, Wayne State University, Detroit
The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition, they are incomplete at any given time. In this paper, we describe a technique that improves our previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions. In this work, we use a vector space model and a number of weighting schemes in addition to our previous latent semantic indexing approach. The technique described here takes into consideration the hierarchical structure of the Gene Ontology (GO) and can assign different weights to GO terms situated at different depths. The prediction abilities of 15 different weighting schemes are compared and evaluated. Nine of these schemes were previously used in other problem domains, while six are introduced in this paper. The best weighting scheme was a novel scheme, n2tn. Of the top 50 functional annotations predicted using this weighting scheme, we found support in the literature for 84 percent, while 6 percent of the predictions were contradicted by the existing literature. For the remaining 10 percent, we did not find any relevant publications to confirm or contradict the predictions. The n2tn weighting scheme also outperformed the simple binary scheme used in our previous approach.
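The general idea of a weighted vector space model for annotation prediction can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define n2tn or the other schemes, so the code assumes a simple smoothed inverse-frequency weight, omits latent semantic indexing and the GO hierarchy, and uses made-up gene names and placeholder term labels. The function names (`term_weights`, `cosine`, `predict`) are hypothetical.

```python
import math

# Toy annotation data: gene -> set of annotated terms.
# Gene names and term labels are made-up placeholders, not real GO IDs.
annotations = {
    "geneA": {"T1", "T2", "T3"},
    "geneB": {"T1", "T2"},
    "geneC": {"T4"},
}

def term_weights(annotations):
    """Assumed weighting: smoothed inverse frequency, so rarer terms weigh more."""
    n = len(annotations)
    terms = set().union(*annotations.values())
    return {
        t: math.log(n / sum(t in s for s in annotations.values())) + 1.0
        for t in terms
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(annotations, gene):
    """Rank terms not yet annotated to `gene` by similarity-weighted support."""
    w = term_weights(annotations)
    vectors = {g: {t: w[t] for t in ts} for g, ts in annotations.items()}
    scores = {}
    for other, terms in annotations.items():
        if other == gene:
            continue
        sim = cosine(vectors[gene], vectors[other])
        for t in terms - annotations[gene]:
            scores[t] = scores.get(t, 0.0) + sim * w[t]
    # Keep only candidates with positive support, best first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(t, s) for t, s in ranked if s > 0]

print(predict(annotations, "geneB"))
```

In this toy data, geneB shares two terms with geneA, so geneA's remaining term is the only candidate with positive support for geneB. Swapping `term_weights` for a different function is how alternative weighting schemes could be compared under these assumptions.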
Gene function prediction, gene annotation, Gene Ontology, vector space model, latent semantic indexing, weighting schemes.
Purvesh Khatri, Sorin Drăghici, Arina Done, Bogdan Done, "Predicting Novel Human Gene Ontology Annotations Using Semantic Analysis", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 1, pp. 91-99, January-March 2010, doi:10.1109/TCBB.2008.29