2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE) (2017)
Poznan, Poznan, Poland
June 21, 2017 to June 23, 2017
Despite the significant contribution from specialized ontologies and text mining methods, the evaluation of the semantic similarity of genes remains difficult because of the complex functions in which genes are involved. A less exploited resource is Wikipedia that stores more than 10400 articles about human genes: each gene name identifies the corresponding Wikipedia page resuming gene's properties in short sentences where hyperlinks define relationships with other genes in Wikipedia. This paper evaluates the extent to which the Wikipedia can be trusted for assessing the similarity of a gene pair as the distance between their Wikipedia pages. We present a set of experiments that make use of TagMe (a powerful tool for evaluating the distance of two Wikipedia pages based on their annotations) to calculate the semantic similarity of several sets of genes on Wikipedia. Results compare well with gold standards and semantic similarity values evaluated on gene ontologies. The paper demonstrates the effectiveness of Wikipedia in recognizing functional groups of genes, the quality and the wealth of its knowledge about genes as well the accuracy of TagMe.
Encyclopedias, Electronic publishing, Internet, Semantics, Ontologies, Tools
N. Dessi and M. Atzori, "Is Wikipedia a Latent Gene Ontology?," 2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), Poznan, Poznan, Poland, 2017, pp. 164-169.