DeepText2Go: Improving large-scale protein function prediction with deep semantic text representation
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2017)
Kansas City, MO, USA
Nov. 13, 2017 to Nov. 16, 2017
Ronghui You, School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University
Shanfeng Zhu, School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University
UniProtKB had collected more than 88 million protein sequences by July 2017. Fewer than 0.2% of these proteins, however, have experimental GO annotations. To reduce this huge gap, automatic protein function prediction (AFP) is becoming increasingly important. Results on the CAFA (Critical Assessment of protein Function Annotation algorithms) benchmark demonstrate that sequence homology-based methods are highly competitive in AFP. One imperative issue is to incorporate information sources other than sequence into AFP. In contrast to the BOW (bag of words) representation used in traditional text-based AFP, we propose a new method called DeepText2GO that improves large-scale AFP by using a deep semantic text representation instead. Furthermore, DeepText2GO integrates both text-based and sequence homology-based methods through a consensus approach. Extensive experiments on a benchmark dataset extracted from UniProt/SwissProt demonstrate that DeepText2GO significantly outperforms both text-based and sequence homology-based methods.
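The abstract does not specify how the consensus between text-based and sequence homology-based predictions is computed. As a minimal illustrative sketch only, one simple way to combine two predictors is a weighted average of their per-GO-term confidence scores. The function name `consensus_scores`, the `text_weight` parameter, and the example scores below are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def consensus_scores(
    text_scores: Dict[str, float],
    homology_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> Dict[str, float]:
    """Combine two per-GO-term score maps by weighted averaging.

    Assumption (not from the paper): each predictor returns a score in
    [0, 1] per GO term, and a term missing from one predictor counts as 0.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    # Consider every GO term predicted by either method.
    terms = set(text_scores) | set(homology_scores)
    return {
        term: text_weight * text_scores.get(term, 0.0)
        + (1.0 - text_weight) * homology_scores.get(term, 0.0)
        for term in terms
    }


# Hypothetical scores for a single protein (illustrative values only).
text = {"GO:0005524": 0.8, "GO:0003677": 0.4}
homology = {"GO:0005524": 0.6, "GO:0016301": 0.2}
combined = consensus_scores(text, homology, text_weight=0.5)
```

With equal weights, a term supported by both predictors keeps a high combined score, while a term predicted by only one of them is halved. In practice the weight would be tuned on held-out data.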
Proteins, Protein engineering, Semantics, Training data, Benchmark testing, Ontologies, Training
R. You and S. Zhu, "DeepText2Go: Improving large-scale protein function prediction with deep semantic text representation," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 42-49.