Issue No. 06 - November/December (2011 vol. 8)
ISSN: 1545-5963
pp: 1721-1726
Santuza Maria Ribeiro Teixeira, Federal University of Minas Gerais, Belo Horizonte
Fernanda Caldas Cardoso, Federal University of Minas Gerais, Belo Horizonte
Thiago de Souza Rodrigues, Federal Center of Technological Education of Minas Gerais, Belo Horizonte
Antônio Pádua Braga, Federal University of Minas Gerais, Belo Horizonte
Sérgio Costa Oliveira, Federal University of Minas Gerais, Belo Horizonte
ABSTRACT
A large number of unclassified sequences is still found in public databases, which suggests that there is still a need for new investigations in the area. In this contribution, we present a methodology based on Artificial Neural Networks for protein functional classification. A new protein coding scheme, called here Extended-Sequence Coding by Sliding Windows, is presented with the goal of overcoming some of the difficulties of the well-known method Sequence Coding by Sliding Window. The new protein coding scheme uses more than one sliding window length, with a weight factor that is proportional to the window length, avoiding the ambiguity problem without ignoring the identity of small subsequences. Accuracy for Sequence Coding by Sliding Windows ranged from 60.1 to 77.7 percent for the first bacterium protein set and from 61.9 to 76.7 percent for the second one, whereas the accuracy for the proposed Extended-Sequence Coding by Sliding Windows scheme ranged from 70.7 to 97.1 percent for the first bacterium protein set and from 61.1 to 93.3 percent for the second one. Additionally, protein sequences classified inconsistently by the Artificial Neural Networks were analyzed by CD-Search, revealing that there is some disagreement in public repositories and drawing attention to the relevant issue of error propagation in annotated databases due to incorrectly transferred annotations.
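The abstract gives only the outline of the coding scheme: several sliding window lengths, each weighted in proportion to its length. The sketch below illustrates that general idea and should not be read as the authors' implementation. The function names, the default window lengths (1, 2, 3), the use of raw subsequence counts, and the choice of the window length itself as the weight factor are all assumptions made for illustration.

```python
from collections import Counter


def sliding_window_counts(sequence, window):
    """Count every contiguous subsequence of length `window` in `sequence`."""
    last_start = len(sequence) - window + 1
    return Counter(sequence[i:i + window] for i in range(last_start))


def extended_coding(sequence, windows=(1, 2, 3)):
    """Illustrative multi-window coding (hypothetical, not the paper's code).

    Each subsequence count is multiplied by its window length, so longer
    subsequences contribute more while short ones are still represented.
    """
    features = {}
    for window in windows:
        for subsequence, count in sliding_window_counts(sequence, window).items():
            # Assumption: the weight factor is the window length itself.
            features[subsequence] = window * count
    return features


if __name__ == "__main__":
    # Toy sequence over the amino acid alphabet, not real protein data.
    print(extended_coding("MKMK", windows=(1, 2)))
```

In a pipeline like the one described, such a feature dictionary would be mapped onto a fixed-length vector before being passed to a neural network; that step is omitted here because the abstract does not specify it.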
INDEX TERMS
Artificial neural network, protein functional classification, protein coding, protein functional classification error.
CITATION
Santuza Maria Ribeiro Teixeira, Fernanda Caldas Cardoso, Thiago de Souza Rodrigues, Antônio Pádua Braga, Sérgio Costa Oliveira, "Protein Classification with Extended-Sequence Coding by Sliding Window", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1721-1726, November/December 2011, doi:10.1109/TCBB.2011.78