Issue No. 06 - November/December (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.78
Thiago de Souza Rodrigues, Federal Center of Technological Education of Minas Gerais, Belo Horizonte
Fernanda Caldas Cardoso, Federal University of Minas Gerais, Belo Horizonte
Santuza Maria Ribeiro Teixeira, Federal University of Minas Gerais, Belo Horizonte
Sérgio Costa Oliveira, Federal University of Minas Gerais, Belo Horizonte
Antônio Pádua Braga, Federal University of Minas Gerais, Belo Horizonte
A large number of unclassified sequences is still found in public databases, which suggests that new investigations in the area are still needed. In this contribution, we present a methodology based on Artificial Neural Networks for protein functional classification. A new protein coding scheme, called here Extended-Sequence Coding by Sliding Windows, is presented with the goal of overcoming some of the difficulties of the well-known method Sequence Coding by Sliding Windows. The new protein coding scheme uses more than one sliding window length with a weight factor that is proportional to the window length, avoiding the ambiguity problem without ignoring the identity of small subsequences. Accuracy for Sequence Coding by Sliding Windows ranged from 60.1 to 77.7 percent for the first bacterium protein set and from 61.9 to 76.7 percent for the second one, whereas the accuracy for the proposed Extended-Sequence Coding by Sliding Windows scheme ranged from 70.7 to 97.1 percent for the first bacterium protein set and from 61.1 to 93.3 percent for the second one. Additionally, protein sequences classified inconsistently by the Artificial Neural Networks were analyzed by CD-Search, revealing that there is some disagreement in public repositories and calling attention to the relevant issue of error propagation in annotated databases due to incorrectly transferred annotations.
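The coding idea described above, several sliding window lengths with each contribution weighted in proportion to its window length, can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this is an assumption-laden illustration rather than the authors' implementation: the weight is taken to be the window length itself, the feature for each subsequence is its weighted occurrence count, and the function names and default window lengths are hypothetical.

```python
from collections import Counter


def sliding_window_counts(sequence, window_length):
    """Count every contiguous subsequence of the given length."""
    return Counter(
        sequence[i:i + window_length]
        for i in range(len(sequence) - window_length + 1)
    )


def extended_sliding_window_features(sequence, window_lengths=(1, 2, 3)):
    """Combine counts from several window lengths into one feature map.

    Assumption: each count is multiplied by its window length, so longer
    subsequences weigh more while short ones are still represented.
    """
    features = {}
    for k in window_lengths:
        for subsequence, count in sliding_window_counts(sequence, k).items():
            features[subsequence] = k * count
    return features


if __name__ == "__main__":
    # Toy amino acid string, not taken from the paper's data sets.
    print(extended_sliding_window_features("MKVMK", window_lengths=(1, 2)))
```

A feature map of this kind would still need to be projected onto a fixed-length vector before being fed to a neural network; that step is omitted here because the abstract does not describe it.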
Artificial neural network, protein functional classification, protein coding, protein functional classification error.
Thiago de Souza Rodrigues, Fernanda Caldas Cardoso, Santuza Maria Ribeiro Teixeira, Sérgio Costa Oliveira, Antônio Pádua Braga, "Protein Classification with Extended-Sequence Coding by Sliding Window", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1721-1726, November/December 2011, doi:10.1109/TCBB.2011.78