Normalized Feature Vectors: A Novel Alignment-Free Sequence Comparison Method Based on the Numbers of Adjacent Amino Acids
Issue No. 02 - March-April (2013 vol. 10)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.10
De-Shuang Huang, Sch. of Electron. & Inf. Eng., Tongji Univ., Shanghai, China
Hong-Jie Yu, Dept. of Math., Anhui Sci. & Technol. Univ., Fengyang, China
Based on all 400 kinds of adjacent amino acids (AAA), we map each protein primary sequence of length L into a 400 by (L-1) matrix M. From this matrix we derive a normalized 400-tuple mathematical descriptor D, extracted from the primary protein sequence via singular value decomposition (SVD) of M. The resulting 400-D normalized feature vectors (NFVs) facilitate quantitative analysis of protein sequences. Using this normalized representation, we analyze the similarity of different sequences on two data sets: 1) ND5 sequences from nine species and 2) transferrin sequences of 24 vertebrates. We also compare the results of this study with those of related works. The two experiments illustrate that the proposed NFV-AAA approach performs well in sequence similarity analysis.
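The abstract does not spell out how the entries of M are defined or how D is read off the SVD, so the following Python sketch is only one plausible reading, not the authors' exact construction. It assumes that row i of M holds the running count of the i-th amino acid pair along the sequence, and that D is the leading left singular vector of M scaled to unit length. The names `pair_matrix`, `descriptor`, and `distance`, and the choice of Euclidean distance, are illustrative assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Map each of the 20 x 20 = 400 ordered pairs to a row index.
PAIR_INDEX = {
    a + b: 20 * i + j
    for i, a in enumerate(AMINO_ACIDS)
    for j, b in enumerate(AMINO_ACIDS)
}


def pair_matrix(seq):
    """Build a 400 x (L-1) matrix for a sequence of length L.

    ASSUMPTION: entry (i, k) is the number of times pair i occurs among
    the first k+1 adjacent pairs. The paper may define M differently.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    m = np.zeros((400, len(seq) - 1))
    for k in range(len(seq) - 1):
        row = PAIR_INDEX.get(seq[k:k + 2])
        if row is not None:  # pairs with non-standard residues are skipped
            m[row, k] = 1.0
    return np.cumsum(m, axis=1)


def descriptor(seq):
    """Return a unit-length 400-D vector derived from the SVD of M.

    ASSUMPTION: the leading left singular vector is used as the descriptor.
    """
    u, _, _ = np.linalg.svd(pair_matrix(seq), full_matrices=False)
    d = u[:, 0]
    if d.sum() < 0:  # singular vectors are defined up to sign; fix it
        d = -d
    return d / np.linalg.norm(d)


def distance(d1, d2):
    """Euclidean distance between two descriptors (an assumed metric)."""
    return float(np.linalg.norm(d1 - d2))
```

Because every descriptor has the same length (400) regardless of L, sequences of different lengths can be compared directly, for example with `distance(descriptor(seq_a), descriptor(seq_b))`, without aligning them first.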
Proteins, Amino acids, Vectors, Feature extraction, Bioinformatics, Educational institutions
De-Shuang Huang, Hong-Jie Yu, "Normalized Feature Vectors: A Novel Alignment-Free Sequence Comparison Method Based on the Numbers of Adjacent Amino Acids", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 457-467, March-April 2013, doi:10.1109/TCBB.2013.10