V.D. Francesco, Analytical Biostatistics Section, National Institutes of Health, Bethesda, MD, USA
P.J. Munson, Analytical Biostatistics Section, National Institutes of Health, Bethesda, MD, USA
J. Garnier, Analytical Biostatistics Section, National Institutes of Health, Bethesda, MD, USA
Using a new database of 20 proteins not included in any previously used training dataset, we have incorporated multiple alignment information from homologous proteins into two well-characterized prediction methods: COMBINE (a jury method) and the Q-L (quadratic-logistic) method. We find that the increase in accuracy from the use of related proteins is similar for the two methods (5.8% and 6.3%, respectively), yielding a per-residue accuracy (Q3) of 68.7% and 69.0%, respectively, for a three-state prediction. Most of the improvement came from averaging, profiling, or consensus predictions. A small part of it (0.5%) came from recognizing that "gap-permissive" positions in the alignment are most frequently in the coil state. Our finding is consistent with the hypotheses that the aligned family shares a common secondary structure and that the improved accuracy is due to reduced noise in the prediction.
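As an illustration of the ideas summarized above, the following minimal sketch averages per-residue state probabilities over aligned homologs, assigns coil to gap-rich columns, and scores the result with Q3. It is not the COMBINE or Q-L procedure itself: the function names, the input layout (one (helix, strand, coil) probability triple per sequence per alignment column, with None marking a gap), and the `gap_threshold` value that decides when a column counts as gap-permissive are all assumptions made for this example.

```python
STATES = "HEC"  # helix, strand, coil (three-state prediction)


def consensus_prediction(per_sequence_probs, gap_threshold=0.5):
    """Average per-sequence state probabilities column by column.

    per_sequence_probs: one list per aligned sequence; each entry is a
    (helix, strand, coil) probability triple, or None where that
    sequence has a gap. gap_threshold is a hypothetical cutoff: columns
    whose gap fraction exceeds it are treated as gap-permissive and
    assigned coil.
    """
    n_seqs = len(per_sequence_probs)
    n_cols = len(per_sequence_probs[0])
    result = []
    for col in range(n_cols):
        column = [seq[col] for seq in per_sequence_probs]
        present = [p for p in column if p is not None]
        gap_fraction = 1 - len(present) / n_seqs
        if not present or gap_fraction > gap_threshold:
            result.append("C")  # gap-rich column: assume coil
            continue
        # Averaging over homologs is the noise-reduction step.
        means = [sum(p[i] for p in present) / len(present) for i in range(3)]
        result.append(STATES[means.index(max(means))])
    return "".join(result)


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy alignment: three sequences, four columns (made-up numbers).
example = [
    [(0.6, 0.1, 0.3), (0.5, 0.2, 0.3), (0.2, 0.5, 0.3), (0.3, 0.3, 0.4)],
    [(0.4, 0.2, 0.4), None,            (0.1, 0.6, 0.3), (0.2, 0.2, 0.6)],
    [(0.7, 0.1, 0.2), None,            (0.3, 0.4, 0.3), (0.5, 0.1, 0.4)],
]
print(consensus_prediction(example))  # HCEC
print(q3("HCEC", "HCEE"))             # 75.0
```

In the toy data the second column is gapped in two of three sequences, so it is assigned coil regardless of the single remaining prediction; the other columns take the state with the highest mean probability.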
Index Terms:
proteins; biology computing; chemistry computing; chemistry; database management systems; multiple alignments; protein secondary structure prediction; database; training datasets; homologous proteins; well-characterized prediction methods; COMBINE; jury method; quadratic-logistic method; related proteins; prediction accuracy; three state prediction; averaging; profiling; consensus predictions; gap-permissive positions; coil state; secondary structure; reduced noise
Citation:
V.D. Francesco, P.J. Munson, J. Garnier, "Use of multiple alignments in protein secondary structure prediction," 28th Hawaii International Conference on System Sciences (HICSS'95), p. 285, 1995