Translation Initiation Sites Prediction with Mixture Gaussian Models in Human cDNA Sequences
August 2005 (vol. 17 no. 8)
pp. 1152-1160
Translation initiation sites (TISs) are important signals in cDNA sequences. Many research efforts have tried to predict TISs in cDNA sequences. In this paper, we propose to use mixture Gaussian models for TIS prediction. Using both local features and some features generated from global measures, the proposed method predicts TISs with a sensitivity of 98 percent and a specificity of 93.6 percent. Our method outperforms many other existing methods in sensitivity while keeping specificity high. We attribute the improvement in sensitivity to the nature of the global features and the mixture Gaussian models.
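The classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single one-dimensional feature score per candidate ATG, one Gaussian mixture per class with hand-picked (hypothetical) weights, means, and variances, and equal class priors. Sensitivity and specificity are computed with the conventional definitions TP/(TP+FN) and TN/(TN+FP), which is also an assumption, since the abstract does not state its definitions.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, components):
    """Density of a Gaussian mixture; components is a list of (weight, mean, variance)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def predict_is_tis(x, tis_mixture, non_tis_mixture, prior_tis=0.5):
    """Label a candidate as a TIS when the TIS class has the larger posterior score."""
    score_tis = prior_tis * mixture_pdf(x, tis_mixture)
    score_non = (1.0 - prior_tis) * mixture_pdf(x, non_tis_mixture)
    return score_tis > score_non

def sensitivity_specificity(y_true, y_pred):
    """Return (TP / (TP + FN), TN / (TN + FP)) for boolean label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical mixture parameters, chosen only for illustration.
tis_mixture = [(0.6, 2.0, 1.0), (0.4, 5.0, 1.0)]   # two-component model for true TISs
non_tis_mixture = [(1.0, -1.0, 1.0)]               # single component for non-TIS ATGs

# Made-up feature scores and ground-truth labels (True means a real TIS).
scores = [2.1, 4.8, 0.3, -1.2, -0.5, 3.0]
labels = [True, True, True, False, False, False]

predictions = [predict_is_tis(x, tis_mixture, non_tis_mixture) for x in scores]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In practice the mixture parameters would be estimated from labeled training sequences, for example with the EM algorithm, and the features would be multidimensional; the fixed values here only show how class-conditional mixture densities turn into a decision rule and how the two reported metrics are computed.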

Index Terms: Bioinformatics, classification, feature extraction, mixture Gaussian model, translation initiation sites.
Guoliang Li, Tze-Yun Leong, Louxin Zhang, "Translation Initiation Sites Prediction with Mixture Gaussian Models in Human cDNA Sequences," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, pp. 1152-1160, Aug. 2005, doi:10.1109/TKDE.2005.133