Issue No. 02, April-June 2008 (vol. 5)
pp. 198-207
ABSTRACT
An important problem in genomic sequence analysis is the identification of protein coding regions. Several model-independent methods have been proposed for this task. Rather than relying on a model of coding DNA, they exploit specific nucleotide patterns that occur in coding regions. Nonetheless, these methods are not entirely satisfactory because they depend on an empirically predefined window length, which is required for the local analysis of a DNA region. We introduce a method for identifying protein coding regions that is based on a modified Gabor-wavelet transform (MGWT). This novel transform is tuned to analyze periodic signal components and has the advantage of being independent of the window length. We compared the performance of the MGWT with that of other methods on eukaryotic datasets. The results show that the MGWT outperforms all of the assessed model-independent methods in identification accuracy. These results indicate that the fixed working scale is the source of at least part of the identification errors produced by the previous methods. The new method not only avoids this source of errors but also provides a tool for the detailed exploration of nucleotide occurrence.
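The window-length dependence described in the abstract can be illustrated with a minimal sketch. It assumes the common Fourier-based formulation in the spirit of Tiwari et al. and Anastassiou: the DNA string is mapped to four binary indicator sequences, and a sliding window of fixed length N is scored by the spectral content at frequency index k = N/3. For contrast, the sketch also includes a hypothetical multi-scale alternative. That alternative uses complex Gaussian-windowed exponentials with the carrier fixed at period 3 and several widths, and averages their energies. This second measure is only an illustrative assumption and is not the MGWT defined in the paper. The function names, the default `window`, the scale set `sigmas`, and the normalization are all hypothetical choices.

```python
import numpy as np

BASES = "ACGT"


def indicator_sequences(seq):
    """Map a DNA string to a 4 x L array of binary indicators u_b[n], b in ACGT."""
    s = np.array(list(seq.upper()))
    return np.array([(s == b).astype(float) for b in BASES])


def period3_dft(seq, window=351):
    """Fixed-window period-3 spectral content S[N/3] = sum_b |U_b[N/3]|^2.

    The window of N = `window` bases slides one base at a time and each value
    is stored at the window centre. Positions closer than N/2 to either end
    are left at 0. N must be a multiple of 3 so that k = N/3 is an integer.
    """
    if window % 3:
        raise ValueError("window length must be a multiple of 3")
    u = indicator_sequences(seq)
    length = u.shape[1]
    # DFT basis at k = N/3: exp(-2j*pi*k*n/N) = exp(-2j*pi*n/3)
    basis = np.exp(-2j * np.pi * np.arange(window) / 3)
    out = np.zeros(length)
    half = window // 2
    for start in range(length - window + 1):
        U = u[:, start:start + window] @ basis  # one DFT coefficient per base
        out[start + half] = np.sum(np.abs(U) ** 2)
    return out


def period3_gabor(seq, sigmas=(20, 40, 80, 160)):
    """Hypothetical multi-scale period-3 measure (NOT the paper's MGWT).

    Each indicator sequence is convolved with a complex Gabor kernel whose
    carrier is fixed at period 3 and whose Gaussian width is sigma. The
    energies are summed over bases and averaged over the widths in `sigmas`.
    """
    u = indicator_sequences(seq)
    length = u.shape[1]
    total = np.zeros(length)
    for sigma in sigmas:
        t = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
        if len(t) > length:
            raise ValueError("sequence is shorter than the widest kernel")
        envelope = np.exp(-t ** 2 / (2.0 * sigma ** 2))
        # unit-area envelope so that responses at different widths are comparable
        kernel = envelope / envelope.sum() * np.exp(2j * np.pi * t / 3)
        for u_b in u:
            total += np.abs(np.convolve(u_b, kernel, mode="same")) ** 2
    return total / len(sigmas)
```

You can run `period3_dft(seq, window=w)` for several values of `w` on a sequence that contains short coding segments to see the trade-off that a fixed window imposes. Large windows give smoother profiles but blur or dilute short segments, and small windows localize better but are noisier. This is the dependence on a preset working scale that, according to the abstract, the MGWT avoids.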
INDEX TERMS
Biology and genetics, Signal processing, Pattern Recognition
CITATION
Jesús P. Mena-Chalco, Helaine Carrer, Yossi Zana, Roberto M. Cesar Jr., "Identification of Protein Coding Regions Using the Modified Gabor-Wavelet Transform", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 2, pp. 198-207, April-June 2008, doi:10.1109/TCBB.2007.70259