Issue No. 05 - Sept.-Oct. (2013 vol. 10)
pp: 1241-1252
Devendra Kumar Shakya, Dept. of Biomed. Eng., Samrat Ashok Technol. Inst., Vidisha, India
Rajiv Saxena, Jaypee Univ. of Eng. & Technol., Guna, India
Sanjeev Narayan Sharma, Dept. of Biomed. Eng., Samrat Ashok Technol. Inst., Vidisha, India
Signal processing-based algorithms for identifying coding sequences (CDS) in eukaryotes are non-data-driven and detect these regions by exploiting their three-base periodicity. Three-base periodicity is commonly detected using the short-time Fourier transform (STFT), which uses a window function of fixed length. Because the lengths of protein-coding and noncoding regions vary widely, the identification accuracy of STFT-based algorithms is poor. In this paper, a novel signal processing-based algorithm is developed that makes the window length adaptive in the STFT of DNA sequences to improve the identification of three-base periodicity. In coding regions, the window length is adapted to maximize the magnitude of the period-3 measure, whereas in noncoding regions it is tailored to minimize this measure. Simulation results on benchmark data sets demonstrate the advantage of this algorithm over other non-data-driven methods for CDS prediction.
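The fixed-window period-3 measure that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration and not the authors' algorithm. It assumes binary base-indicator sequences for A, C, G, and T, a rectangular window, and a measure normalized by the squared window length. The function names (`period3_power`, `sliding_period3`, `period3_by_length`) and the candidate window lengths are hypothetical. The last helper only evaluates the measure at several window lengths. The rule the paper uses to choose among them is not stated in the abstract and is not reproduced here.

```python
import numpy as np

BASES = "ACGT"

def period3_power(segment: str) -> float:
    """Period-3 spectral power of one windowed DNA segment.

    Sums |DFT at frequency 1/3 cycle per base|^2 over the four binary
    base-indicator sequences and divides by len(segment)**2 so that
    windows of different lengths are comparable (an assumed normalization).
    """
    n = len(segment)
    if n == 0:
        return 0.0
    # Complex exponential at the period-3 frequency.
    phase = np.exp(-2j * np.pi * np.arange(n) / 3.0)
    total = 0.0
    for base in BASES:
        indicator = np.array([c == base for c in segment], dtype=float)
        total += abs(np.dot(indicator, phase)) ** 2
    return total / n**2

def _window(seq: str, center: int, window_len: int) -> str:
    """Rectangular window around `center`, truncated at the sequence ends."""
    half = window_len // 2
    return seq[max(0, center - half):min(len(seq), center + half + 1)]

def sliding_period3(seq: str, window_len: int = 351) -> np.ndarray:
    """Fixed-window profile: period-3 power at every position of `seq`."""
    return np.array([period3_power(_window(seq, i, window_len))
                     for i in range(len(seq))])

def period3_by_length(seq: str, center: int, lengths=(99, 201, 351)) -> dict:
    """Period-3 power at one position for several candidate window lengths."""
    return {n: period3_power(_window(seq, center, n)) for n in lengths}

if __name__ == "__main__":
    # Synthetic sequence: a strongly period-3 stretch between period-4 flanks.
    seq = "ACGT" * 30 + "ATG" * 60 + "ACGT" * 30
    profile = sliding_period3(seq, window_len=99)
    print(profile[30], profile[210])
    print(period3_by_length(seq, center=210))
```

On real sequences the profile is much noisier than in this synthetic case, and the trade-off the abstract describes appears directly: a short window resolves short exons but gives an unstable estimate, while a long window smooths the estimate but blurs region boundaries.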
Encoding, deoxyribonucleic acid (DNA), Bioinformatics, Genomics, Signal processing, Signal processing algorithms, Prediction algorithms, window function, short-time Fourier transform (STFT)
Devendra Kumar Shakya, Rajiv Saxena, Sanjeev Narayan Sharma, "An Adaptive Window Length Strategy for Eukaryotic CDS Prediction", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 5, pp. 1241-1252, Sept.-Oct. 2013, doi:10.1109/TCBB.2013.76