An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and Its Application to DNA Splice Site Prediction
Sept.-Oct. 2012 (vol. 9 no. 5)
pp. 1387-1398
Uday Kamath, Dept. of Comput. Sci., George Mason Univ., Ashburn, VA, USA
Jack Compton, Barquin Int., Alexandria, VA, USA
Rezarta Islamaj-Dogan, Nat. Center for Biotechnol. Inf. (NCBI), Nat. Inst. of Health (NIH), Bethesda, MD, USA
Kenneth A. De Jong, Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Amarda Shehu, Dept. of Comput. Sci., George Mason Univ., Fairfax, VA, USA
Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often left to domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. The obtained features are evaluated in the context of classification by Support Vector Machines, and the results show significant improvement in accuracy and precision over state-of-the-art approaches.
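To make the general idea concrete, the following is a minimal sketch of evolutionary feature search over sequence data. It is not the algorithm from the paper: the feature representation (fixed-length k-mer presence), the fitness function (difference in occurrence rate between classes), the single-point mutation, and the names `fitness`, `mutate`, and `evolve_motifs` are all illustrative assumptions.

```python
import random

ALPHABET = "ACGT"

def fitness(motif, positives, negatives):
    """Score a motif by how differently it occurs in the two classes (assumed fitness)."""
    pos_rate = sum(motif in s for s in positives) / len(positives)
    neg_rate = sum(motif in s for s in negatives) / len(negatives)
    return abs(pos_rate - neg_rate)

def mutate(motif, rng):
    """Replace one randomly chosen position with a random nucleotide."""
    i = rng.randrange(len(motif))
    return motif[:i] + rng.choice(ALPHABET) + motif[i + 1:]

def evolve_motifs(positives, negatives, k=3, pop_size=20, generations=30, seed=0):
    """Evolve k-mer presence features; return the final population, best first."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(k))
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(m, rng) for m in population]
        # Pool parents and children, drop duplicates, keep the fittest.
        # The motif itself breaks ties so that runs are reproducible.
        pool = sorted(set(population + children),
                      key=lambda m: (-fitness(m, positives, negatives), m))
        population = pool[:pop_size]
    return population
```

In a fuller pipeline, each surviving motif could be turned into a binary presence feature and passed to a classifier such as a Support Vector Machine; that step is omitted here.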
Index Terms:
support vector machines, biological techniques, DNA, evolutionary computation, genetic algorithms, molecular biophysics, genetic programming, evolutionary algorithm approach, feature generation, DNA splice site prediction, biological sequence data, machine learning methods, gene-finding problem, state-of-the-art approach, bioinformatics, accuracy, training data, prediction algorithms, DNA splice sites, feature extraction and construction, classifier design and evaluation, data mining
Citation:
Uday Kamath, Jack Compton, Rezarta Islamaj-Dogan, Kenneth A. De Jong, Amarda Shehu, "An Evolutionary Algorithm Approach for Feature Generation from Sequence Data and Its Application to DNA Splice Site Prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 5, pp. 1387-1398, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.53