2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)
Adapting Support Vector Machines to Predict Translation Initiation Sites in the Human Genome
Stanford, California
August 08-August 11
ISBN: 0-7695-2442-7
Rehan Akbani, University of Texas at San Antonio
Stephen Kwek, University of Texas at San Antonio

This study is concerned with predicting Translation Initiation Sites (TIS) in the human genome that start with the nucleotide sequence ATG. This sequence occurs 104 million times in the entire genome. However, current estimates suggest that there are only about 30,000 TIS in the human genome, giving an imbalance ratio of about 1:3500 for TIS ATG vs. non-TIS ATG sites. Algorithms designed using datasets with a low imbalance ratio may not be well suited to predicting TIS at the genomic level. In this paper, we modify the SVM algorithm so that it can handle a moderately high imbalance ratio. The F-measures obtained were: Linear Discriminant 0%, SVM with under-sampling 4.1%, SVM with over-sampling 8.2%, Neural Network 13.3%, Decision Tree 20%, and our approach 44%. These results show how poorly standard approaches perform at the genomic level due to the high imbalance ratio, and that our approach improves performance significantly.
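
The imbalance ratio quoted above, and the reason an F-measure is reported rather than plain accuracy, can be illustrated with a short calculation. This is a minimal sketch and not the authors' code: the function names are hypothetical, the F-measure is assumed to be the balanced form (harmonic mean of precision and recall), and the "always predict non-TIS" classifier is only an illustrative baseline, not one of the methods evaluated in the paper.

```python
# Illustrative sketch only; names and the baseline classifier are assumptions.

def imbalance_ratio(total_atg, tis_count):
    """Number of non-TIS ATG sites for every TIS ATG site."""
    return (total_atg - tis_count) / tis_count


def f_measure(tp, fp, fn):
    """Balanced F-measure from confusion counts (assumed form: 2PR / (P + R)).

    Returns 0.0 when there are no true positives, since precision
    and/or recall are then zero or undefined.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract.
TOTAL_ATG = 104_000_000   # ATG occurrences in the genome
TIS_COUNT = 30_000        # estimated number of true TIS

ratio = imbalance_ratio(TOTAL_ATG, TIS_COUNT)
print(f"imbalance ratio is roughly 1:{ratio:.0f}")

# Hypothetical baseline: label every ATG as non-TIS.
true_negatives = TOTAL_ATG - TIS_COUNT
accuracy = true_negatives / TOTAL_ATG
baseline_f = f_measure(tp=0, fp=0, fn=TIS_COUNT)
print(f"accuracy = {accuracy:.4%}, F-measure = {baseline_f:.1%}")
```

Under these assumptions the baseline is correct on almost every site yet finds no TIS at all, which is why accuracy says little at this imbalance ratio and the F-measure is the more informative score.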

Citation:
Rehan Akbani, Stephen Kwek, "Adapting Support Vector Machines to Predict Translation Initiation Sites in the Human Genome," csbw, pp.143-148, 2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05), 2005