Design and Analysis of Classifier Learning Experiments in Bioinformatics: Survey and Case Studies
Nov.-Dec. 2012 (vol. 9 no. 6)
pp. 1663-1675
O. Irsoy, Dept. of Comput. Eng., Bogazici Univ., Istanbul, Turkey
O. T. Yildiz, Dept. of Comput. Eng., Isik Univ., Istanbul, Turkey
E. Alpaydin, Dept. of Comput. Eng., Bogazici Univ., Istanbul, Turkey
In many bioinformatics applications, it is important to assess and compare the performances of algorithms trained from data, so that the conclusions drawn are unaffected by chance and are therefore significant. Both the design of such experiments and the analysis of the resulting data using statistical tests should be done carefully for the results to carry significance. In this paper, we first review the performance measures used in classification, the basics of experiment design, and statistical tests. We then give the results of our survey of over 1,500 papers published in the last two years in three bioinformatics journals (including this one). Although the basics of experiment design are well understood, such as resampling instead of using a single training set and the use of different performance metrics instead of error, only 21 percent of the papers use any statistical test for comparison. In the third part, we analyze four different scenarios that we encounter frequently in the bioinformatics literature, discussing the proper statistical methodology as well as showing an example case study for each. With the supplementary software, we hope that the guidelines we discuss will play an important role in future studies.
Index Terms:
statistical testing, bioinformatics, data analysis, design of experiments, learning (artificial intelligence), statistical methodology, classifier learning experiments, statistical tests, experiment design, bioinformatics literature, Algorithm design and analysis, Measurement, Approximation algorithms, Computational biology, model selection, classification
Citation:
O. Irsoy, O. T. Yildiz, E. Alpaydin, "Design and Analysis of Classifier Learning Experiments in Bioinformatics: Survey and Case Studies," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1663-1675, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.117