Issue No.05 - Sept.-Oct. (2012 vol.9)
pp: 1379-1386
Blaise Hanczar, LIPADE, Univ. Paris Descartes, Paris, France
Avner Bar-Hen, MAP5, Univ. Paris Descartes, Paris, France
ABSTRACT
One of the major aims of many microarray experiments is to build discriminatory diagnosis and prognosis models. A large number of supervised methods have been proposed in the literature for microarray-based classification for this purpose. Model evaluation and comparison is a critical issue and, most of the time, is based on the classification cost. This classification cost depends on the costs of false positives and false negatives, which are generally unknown in diagnostic problems. This uncertainty may strongly affect the evaluation and comparison of classifiers. We propose a new measure of classifier performance that takes into account the uncertainty of the error. We represent the available knowledge about the costs by a distribution function defined on the ratio of the costs. The performance of a classifier is therefore computed over the set of all possible costs, weighted by their probability distribution. Our method is tested on both artificial and real microarray data sets. We show that the performance of classifiers depends strongly on the ratio of the classification costs. In many cases, the best classifier can be identified by our new measure whereas the classic error measures fail.
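The idea of weighting performance by a distribution over cost ratios can be illustrated with a minimal sketch. It is not the authors' formulation: the normalized ratio `c` (relative cost of a false negative, with `1 - c` for a false positive), the discrete weights standing in for the distribution function, and the function names `expected_cost` and `weighted_performance` are all assumptions made for illustration.

```python
def expected_cost(fpr, fnr, p_pos, c):
    """Expected misclassification cost for one cost ratio.

    Assumption: c in [0, 1] is the relative cost of a false negative and
    1 - c the relative cost of a false positive. p_pos is the prior
    probability of the positive class.
    """
    return c * p_pos * fnr + (1.0 - c) * (1.0 - p_pos) * fpr


def weighted_performance(fpr, fnr, p_pos, ratios, weights):
    """Average the expected cost over candidate cost ratios.

    `ratios` and `weights` form a discrete stand-in for a probability
    distribution on the cost ratio; weights are normalized here.
    """
    total = sum(weights)
    return sum(
        w * expected_cost(fpr, fnr, p_pos, c)
        for c, w in zip(ratios, weights)
    ) / total


if __name__ == "__main__":
    # Two hypothetical classifiers with the same plain error rate (0.20)
    # on balanced classes, but opposite error profiles.
    a = {"fpr": 0.10, "fnr": 0.30}
    b = {"fpr": 0.30, "fnr": 0.10}

    # Hypothetical belief: false negatives are probably the costlier error.
    ratios = [0.5, 0.7, 0.9]
    weights = [1, 2, 3]

    print("A:", weighted_performance(a["fpr"], a["fnr"], 0.5, ratios, weights))
    print("B:", weighted_performance(b["fpr"], b["fnr"], 0.5, ratios, weights))
```

In this toy setting both classifiers have the same plain error rate, so that measure cannot separate them, while the cost-weighted average favors the classifier with fewer false negatives once the weights lean toward costly false negatives.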
INDEX TERMS
probability, biology computing, genetics, genomics, lab-on-a-chip, pattern classification, real microarray data sets, classifier performance, gene expression data, discriminatory diagnosis, prognosis models, microarray-based classification, probability distribution, artificial microarray data sets, error analysis, cost function, support vector machines, bioinformatics, computational biology, measurement uncertainty, training, gene expression, supervised classification, microarray analysis
CITATION
Blaise Hanczar, Avner Bar-Hen, "A New Measure of Classifier Performance for Gene Expression Data", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 5, pp. 1379-1386, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.21