Optimal Aggregation of Binary Classifiers for Multiclass Cancer Diagnosis Using Gene Expression Profiles
April-June 2009 (vol. 6 no. 2)
pp. 333-343
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies based on gene expression profiling. Many studies have aggregated binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, and several have compared these strategies. However, those comparisons found that the best coding depends on the situation. A new problem, which we call the “optimal coding problem,” has therefore arisen: how can we determine which coding is optimal in each situation? To approach this problem, we propose a novel framework for constructing a multiclass classifier in which each binary classifier to be aggregated has a weight that is tuned optimally based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to it. We apply this method to various classification problems, including a synthesized data set and several cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method improves classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
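
To make the idea of weighted aggregation concrete, the following is a minimal Python sketch, not the authors' implementation. It builds 1R and 11 code matrices and decodes a sample by a weighted sum of binary classifier outputs. The function names (one_vs_rest_code, one_vs_one_code, weighted_decode), the +1/-1/0 coding convention, and the example outputs and weights are all assumptions made for illustration; the weights here are set by hand, whereas the abstract describes tuning them from the observed data.

```python
from itertools import combinations


def one_vs_rest_code(k):
    """1R coding: one row per binary classifier, one column per class.

    Row r marks class r as +1 and every other class as -1.
    """
    return [[1 if c == r else -1 for c in range(k)] for r in range(k)]


def one_vs_one_code(k):
    """11 coding: one row per class pair (a, b).

    Class a is +1, class b is -1, and classes not in the pair are 0.
    """
    rows = []
    for a, b in combinations(range(k), 2):
        row = [0] * k
        row[a], row[b] = 1, -1
        rows.append(row)
    return rows


def weighted_decode(code, outputs, weights):
    """Score each class by the weighted agreement between its code column
    and the signed binary outputs, and return (best_class, scores)."""
    k = len(code[0])
    scores = [
        sum(w * row[c] * f for row, f, w in zip(code, outputs, weights))
        for c in range(k)
    ]
    return max(range(k), key=scores.__getitem__), scores


# Hypothetical example: 3 classes, 1R and 11 classifiers pooled together.
code = one_vs_rest_code(3) + one_vs_one_code(3)
outputs = [0.5, 0.4, -1.0, -1.0, 0.2, 0.2]  # made-up signed classifier outputs

uniform = [1.0] * len(code)            # equal weights: simple voting
only_1r = [1.0, 1.0, 1.0, 0, 0, 0]     # hand-picked: ignore the 11 classifiers

pred_uniform, _ = weighted_decode(code, outputs, uniform)
pred_only_1r, _ = weighted_decode(code, outputs, only_1r)
print(pred_uniform, pred_only_1r)
```

With the same binary outputs, the two weight vectors pick different classes, which is why the choice of weights (and hence of coding) matters and why it is worth tuning from data rather than fixing in advance.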

Index Terms:
Multiclass classification, error correcting output coding, gene expression profiling, cancer diagnosis.
Naoto Yukinawa, Shigeyuki Oba, Kikuya Kato, Shin Ishii, "Optimal Aggregation of Binary Classifiers for Multiclass Cancer Diagnosis Using Gene Expression Profiles," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 333-343, April-June 2009, doi:10.1109/TCBB.2007.70239