IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, November/December 2011

pp: 1522-1534

Jung Hun Oh, University of Texas at Arlington, Arlington, TX

Jean Gao, The University of Texas at Arlington, Arlington, TX

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.42

ABSTRACT

The classification of serum samples based on mass spectrometry (MS) has been increasingly used for monitoring disease progression and for diagnosing early disease. However, classifying mass spectrometry data is extremely challenging because of the very large number of peaks (features) in mass spectra. Linear discriminant analysis (LDA) has been widely used for dimension reduction and feature extraction in many applications. However, conventional LDA suffers from the singularity problem when dealing with high-dimensional features. Another critical limitation is its linearity, which causes it to fail on classification problems over nonlinearly clustered data sets. To overcome these problems, we develop a new fast kernel discriminant analysis (FKDA) that computes the optimal discriminant vectors quickly. FKDA is applied to the classification of liver cancer mass spectrometry data, originally analyzed by Ressom et al. [1], that consist of three categories: hepatocellular carcinoma, cirrhosis, and healthy. We demonstrate the superiority and effectiveness of FKDA when compared to other classification techniques.
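
The singularity problem named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' FKDA algorithm, which the abstract does not specify; it only shows, on synthetic data, that the within-class scatter matrix used by LDA is rank deficient (rank at most n - c for n samples in c classes) once the number of features d exceeds the number of samples, while a kernel Gram matrix stays n x n regardless of d. The data sizes, the RBF kernel, the `gamma` value, and the helper names `within_class_scatter` and `rbf_gram` are all assumptions made for illustration.

```python
import numpy as np


def within_class_scatter(X, y):
    """Return the d x d within-class scatter matrix S_w used by LDA."""
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        # Center each class on its own mean before accumulating.
        centered = X[y == label] - X[y == label].mean(axis=0)
        S_w += centered.T @ centered
    return S_w


def rbf_gram(X, gamma):
    """Return the n x n Gram matrix of an RBF kernel (hypothetical choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


# Synthetic stand-in for spectra: few samples, many peaks (assumed sizes).
rng = np.random.default_rng(0)
n_per_class, n_classes, d = 10, 3, 200
n = n_per_class * n_classes
X = rng.normal(size=(n, d))
y = np.repeat(np.arange(n_classes), n_per_class)

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)   # far below d, so S_w cannot be inverted
K = rbf_gram(X, gamma=1.0 / d)      # size depends on n only, not on d

print(f"S_w is {S_w.shape[0]} x {S_w.shape[1]} with rank {rank}")
print(f"Gram matrix is {K.shape[0]} x {K.shape[1]}")
```

Because S_w is singular here, the classical LDA criterion, which requires inverting S_w, is not directly usable on such data. Kernel formulations work with the n x n Gram matrix instead and can also capture nonlinear class structure.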

INDEX TERMS

FKDA, LDA, hepatocellular carcinoma, cirrhosis, classification, singularity.

CITATION

Jung Hun Oh, Jean Gao, "Fast Kernel Discriminant Analysis for Classification of Liver Cancer Mass Spectra", *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 8, no. 6, pp. 1522-1534, November/December 2011, doi:10.1109/TCBB.2010.42

REFERENCES

- [1] H. Ressom, R. Varghese, S. Drake, G. Hortin, M. Abdel-Hamid, C. Loffredo, and R. Goldman, “Peak Selection from MALDI-TOF Mass Spectra Using Ant Colony Optimization,” Bioinformatics, vol. 23, no. 5, pp. 619-626, 2007.
- [2] E. Petricoin and L. Liotta, “SELDI-TOF-Based Serum Proteomic Pattern Diagnostics for Early Detection of Cancer,” Current Opinion in Biotechnology, vol. 15, pp. 24-30, 2004.
- [3] C. Tan, A. Ploner, A. Quandt, J. Lehtio, and Y. Pawitan, “Finding Regions of Significance in SELDI Measurements for Identifying Protein Biomarkers,” Bioinformatics, vol. 22, pp. 1515-1523, 2006.
- [4] S. Pan, J. Rush, E. Peskind, D. Galasko, K. Chung, J. Quinn, J. Jankovic, J. Leverenz, C. Zabetian, C. Pan, Y. Wang, J. Oh, J. Gao, J. Zhang, T. Montine, and J. Zhang, “Application of Targeted Quantitative Proteomics Analysis in Human Cerebrospinal Fluid Using an LC MALDI TOF/TOF Platform,” J. Proteome Research, vol. 7, pp. 720-730, 2008.
- [5] J. Yu, S. Ongarello, R. Fiedler, X. Chen, G. Toffolo, C. Cobelli, and Z. Trajanoski, “Ovarian Cancer Identification Based on Dimensionality Reduction for High-Throughput Mass Spectrometry Data,” Bioinformatics, vol. 21, pp. 2200-2209, 2005.
- [6] J. Yu and X. Chen, “Bayesian Neural Network Approaches to Ovarian Cancer Identification from High-Resolution Mass Spectrometry Data,” Bioinformatics, vol. 21, pp. i487-i494, 2005.
- [7] B. Wu, T. Abbott, D. Fishman, W. McMurray, G. Mor, K. Stone, D. Ward, K. Williams, and H. Zhao, “Comparison of Statistical Methods for Classification of Ovarian Cancer Using Mass Spectrometry Data,” Bioinformatics, vol. 19, pp. 1636-1643, 2003.
- [8] P. Geurts, M. Fillet, D. de Seny, M. Meuwis, M. Malaise, M. Merville, and L. Wehenkel, “Proteomic Mass Spectra Classification Using Decision Tree Based Ensemble Methods,” Bioinformatics, vol. 21, pp. 3138-3145, 2005.
- [9] L. Lancashire, O. Schmid, H. Shah, and G. Ball, “Classification of Bacterial Species from Proteomic Data Using Combinatorial Approaches Incorporating Artificial Neural Networks, Cluster Analysis and Principal Components Analysis,” Bioinformatics, vol. 21, pp. 2191-2199, 2005.
- [10] R. Tibshirani, T. Hastie, B. Narasimhan, S. Soltys, G. Shi, A. Koong, and Q. Le, “Sample Classification from Protein Mass Spectrometry, by Peak Probability Contrasts,” Bioinformatics, vol. 20, pp. 3034-3044, 2004.
- [11] X. Zhang and Y. Jia, “A Linear Discriminant Analysis Framework Based on Random Subspace for Face Recognition,” Pattern Recognition, vol. 40, pp. 2585-2591, 2007.
- [12] Y. Guo, S. Li, J. Yang, T. Shu, and L. Wu, “A Generalized Foley-Sammon Transform Based on Generalized Fisher Discriminant Criterion and Its Application to Face Recognition,” Pattern Recognition Letters, vol. 24, pp. 147-158, 2003.
- [13] Z. Jin, J. Yang, Z. Hu, and Z. Lou, “Face Recognition Based on the Uncorrelated Discriminant Transformation,” Pattern Recognition, vol. 34, pp. 1405-1416, 2001.
- [14] X. Jing, H. Wong, and D. Zhang, “Face Recognition Based on 2D Fisherface Approach,” Pattern Recognition, vol. 39, pp. 707-710, 2006.
- [15] J. Ye, T. Li, T. Xiong, and R. Janardan, “Using Uncorrelated Discriminant Analysis for Tissue Classification with Gene Expression Data,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 1, no. 4, pp. 181-190, Oct.-Dec. 2004.
- [16] P. Howland, M. Jeon, and H. Park, “Structure Preserving Dimension Reduction for Clustered Text Data Based on the Generalized Singular Value Decomposition,” SIAM J. Matrix Analysis and Applications, vol. 25, pp. 165-179, 2003.
- [17] P. Howland and H. Park, “Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 995-1006, Aug. 2004.
- [18] H. Li, K. Zhang, and T. Jiang, “Robust and Accurate Cancer Classification with Gene Expression Profiling,” Proc. IEEE Computational Systems Bioinformatics Conf., pp. 310-321, 2005.
- [19] Z. Liang and P. Shi, “An Efficient and Effective Method to Solve Kernel Fisher Discriminant Analysis,” Neurocomputing, vol. 61, pp. 485-493, 2004.
- [20] G. Baudat and F. Anouar, “Generalized Discriminant Analysis Using a Kernel Approach,” Neural Computation, vol. 12, pp. 2385-2404, 2000.
- [21] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K. Muller, “Fisher Discriminant Analysis with Kernels,” Proc. IEEE Neural Networks Processing Workshop, pp. 41-48, 1999.
- [22] J. Yang, Z. Jin, J. Yang, D. Zhang, and A. Frangi, “Essence of Kernel Fisher Discriminant: KPCA Plus LDA,” Pattern Recognition, vol. 37, pp. 2097-2100, 2004.
- [23] C. Park and H. Park, “Nonlinear Discriminant Analysis Using Kernel Functions and the Generalized Singular Value Decomposition,” SIAM J. Matrix Analysis and Applications, vol. 27, pp. 87-102, 2005.
- [24] C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.
- [25] B. Fei and J. Liu, “Binary Tree of SVM: A New Fast Multiclass Training and Classification Algorithm,” IEEE Trans. Neural Networks, vol. 17, no. 3, pp. 696-704, May 2006.
- [26] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene Selection for Cancer Classification Using Support Vector Machines,” Machine Learning, vol. 46, pp. 389-422, 2002.
- [27] C. Hsu and C. Lin, “A Comparison of Methods for Multi-Class Support Vector Machines,” IEEE Trans. Neural Networks, vol. 13, no. 2, pp. 415-425, Mar. 2002.
- [28] C. Chang and C. Lin, “LIBSVM: A Library for Support Vector Machines,” http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
- [29] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed. Morgan Kaufmann, 2005.
- [30] L. Shen and C.E.C. Tan, “Reducing Multiclass Cancer Classification to Binary by Output Coding and SVM,” Computational Biology and Chemistry, vol. 30, pp. 63-71, 2006.
- [31] E. Allwein, R. Schapire, and Y. Singer, “Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers,” J. Machine Learning Research, vol. 1, pp. 113-141, 2002.
- [32] F. Aiolli and A. Sperduti, “Multiclass Classification with Multi-Prototype Support Vector Machines,” J. Machine Learning Research, vol. 6, pp. 817-850, 2005.
- [33] Y. Lee and C. Lee, “Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data,” Bioinformatics, vol. 19, pp. 1132-1139, 2003.
- [34] K. Crammer and Y. Singer, “On the Algorithmic Implementation of Multiclass Kernel-Based Vector Machines,” J. Machine Learning Research, vol. 2, pp. 265-292, 2001.
- [35] A. Statnikov, C. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, “A Comprehensive Evaluation of Multicategory Classification Methods for Microarray Gene Expression Cancer Diagnosis,” Bioinformatics, vol. 21, pp. 631-643, 2005.
- [36] L. Breiman, “Random Forests,” Machine Learning, vol. 45, pp. 5-32, 2001.
- [37] J. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., 1993.
- [38] K. Fukunaga, Introduction to Statistical Pattern Recognition. Morgan Kaufmann Publishers Inc., 1990.
- [39] S. Dudoit, J. Fridlyand, and T. Speed, “Comparison of Discriminant Methods for the Classification of Tumors Using Gene Expression Data,” J. Am. Statistical Assoc., vol. 97, pp. 77-87, 2002.
- [40] T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer, and D. Haussler, “Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data,” Bioinformatics, vol. 16, pp. 906-914, 2000.
- [41] T. Golub, D. Slonim, and P. Tamayo, “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring,” Science, vol. 286, pp. 531-537, 1999.
- [42] R. Diaz-Uriarte and S.A. de Andres, “Gene Selection and Classification of Microarray Data Using Random Forest,” BMC Bioinformatics, vol. 7, article no. 3, 2006.