Issue No.02 - March-April (2013 vol.10)
pp: 447-456
Yifeng Li, Sch. of Comput. Sci., Univ. of Windsor, Windsor, ON, Canada
Alioune Ngom, Sch. of Comput. Sci., Univ. of Windsor, Windsor, ON, Canada
Microarray data can be used to detect diseases and predict responses to therapies through classification models. However, the high dimensionality and low sample size of such data cause many computational problems, such as reduced prediction accuracy and slow classification speed. In this paper, we propose a novel family of nonnegative least-squares classifiers for high-dimensional microarray gene expression and comparative genomic hybridization data. Our approaches combine the advantages of local learning, transductive learning, and ensemble learning for better prediction performance. To study the performance of our methods, we ran computational experiments on 17 well-known data sets with diverse characteristics. We also performed statistical comparisons with many classification techniques, including the well-performing SVM approach and two related, recent methods proposed in the literature. Experimental results show that our approaches are faster and generally achieve better prediction performance than the compared methods.
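The abstract does not spell out how the classifiers work. As a rough illustration of the general idea behind a nonnegative least-squares classifier, the sketch below approximates a new sample as a nonnegative combination of the training samples and assigns it to the class whose training samples reconstruct it with the smallest residual. The function names `nnls_projected_gradient` and `classify`, the projected-gradient solver, and the nearest-residual decision rule are assumptions made for this example, not the procedures published in the paper. The local, transductive, and ensemble components mentioned above are left out.

```python
def nnls_projected_gradient(A, b, iters=2000):
    """Approximately solve min_w ||A w - b||^2 subject to w >= 0.

    A is a list of rows (features x samples); b has one entry per feature.
    Illustrative solver only: plain projected gradient descent.
    """
    m, n = len(A), len(A[0])
    # Gram matrix G = A^T A and vector c = A^T b.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # trace(G) bounds the largest eigenvalue of G, so 1/trace(G) is a safe step.
    step = 1.0 / max(sum(G[i][i] for i in range(n)), 1e-12)
    w = [0.0] * n
    for _ in range(iters):
        grad = [sum(G[i][j] * w[j] for j in range(n)) - c[i] for i in range(n)]
        # Gradient step followed by projection onto the nonnegative orthant.
        w = [max(0.0, w[i] - step * grad[i]) for i in range(n)]
    return w


def classify(train_samples, labels, x):
    """Label x by the class whose training samples best reconstruct it."""
    m = len(x)
    # Arrange the training samples as the columns of A.
    A = [[s[k] for s in train_samples] for k in range(m)]
    w = nnls_projected_gradient(A, x)
    best_label, best_residual = None, float("inf")
    for label in sorted(set(labels)):
        # Reconstruct x using only the coefficients that belong to this class.
        recon = [sum(A[k][j] * w[j] for j in range(len(w)) if labels[j] == label)
                 for k in range(m)]
        residual = sum((x[k] - recon[k]) ** 2 for k in range(m))
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


# Toy data (made up for illustration): two samples per class, three features.
train = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
labels = ["a", "a", "b", "b"]
print(classify(train, labels, [0.95, 0.05, 0.0]))
print(classify(train, labels, [0.05, 0.95, 0.02]))
```

In practice a dedicated active-set solver would normally replace the simple loop above. The pure-Python version is used here only to keep the sketch self-contained.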
Algorithms, Classification, Least squares methods, Diseases, Medical information systems, classifier design and evaluation, algorithms, Medicine
Yifeng Li, Alioune Ngom, "Nonnegative Least-Squares Methods for the Classification of High-Dimensional Biological Data", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 2, pp. 447-456, March-April 2013, doi:10.1109/TCBB.2013.30