Issue No.03 - March (2010 vol.22)
pp: 381-391
Shu-Qin Wang, Northeast Normal University, Jilin
Xiao-Jie Yuan, Nankai University, Tianjin
Cancer classification is the critical basis for patient-tailored therapy. Conventional histological analysis tends to be unreliable because different tumors may have a similar appearance. Advances in microarray technology make individualized therapy possible, and various machine learning methods can be employed to classify cancer tissue samples based on microarray data. However, few of these methods readily generate rules that are accurate, reliable, and biologically interpretable. In this paper, we introduce an approach for classifying cancers based on the principle of minimal rough fringe. To train rough hypercuboid classifiers from gene expression data sets, the method dynamically evaluates all available genes and selects the genes with the smallest implicit regions as the dimensions of implicit hypercuboids. An unseen object is predicted to belong to a certain class if it falls within the corresponding class hypercuboid. Based on this method, ensemble rough hypercuboid classifiers are subsequently constructed. Experimental results on several open cancer gene expression data sets show that the proposed method generates accurate and interpretable rules compared with some other machine learning methods. Hence, it is a feasible way of classifying cancer tissues in biomedical applications.
Rough sets, rough hypercuboid, explicit region, implicit region, gene expression data.
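The classification idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the implicit region or the rough fringe, so the sketch assumes (hypothetically) that a gene's implicit region can be approximated by the number of training samples falling inside more than one class interval on that gene. The names `fit_hypercuboids`, `predict`, and `vote` are illustrative only.

```python
from collections import Counter

def fit_hypercuboids(X, y, n_genes=2):
    """Build one axis-aligned box per class over the selected genes.

    X is a list of samples (lists of expression values), y the class labels.
    ASSUMPTION: genes are ranked by how many training samples lie in the
    overlap of two or more class intervals; this only stands in for the
    paper's "implicit region", which the abstract does not define.
    """
    classes = sorted(set(y))

    def intervals(g):
        # Per-class [min, max] interval of gene g over the training samples.
        out = {}
        for c in classes:
            values = [x[g] for x, label in zip(X, y) if label == c]
            out[c] = (min(values), max(values))
        return out

    def overlap_count(g):
        # Number of samples whose value on gene g lies in several class intervals.
        iv = intervals(g)
        return sum(
            1 for x in X
            if sum(lo <= x[g] <= hi for lo, hi in iv.values()) > 1
        )

    genes = sorted(range(len(X[0])), key=overlap_count)[:n_genes]
    boxes = {c: {g: intervals(g)[c] for g in genes} for c in classes}
    return genes, boxes

def predict(boxes, x):
    """Return the class whose box contains x, or None if zero or several match."""
    hits = [
        c for c, box in boxes.items()
        if all(lo <= x[g] <= hi for g, (lo, hi) in box.items())
    ]
    return hits[0] if len(hits) == 1 else None

def vote(models, x):
    """Majority vote over several (genes, boxes) models; abstentions are ignored."""
    votes = Counter(
        p for p in (predict(boxes, x) for _, boxes in models) if p is not None
    )
    return votes.most_common(1)[0][0] if votes else None

# Toy data: gene 0 separates the classes, genes 1 and 2 overlap.
X = [[1.0, 5.0, 0.2], [1.2, 5.5, 0.9], [3.0, 5.2, 0.4], [3.3, 5.1, 0.8]]
y = ["A", "A", "B", "B"]
genes, boxes = fit_hypercuboids(X, y, n_genes=1)
```

On this toy data the sketch selects gene 0, and a sample is assigned to a class only when it falls inside exactly one class box. Each box also reads directly as an interval rule on the selected genes, which is the kind of interpretability the abstract refers to.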
Shu-Qin Wang, Xiao-Jie Yuan, "Ensemble Rough Hypercuboid Approach for Classifying Cancers", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 3, pp. 381-391, March 2010, doi:10.1109/TKDE.2009.114