Issue No.06 - Nov.-Dec. (2012 vol.9)
pp: 1751-1765
Zhiwen Yu, Higher Educ. Megacenter, South China Univ. of Technol., Guangzhou, China
Le Li, Higher Educ. Megacenter, South China Univ. of Technol., Guangzhou, China
J. You, Dept. of Comput., Hong Kong Polytech. Univ., Kowloon, China
Hau-San Wong, Dept. of Comput. Sci., City Univ. of Hong Kong, Kowloon, China
Guoqiang Han, Higher Educ. Megacenter, South China Univ. of Technol., Guangzhou, China
ABSTRACT
Successful diagnosis and treatment of cancer depend on discovering and classifying cancer types correctly. One challenge in class discovery from cancer data sets is that cancer gene expression profiles not only include a large number of genes but also contain many noisy genes. To reduce the effect of noisy genes in cancer gene expression profiles, we propose two new consensus clustering frameworks for cancer discovery from gene expression profiles: triple spectral clustering-based consensus clustering (SC3) and double spectral clustering-based consensus clustering (SC2 Ncut). SC3 integrates the spectral clustering (SC) algorithm into the ensemble framework multiple times to process gene expression profiles. Specifically, spectral clustering is applied to cluster the gene dimension and the cancer sample dimension, and it also serves as the consensus function that partitions the consensus matrix constructed from multiple clustering solutions. In contrast to SC3, SC2 Ncut adopts the normalized cut algorithm, instead of spectral clustering, as the consensus function. Experiments on both synthetic data sets and real cancer gene expression profiles illustrate that the proposed approaches not only achieve good performance on gene expression profiles but also outperform most of the existing approaches in class discovery from these profiles.
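The consensus matrix mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how the matrix is built, so the code below assumes the commonly used co-association definition: entry (i, j) is the fraction of clustering solutions that place samples i and j in the same cluster. The function name `consensus_matrix` and the example label vectors are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def consensus_matrix(solutions):
    """Return the n x n co-association matrix for a list of label vectors.

    Entry [i][j] is the fraction of clustering solutions that place
    samples i and j in the same cluster. This is an assumed, commonly
    used definition; the abstract does not give the exact construction.
    """
    if not solutions:
        raise ValueError("need at least one clustering solution")
    n = len(solutions[0])
    if any(len(labels) != n for labels in solutions):
        raise ValueError("all solutions must label the same samples")

    counts = [[0] * n for _ in range(n)]
    for labels in solutions:
        # A sample always shares a cluster with itself.
        for i in range(n):
            counts[i][i] += 1
        # Count each unordered pair once and mirror it to keep symmetry.
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1

    m = len(solutions)
    return [[c / m for c in row] for row in counts]


# Hypothetical example: three clustering solutions over five samples.
solutions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
matrix = consensus_matrix(solutions)
```

In a pipeline of the kind the abstract describes, a matrix like this would be treated as a sample-to-sample similarity and handed to the consensus function: spectral clustering in SC3, or the normalized cut algorithm in SC2 Ncut.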
INDEX TERMS
Cancer, Gene expression, Clustering algorithms, Partitioning algorithms, Bioinformatics, Noise measurement, Cancer gene expression profiles, Cluster ensemble, Spectral clustering
CITATION
Zhiwen Yu, Le Li, J. You, Hau-San Wong, Guoqiang Han, "SC³: Triple Spectral Clustering-Based Consensus Clustering Framework for Class Discovery from Cancer Gene Expression Profiles", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 6, pp. 1751-1765, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.108