Issue No.01 - January-March (2010 vol.7)
pp: 138-152
Pritha Mahata, University of Newcastle, Newcastle, Australia
ABSTRACT
Finding subtypes of heterogeneous diseases is a major challenge in biology. Clustering is often used to generate hypotheses about the subtypes of a heterogeneous disease. However, the clusterings produced by different algorithms usually disagree. This work introduces a simple method that finds the most consistent clusters across three different clustering algorithms, applied to a melanoma data set and a breast cancer data set. The method is validated by showing that its Silhouette, Dunn's, and Davies-Bouldin cluster validation indices are better than those obtained by k-means and by another consensus clustering algorithm. The hypothesized consensus clusters in both data sets are corroborated by clear genetic markers and 100 percent classification accuracy. In Bittner et al.'s melanoma data set, a previously hypothesized primary cluster is recognized as the largest consensus cluster, and a new partition of this cluster into two subclusters is proposed. In van't Veer et al.'s breast cancer data set, the previously proposed "basal" and "luminal A" subtypes are clearly recognized as the two predominant clusters. Furthermore, a new hypothesis is provided about the existence of two subgroups within the "basal" subtype in this data set. The clusters of van't Veer et al.'s data set are also validated by the high classification accuracy obtained on the data set of van de Vijver et al.
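The three internal validation indices named in the abstract have standard textbook definitions (Rousseeuw [29], Davies and Bouldin [31]). The sketch below computes them in pure Python, assuming Euclidean distance between samples and at least two clusters; the paper's own distance measure and implementation are not described here, and all function names are hypothetical. Higher Silhouette and Dunn values indicate better clusterings, while a lower Davies-Bouldin value is better.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups

def silhouette(points, labels):
    """Mean silhouette width; range [-1, 1], higher is better. Needs >= 2 clusters."""
    groups = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster (p itself adds 0)
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(euclidean(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

def dunn(points, labels):
    """Min between-cluster distance over max cluster diameter; higher is better."""
    groups = list(group_by_label(points, labels).values())
    min_sep = min(euclidean(p, q)
                  for g, h in combinations(groups, 2) for p in g for q in h)
    max_diam = max((euclidean(p, q) for g in groups for p, q in combinations(g, 2)),
                   default=0.0)
    return math.inf if max_diam == 0 else min_sep / max_diam

def centroid(group):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(group) for coords in zip(*group)]

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (S_i + S_j) / M_ij ratio; lower is better."""
    groups = list(group_by_label(points, labels).values())
    cents = [centroid(g) for g in groups]
    # S_i: average distance of cluster i's points to its centroid
    scatter = [sum(euclidean(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    # For each cluster, keep the least favourable ratio against any other cluster
    worst = [max((scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

if __name__ == "__main__":
    # Hypothetical toy data: two well-separated pairs of 2-D points
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    for name, labels in [("good", [0, 0, 1, 1]), ("bad", [0, 1, 0, 1])]:
        print(name, silhouette(pts, labels), dunn(pts, labels), davies_bouldin(pts, labels))
```

Dunn's index has several generalized variants (Bezdek and Pal [30]); this sketch uses the original form, single-linkage separation divided by complete-linkage diameter. On the toy data, the "good" labeling scores better than the "bad" one on all three indices, which is the kind of comparison the abstract reports between the consensus clusters, k-means, and another consensus algorithm.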
INDEX TERMS
Consensus clustering, melanoma, breast cancer.
CITATION
Pritha Mahata, "Exploratory Consensus of Hierarchical Clusterings for Melanoma and Breast Cancer", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.7, no. 1, pp. 138-152, January-March 2010, doi:10.1109/TCBB.2008.33
REFERENCES
[1] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Statistics. Springer, 2001.
[2] M.B. Eisen, P.T. Spellman, P.O. Brown, and D. Botstein, "Cluster Analysis and Display of Genome-Wide Expression Patterns," Proc. Nat'l Academy of Sciences USA, vol. 95, pp. 14863-14868, 1998.
[3] T. Sorlie, R. Tibshirani, J. Parker, T. Hastie, J.S. Marron, A. Nobel, S. Deng, H. Johnsen, R. Pesich, S. Geisler, J. Demeter, C.M. Perou, P.E. Lonning, P.O. Brown, A.-L. Børresen-Dale, and D. Botstein, "Repeated Observation of Breast Tumor Subtypes in Independent Gene Expression Data Sets," Proc. Nat'l Academy of Sciences USA, vol. 100, no. 14, pp. 8418-8423, July 2003.
[4] L.J. van't Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend, "Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer," Nature, vol. 415, pp. 530-536, 2002.
[5] E.N. Adams, "Consensus Techniques and the Comparison of Taxonomic Trees," Systematic Zoology, vol. 21, no. 4, pp. 390-397, 1972.
[6] D.A. Neumann, "Faithful Consensus Methods for n-Trees," Math. Biosciences, vol. 63, pp. 271-287, 1983.
[7] R.R. Sokal and F.J. Rohlf, "Taxonomic Congruence in the Leptopodomorpha Re-Examined," Systematic Zoology, vol. 30, pp. 304-325, 1981.
[8] T. Margush and F.R. McMorris, "Consensus n-Trees," Bull. Math. Biology, vol. 43, pp. 239-244, 1981.
[9] M. Wilkinson, "Common Cladistic Information and Its Consensus Representation: Reduced Adams and Reduced Cladistic Consensus Trees and Profiles," Systematic Biology, vol. 43, pp. 343-368, 1994.
[10] A.M. Krieger and P.E. Green, "A Generalized Rand-Index Method for Consensus Clustering of Separate Partitions of the Same Data Base," J. Classification, vol. 16, pp. 63-89, 1999.
[11] A. Topchy, B. Minaei-Bidgoli, A.K. Jain, and W.F. Punch, "Adaptive Clustering Ensembles," J. Classification, vol. 16, pp. 63-89, 1999.
[12] S. Monti, P. Tamayo, J. Mesirov, and T. Golub, "Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data," Machine Learning, vol. 52, no. 1/2, pp. 91-118, 2003.
[13] S. Swift, A. Tucker, V. Vinciotti, N. Martin, C. Orengo, X. Liu, and P. Kellam, "Consensus Clustering and Functional Interpretation of Gene-Expression Data," Genome Biology, vol. 5, no. R94, 2004.
[14] V. Filkov and S. Skiena, "Heterogeneous Data Integration with the Consensus Clustering Formalism," Proc. Int'l Workshop Data Integration in the Life Sciences (DILS '04), vol. 2994, pp. 110-123, 2004.
[15] T. Grotkjaer, O. Winther, B. Regenberg, J. Nielsen, and L.K. Hansen, "Robust Multi-Scale Clustering of Large DNA Microarray Datasets with the Consensus Algorithm," Bioinformatics, vol. 22, no. 1, pp. 58-67, 2006.
[16] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, pp. 671-680, 1983.
[17] Y. Wakabayashi, "The Complexity of Computing Medians of Relations," Resenhas IME-USP, vol. 3, pp. 323-349, 1998.
[18] M. Krivanek and J. Moravek, "Hard Problems in Hierarchical-Tree Clustering," Acta Informatica, vol. 23, pp. 311-323, 1986.
[19] B. Raskutti, H. Ferra, and A. Kowalczyk, "Combining Clustering and Co-Training to Enhance Text Classification Using Unlabelled Data," Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2002.
[20] A. Strehl and J. Ghosh, "Cluster Ensembles—A Knowledge Reuse Framework for Combining Partitionings," Proc. 18th Nat'l Conf. Artificial Intelligence (AAAI '02), pp. 93-98, 2002.
[21] M. Bittner, P. Meltzer, Y. Chen, Y. Jiang, E. Seftor, M. Hendrix, M. Radmacher, R. Simon, Z. Yakhini, A. Ben-Dor, N. Sampas, E. Dougherty, E. Wang, F. Marincola, C. Gooden, J. Lueders, A. Glatfelter, P. Pollock, J. Carpten, E. Gillanders, D. Leja, K. Dietrich, C. Beaudry, M. Berens, D. Alberts, V. Sondak, N. Hayward, and J. Trent, "Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling," Nature, vol. 406, no. 3, 2000.
[22] P. Mahata, W. Costa, C. Cotta, and P. Moscato, "Hierarchical Clustering, Languages and Cancer," Proc. EvoWorkshops '06, pp. 67-78, 2006.
[23] R. Rizzi, P. Mahata, W. Costa, and P. Moscato, "Hierarchical Clustering Using Arithmetic-Harmonic Cut," submitted, 2007.
[24] I. Dyen, J.B. Kruskal, and P. Black, "An Indo-European Classification: A Lexicostatistical Experiment," Trans. Am. Philosophical Soc., New Series, vol. 82, no. 5, pp. 1-132, 1992.
[25] D.T. Ross, U. Scherf, M. Eisen, C. Perou, C. Rees, P. Spellman, V. Iyer, S. Jeffrey, M. van de Rijn, M. Waltham, A. Pergamenschikov, J.C. Lee, D. Lashkari, D. Shalon, T. Myers, J.N. Weinstein, D. Botstein, and P. Brown, "Systematic Variation in Gene Expression Patterns in Human Cancer Cell Lines," Nature Genetics, vol. 24, pp. 227-235, Mar. 2000.
[26] J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
[27] W.M. Rand, "Objective Criteria for the Evaluation of Clustering Methods," J. Am. Statistical Assoc., vol. 66, pp. 846-850, 1971.
[28] L. Hubert and P. Arabie, "Comparing Partitions," J. Classification, vol. 2, no. 1, pp. 193-218, 1985.
[29] P. Rousseeuw, "Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis," J. Computational and Applied Math., vol. 20, pp. 53-65, 1987.
[30] J. Bezdek and N. Pal, "Some New Indexes of Cluster Validity," IEEE Trans. Systems, Man, and Cybernetics, vol. 28 B, pp. 301-315, 1998.
[31] D.L. Davies and D.W. Bouldin, "A Cluster Separation Measure," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224-227, Feb. 1979.
[32] F. Azuaje and N. Bolshakova, "Clustering Genome Expression Data: Design and Evaluation Principles," Understanding and Using Microarray Analysis Techniques: A Practical Guide, D. Berrar, W. Dubitzky, and M. Granzow, eds., Springer-Verlag, 2002.
[33] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, vol. 286, pp. 531-537, 1999.
[34] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed. Morgan Kaufmann, 2005.
[35] M. Smolkin and D. Ghosh, "Cluster Stability Scores for Microarray Data in Cancer Studies," BMC Bioinformatics, vol. 4, no. 36, 2003.
[36] A. Ben-Dor, R. Shamir, and Z. Yakhini, "Clustering Gene Expression Patterns," J. Computational Biology, vol. 6, pp. 281-297, 1999.
[37] T.B. Lewis, J.E. Robison, R. Bastien, B. Milash, K. Boucher, W.E. Samlowski, S.A. Leachman, R.D. Noyes, C.T. Wittwer, L. Perreard, and P.S. Bernard, "Molecular Classification of Melanoma Using Real-Time Quantitative Reverse Transcriptase-Polymerase Chain Reaction," Cancer, vol. 104, no. 8, pp. 1678-1686, Oct. 2005.
[38] I.M. Bachmann, O. Straume, H.E. Puntervoll, M.B. Kalvenes, and L.A. Akslen, "Importance of p-Cadherin, Beta-Catenin, and wnt5a/frizzled for Progression of Melanocytic Tumors and Prognosis in Cutaneous Melanoma," Clinical Cancer Research, vol. 11, no. 24, Pt. 1, pp. 8606-8614, 2005.
[39] H. Boukerche, Z.Z. Su, L. Emdad, D. Sarkar, and P.B. Fisher, "mda-9/syntenin Regulates the Metastatic Phenotype in Human Melanoma Cells by Activating Nuclear Factor-kappaB," Cancer Research, vol. 67, no. 4, pp. 1812-1822, Feb. 2007.
[40] K. McPherson, C.M. Steel, and J.M. Dixon, "ABC of Breast Diseases: Breast Cancer—Epidemiology, Risk Factors, and Genetics," British Medical J., vol. 321, pp. 624-628, 2000.
[41] L.C. Dorssers, S. van der Flier, A. Brinkman, T. van Agthoven, J. Veldscholte, E.M. Berns, J. Klijn, L.V. Beex, and J.A. Foekens, "Tamoxifen Resistance in Breast Cancer: Elucidating Mechanisms," Drugs, vol. 61, no. 12, pp. 1721-1733, 2001.
[42] I. Hedenfalk, D. Duggan, Y. Chen, M. Radmacher, M. Bittner, R. Simon, P. Meltzer, B. Gusterson, M. Esteller, O.P. Kallioniemi, B.W. Wilfond, Å. Borg, J. Trent, M. Raffeld, Z. Yakhini, A. Ben-Dor, E. Dougherty, J. Kononen, L. Bubendorf, S. Fehrle, S. Pittaluga, S. Gruvberger, N. Loman, O. Johannsson, H. Olsson, and G. Sauter, "Gene-Expression Profiles in Hereditary Breast Cancer," New England J. Medicine, vol. 344, no. 8, pp. 539-548, 2001.
[43] K.E. Lee, N. Sha, E.R. Dougherty, M. Vannucci, and B.K. Mallick, "Gene Selection: A Bayesian Variable Selection Approach," Bioinformatics, vol. 19, pp. 90-97, 2003.
[44] P. Mahata and K. Mahata, "Selecting Differentially Expressed Genes Using Minimum Probability of Classification Error," J. Biomedical Informatics, vol. 40, no. 6, pp. 775-786, 2007.
[45] Y. Pawitan, J. Bjöhle, L. Amler, A. Borg, S. Egyhazi, P. Hall, X. Han, L. Holmberg, F. Huang, S. Klaar, E.T. Liu, L. Miller, H. Nordgren, A. Ploner, K. Sandelin, P.M. Shaw, J. Smeds, L. Skoog, S. Wedrén, and J. Bergh, "Gene Expression Profiling Spares Early Breast Cancer Patients from Adjuvant Therapy: Derived and Validated in Two Population-Based Cohorts," Breast Cancer Research, vol. 7, pp. R953-R964, 2005.
[46] S. Paik, S. Shak, G. Tang, C. Kim, J. Baker, M. Cronin, F.L. Baehner, M.G. Walker, D. Watson, T. Park, W. Hiller, E.R. Fisher, L. Wickerham, J. Bryant, and N. Wolmark, "A Multigene Assay to Predict Recurrence of Tamoxifen-Treated, Node-Negative Breast Cancer," New England J. Medicine, vol. 351, no. 27, pp. 2817-2826, 2004.
[47] T. Sorlie, C.M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M.B. Eisen, M. van de Rijn, S.S. Jeffrey, T. Thorsen, H. Quist, J.C. Matese, P.O. Brown, D. Botstein, P.E. Lonning, and A.-L. Børresen-Dale, "Gene Expression Patterns of Breast Carcinomas Distinguish Tumor Subclasses with Clinical Implications," Proc. Nat'l Academy of Sciences USA, vol. 98, no. 19, pp. 10869-10874, Sept. 2001.
[48] S.N. Agoff, P.E. Swanson, H. Linden, S.E. Hawes, and T.J. Lawton, "Androgen Receptor Expression in Estrogen Receptor-Negative Breast Cancer. Immunohistochemical, Clinical, and Prognostic Associations," Am. J. Clinical Pathology, vol. 120, no. 5, pp. 725-731, Nov. 2003.
[49] H. Nakshatri and S. Badve, "Foxa1 as a Therapeutic Target for Breast Cancer," Expert Opinion on Therapeutic Targets, vol. 11, no. 4, pp. 507-514, Apr. 2007.
[50] R. Mehra, S. Varambally, L. Ding, R. Shen, M.S. Sabel, D. Ghosh, A.M. Chinnaiyan, and C.G. Kleer, "Identification of GATA3 as a Breast Cancer Prognostic Marker by Global Gene Expression Meta-Analysis," Cancer Research, vol. 65, pp. 11259-11264, Dec. 2005.
[51] M. Zafrakas, M. Chorovicer, I. Klaman, G. Kristiansen, P. Wild, U. Heindrichs, R. Knüchel, and E. Dahl, "Systematic Characterization of Gabrp Expression in Sporadic Breast Cancer and Normal Breast Tissue," Int'l J. Cancer, vol. 118, no. 6, pp. 1453-1459, 2006.
[52] M.J. van de Vijver, Y.D. He, L.J. van't Veer, H. Dai, A.A.M. Hart, D.W. Voskuil, G.J. Schreiber, J.L. Peterse, C. Roberts, M.J. Marton, M. Parrish, D. Atsma, A.T. Witteveen, A. Glas, L. Delahaye, T. van der Velde, H. Bartelink, S. Rodenhuis, E.T. Rutgers, S.H. Friend, and R. Bernards, "A Gene-Expression Signature as a Predictor of Survival in Breast Cancer," New England J. Medicine, vol. 347, no. 25, pp. 1999-2009, 2002.