Issue No. 01 - January-March (2010 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.33
Pritha Mahata , University of Newcastle, Australia, Newcastle
Finding subtypes of heterogeneous diseases is the biggest challenge in the area of biology. Often, clustering is used to provide a hypothesis for the subtypes of a heterogeneous disease. However, there are usually discrepancies between the clusterings produced by different algorithms. This work introduces a simple method which provides the most consistent clusters across three different clustering algorithms for a melanoma and a breast cancer data set. The method is validated by showing that the Silhouette, Dunne's and Davies-Bouldin's cluster validation indices are better for the proposed algorithm than those obtained by k-means and another consensus clustering algorithm. The hypotheses of the consensus clusters on both the data sets are corroborated by clear genetic markers and 100 percent classification accuracy. In Bittner et al.'s melanoma data set, a previously hypothesized primary cluster is recognized as the largest consensus cluster and a new partition of this cluster into two subclusters is proposed. In van't Veer et al.'s breast cancer data set, previously proposed "basal” and "luminal A” subtypes are clearly recognized as the two predominant clusters. Furthermore, a new hypothesis is provided about the existence of two subgroups within the "basal” subtype in this data set. The clusters of van't Veer's data set is also validated by high classification accuracy obtained in the data set of van de Vijver et al.
Consensus clustering, melanoma, breast cancer.
P. Mahata, "Exploratory Consensus of Hierarchical Clusterings for Melanoma and Breast Cancer," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. , pp. 138-152, 2008.