13th IEEE International Conference on BioInformatics and BioEngineering (2003)
Mar. 10, 2003 to Mar. 12, 2003
Daxin Jiang , State University of New York at Buffalo
Jian Pei , State University of New York at Buffalo
Aidong Zhang , State University of New York at Buffalo
<p>Clustering time series gene expression data is an important task in bioinformatics research and biomedical applications. Several clustering methods have recently been adapted or proposed for this task. However, some concerns remain, such as the robustness of the mining methods and the quality and interpretability of the mining results.</p> <p>In this paper, we tackle the problem of effectively clustering time series gene expression data by proposing DHC, a density-based hierarchical clustering method. We use a density-based approach to identify the clusters so that the clustering results are of high quality and robustness. Moreover, the mining result takes the form of a density tree, which uncovers the embedded clusters in a data set. The inner structures, borders, and outliers of the clusters can be further investigated using the attraction tree, which is an intermediate result of the mining. With these two trees, the internal structure of the data set can be visualized effectively. Our empirical evaluation on real-world data sets shows that the method is effective, robust, and scalable. On the sample data sets, its results match the ground truth provided by bioinformatics experts very well.</p>
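The abstract does not give the definitions behind DHC, so the following is only a rough sketch of the general idea of an attraction tree in density-based clustering, not the authors' algorithm. It assumes a Gaussian-kernel density estimate, Euclidean distance between expression profiles, and a rule that each point is attracted to its nearest denser point; the function names, `sigma`, and `max_edge` are all hypothetical choices made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def densities(points, sigma=1.0):
    """Gaussian-kernel density of each point (an assumed density definition)."""
    dens = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                d = euclidean(p, q)
                total += math.exp(-(d * d) / (2 * sigma * sigma))
        dens.append(total)
    return dens

def attraction_tree(points, sigma=1.0):
    """Link each point to its nearest denser point; the densest point is the root."""
    dens = densities(points, sigma)
    parent = []
    for i, p in enumerate(points):
        best, best_d = None, math.inf
        for j, q in enumerate(points):
            # Only strictly "denser" points may attract i; ties go to the lower index.
            if (dens[j], -j) > (dens[i], -i):
                d = euclidean(p, q)
                if d < best_d:
                    best, best_d = j, d
        parent.append(best)
    return dens, parent

def cut_into_clusters(points, parent, max_edge):
    """Label each point by the node reached when following edges no longer than max_edge."""
    def root(i):
        while parent[i] is not None and euclidean(points[i], points[parent[i]]) <= max_edge:
            i = parent[i]
        return i
    return [root(i) for i in range(len(points))]

# Toy time series profiles: two well-separated groups of three genes each.
profiles = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.1),
]
dens, parent = attraction_tree(profiles, sigma=1.0)
labels = cut_into_clusters(profiles, parent, max_edge=1.0)
```

In this sketch, long attraction edges mark the borders between dense regions, so cutting them yields flat clusters, while points with low density and long edges are candidates for outliers. How DHC itself derives the density tree from the attraction tree is not described in the abstract and is not modeled here.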
D. Jiang, J. Pei and A. Zhang, "DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data," 13th IEEE International Conference on BioInformatics and BioEngineering (BIBE), Bethesda, Maryland, 2003, pp. 393.