Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04)
Integration of Cluster Ensemble and Text Summarization for Gene Expression Analysis
Taichung, Taiwan, ROC
May 19-May 21
ISBN: 0-7695-2173-8
Generating high-quality gene clusters and identifying the biological mechanisms underlying them are the two main goals of clustering-based gene expression analysis. To obtain high-quality clusters, most current approaches rely on choosing the single best clustering algorithm, one whose design biases and assumptions match the underlying distribution of the data set. This approach has two problems: (1) the underlying distribution of a gene expression data set is usually unknown, and (2) so many clustering algorithms are available that choosing the proper one is very challenging. To provide a textual summary of gene clusters, the most explored approach is extractive; it builds on techniques borrowed from information retrieval, where the objective is to provide terms for query expansion rather than to act as a stand-alone summary of an entire document set. A further drawback is that cluster quality and cluster interpretation are treated as two isolated research problems and studied separately. In fact the two are closely related and must be addressed in a coherent, unified way. Relatively high-quality clusters are a prerequisite for a correct, informative biological explanation of a gene cluster; without them, the explanation will be incorrect or misleading no matter how good or robust the text summarization technique is. Based on this consideration, we design and develop a unified system, GE-Miner (Gene Expression Miner), that addresses these issues in a principled and general manner by integrating cluster ensemble and text summarization, and that provides an environment for comprehensive gene expression data analysis. Experimental results demonstrate that our system can obtain high-quality clusters and provide concise and informative textual summaries for the gene clusters.
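The abstract does not say which consensus function GE-Miner uses, so the following is only a minimal sketch of the general cluster-ensemble idea, assuming a co-association (evidence accumulation) scheme: several base clusterings of the same genes are combined by counting how often each pair of genes is grouped together, and genes that co-occur in more than a chosen fraction of the base clusterings are merged. The function names, the threshold value, and the toy label lists are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base clusterings in which each pair of items shares a cluster.

    `partitions` is a list of label lists, one per base clustering,
    all of the same length (one label per gene).
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Merge items whose co-association exceeds `threshold` (assumed value).

    Uses union-find, so the result is the set of connected components of the
    graph linking strongly co-associated pairs. Labels are numbered in order
    of first appearance.
    """
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    label_of_root = {}
    return [label_of_root.setdefault(find(i), len(label_of_root)) for i in range(n)]


# Toy example: three hypothetical base clusterings of six genes.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
consensus = consensus_clusters(base_partitions)
```

In this toy input the base clusterings disagree on where genes 2 and 3 belong, and their label numbering differs, yet the pairwise co-occurrence counts do not depend on label names. That independence from any single algorithm's output is the property that makes an ensemble attractive when the data distribution is unknown.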
Index Terms:
cluster ensemble, text mining, gene expression analysis
Citation:
Xiaohua Hu, "Integration of Cluster Ensemble and Text Summarization for Gene Expression Analysis," bibe, pp.251, Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04), 2004