Issue No.01 - January (2008 vol.20)
Clustering is a popular technique for analyzing microarray datasets with n genes and m experimental conditions. As biologists have observed, there is a real need to identify co-regulated gene clusters, which include both positively and negatively regulated gene clusters. Existing pattern-based and tendency-based clustering approaches cannot be applied directly to find such co-regulated gene clusters, because they are designed to find positively regulated gene clusters only. In this paper, to cluster co-regulated genes, we propose a coding scheme under which two genes are placed in the same cluster if they have the same code; two genes with the same code may be either positively or negatively regulated. Based on this coding scheme, we propose a new algorithm, with new pruning techniques, to find maximal subspace co-regulated gene clusters. A maximal subspace co-regulated gene cluster groups a set of genes on a condition sequence such that the cluster is not contained in any other subspace co-regulated gene cluster. We conduct extensive experimental studies. Our approach can find maximal subspace co-regulated gene clusters effectively and efficiently. In addition, our approach outperforms existing approaches at finding positively regulated gene clusters.
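The abstract does not spell out the coding scheme, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple code: along an ordered condition sequence, each step between consecutive conditions is recorded as up (+1), down (-1), or flat (0). The code is then sign-normalized so that its first non-flat step is "up". Under that assumption, a gene and its mirror image (negatively regulated) receive the same code and land in the same group. The names `step_code`, `canonical_code`, `group_by_code` and the threshold `eps` are hypothetical. The sketch does not search for maximal clusters or implement the paper's pruning techniques.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]


def step_code(profile: Sequence[float], conditions: Sequence[int], eps: float = 0.0) -> Code:
    """Hypothetical encoding: +1/-1/0 for each step along the condition sequence."""
    steps = []
    for a, b in zip(conditions, conditions[1:]):
        diff = profile[b] - profile[a]
        if diff > eps:
            steps.append(1)      # expression goes up
        elif diff < -eps:
            steps.append(-1)     # expression goes down
        else:
            steps.append(0)      # change within tolerance: treated as flat
    return tuple(steps)


def canonical_code(steps: Code) -> Code:
    """Flip the code if its first non-flat step is 'down', so mirrored
    (negatively regulated) profiles share one code with positive ones."""
    for s in steps:
        if s != 0:
            return tuple(-x for x in steps) if s < 0 else steps
    return steps  # all-flat profile: nothing to normalize


def group_by_code(data: Dict[str, Sequence[float]], conditions: Sequence[int],
                  eps: float = 0.0) -> Dict[Code, List[str]]:
    """Group genes whose canonical codes agree on the given condition sequence."""
    groups: Dict[Code, List[str]] = defaultdict(list)
    for gene, profile in data.items():
        groups[canonical_code(step_code(profile, conditions, eps))].append(gene)
    return dict(groups)


if __name__ == "__main__":
    data = {
        "g1": [1.0, 3.0, 2.0, 5.0],  # up, down, up
        "g2": [2.0, 6.0, 4.0, 9.0],  # same shape as g1 (positive)
        "g3": [5.0, 1.0, 3.0, 0.0],  # mirror of g1 (negative)
        "g4": [1.0, 2.0, 3.0, 4.0],  # up, up, up
    }
    print(group_by_code(data, [0, 1, 2, 3]))
```

With these toy values, g1, g2 and g3 share the code (1, -1, 1) and g4 forms its own group. Passing a different `conditions` list (a subsequence or reordering of the columns) evaluates the same genes on another condition sequence, which is the "subspace" aspect in a very simplified form.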
Data mining, Clustering, classification, and association rules
Jeffrey Xu Yu, Guoren Wang, Yuhai Zhao, Bin Wang, Ge Yu, "Maximal Subspace Coregulated Gene Clustering", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 1, pp. 83-98, January 2008, doi:10.1109/TKDE.2007.190670