Hierarchical Clustering of High-Throughput Expression Data Based on General Dependences
July-Aug. 2013 (vol. 10 no. 4)
pp. 1080-1085
Tianwei Yu, Dept. of Biostat. & Bioinf., Emory Univ., Atlanta, GA, USA
Hesen Peng, Dept. of Biostat. & Bioinf., Emory Univ., Atlanta, GA, USA
High-throughput expression technologies, such as gene expression arrays and liquid chromatography-mass spectrometry (LC-MS), measure thousands of features, i.e., genes or metabolites, on a continuous scale. In such data, both linear and nonlinear relations exist between features. Nonlinear relations can reflect critical regulation patterns in the biological system, but traditional clustering methods based on linear associations neither identify nor use them. Clustering based on general dependences, i.e., both linear and nonlinear relations, is hampered by the high dimensionality and high noise level of the data. We developed a sensitive nonparametric measure of general dependence between (groups of) random variables in high dimensions. Based on this dependence measure, we developed a hierarchical clustering method. In simulation studies, the method outperformed correlation- and mutual information (MI)-based hierarchical clustering methods in clustering features with nonlinear dependences. We applied the method to a microarray data set measuring gene expression in a cell-cycle time series and show that it generates biologically relevant results. The R code is available at http://userwww.service.emory.edu/~tyu8/GDHC.
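As a rough illustration of why a general dependence measure can matter for clustering, the following Python sketch clusters four simulated features with average-linkage hierarchical clustering. It is not the authors' measure or their R code: distance correlation is substituted here as a stand-in general dependence measure, and the simulated data, the noise level, and the helper names (`centered_distances`, `distance_correlation`, `average_linkage`) are assumptions made only for this example.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def centered_distances(x):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row = [sum(r) / n for r in d]  # d is symmetric, so row means equal column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def mean_product(a, b):
    """Mean of the elementwise product of two square matrices."""
    n = len(a)
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb)) / n ** 2


def distance_correlation(a, b):
    """Distance correlation computed from two double-centered distance matrices."""
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    return math.sqrt(max(mean_product(a, b), 0.0) / denom) if denom > 0 else 0.0


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                avg = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or avg < best[0]:
                    best = (avg, p, q)
        _, p, q = best
        clusters[p] += clusters.pop(q)
    return sorted(sorted(c) for c in clusters)


# Hypothetical data: features 0 and 1 share the hidden driver t, features 2 and 3
# share s. Within each pair the relation is quadratic, so it is nonlinear.
rng = random.Random(0)
n = 300
t = [rng.uniform(-1, 1) for _ in range(n)]
s = [rng.uniform(-1, 1) for _ in range(n)]
features = [
    [v + rng.gauss(0, 0.05) for v in t],
    [v * v + rng.gauss(0, 0.05) for v in t],
    [v + rng.gauss(0, 0.05) for v in s],
    [v * v + rng.gauss(0, 0.05) for v in s],
]

centered = [centered_distances(f) for f in features]
m = len(features)
dcor = [[distance_correlation(centered[i], centered[j]) for j in range(m)]
        for i in range(m)]
dissimilarity = [[1.0 - dcor[i][j] for j in range(m)] for i in range(m)]
clusters = average_linkage(dissimilarity, 2)

print("Pearson r, features 0 and 1:", round(pearson(features[0], features[1]), 3))
print("Distance correlation, features 0 and 1:", round(dcor[0][1], 3))
print("Clusters from 1 - distance correlation:", clusters)
```

In this toy setting the Pearson correlation between a feature and its quadratic partner is close to zero, so a correlation-based dissimilarity carries little information about the true grouping, while a dissimilarity built from a general dependence measure can still separate the two groups. The pure-Python loops are quadratic in the number of samples and are meant for illustration only, not for data of the size described above.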
Index Terms:
statistical analysis, bioinformatics, cellular biophysics, genetics, lab-on-a-chip, cell-cycle time series, high-throughput expression data, general dependence sensitive nonparametric measure, high-throughput expression technology, gene expression array, liquid chromatography-mass spectrometry, LC-MS method, metabolite, feature linear relation, feature nonlinear relation, biological system critical regulation pattern, linear association, data high dimensionality effect, data high noise level effect, high dimension random variable, simulation study, correlation-based hierarchical clustering method, mutual information-based hierarchical clustering method, feature nonlinear dependence clustering, microarray data set, gene expression measurement, Noise, Vectors, Couplings, Clustering methods, Random variables, Standards, Bioinformatics, similarity measures, Algorithms, clustering
Citation:
Tianwei Yu, Hesen Peng, "Hierarchical Clustering of High-Throughput Expression Data Based on General Dependences," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1080-1085, July-Aug. 2013, doi:10.1109/TCBB.2013.99