Robust Bayesian Clustering for Replicated Gene Expression Data
Sept.-Oct. 2012 (vol. 9 no. 5)
pp. 1504-1514
Jianyong Sun, Centre for Plant Integrative Biol. (CPIB), Univ. of Nottingham, Nottingham, UK
Jonathan M. Garibaldi, Sch. of Comput. Sci., Univ. of Nottingham, Nottingham, UK
Kim Kenobi, Centre for Plant Integrative Biol. (CPIB), Univ. of Nottingham, Nottingham, UK
Experimental scientific data sets, especially biology data, usually contain replicated measurements. The replicated measurements for the same object are correlated, and this correlation must be carefully dealt with in scientific analysis. In this paper, we propose a robust Bayesian mixture model for clustering data sets with replicated measurements. The model aims not only to accurately cluster the data points taking the replicated measurements into consideration, but also to find the outliers (i.e., scattered objects) which are possibly required to be studied further. A tree-structured variational Bayes (VB) algorithm is developed to carry out model fitting. Experimental studies showed that our model compares favorably with the infinite Gaussian mixture model, while maintaining computational simplicity. We demonstrate the benefits of including the replicated measurements in the model, in terms of improved outlier detection rates in varying measurement uncertainty conditions. Finally, we apply the approach to clustering biological transcriptomics mRNA expression data sets with replicated measurements.
Index Terms:
trees (mathematics), Bayes methods, biology computing, Gaussian processes, genetics, molecular biophysics, RNA, biological transcriptomics mRNA expression data sets, robust Bayesian clustering, replicated gene expression data, experimental scientific data sets, biology data, scientific analysis, robust Bayesian mixture model, tree-structured variational Bayes algorithm, infinite Gaussian mixture model, computational simplicity, Clustering algorithms, Robustness, Biological system modeling, Bayesian methods, Tin, Approximation methods, Data models, gene expression data, Replicated measurement, clustering, robust clustering, outlier detection, variational Bayes
Citation:
Jianyong Sun, Jonathan M. Garibaldi, Kim Kenobi, "Robust Bayesian Clustering for Replicated Gene Expression Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 5, pp. 1504-1514, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.85