Issue No. 02 - March/April (2012 vol. 9)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.111
Chien-Hao Su, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
Tse-Yi Wang, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
Ming-Tsung Hsu, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
F. C-H Weng, Biodiversity Res. Center, Acad. Sinica, Taipei, Taiwan
Cheng-Yan Kao, Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
Daryi Wang, Biodiversity Res. Center, Acad. Sinica, Taipei, Taiwan
Huai-Kuang Tsai, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
Metagenomics enables the direct study of unculturable microorganisms in different environments. Discriminating compositional differences between metagenomes is an important and challenging problem. Several distance functions have been proposed to estimate these differences based on functional profiles or taxonomic distributions; however, the strengths and limitations of such functions are still unclear. Initially, we analyzed three well-known distance functions and found very little difference between them in the clustering of samples. This motivated us to incorporate suitable normalizations and phylogenetic information into the functions and to evaluate how well they cluster samples from both real and synthetic data sets. The results indicate that rank-based normalization combined with phylogenetic information significantly improves sample clustering, regardless of whether the samples are from real or synthetic microbiomes. Furthermore, our findings suggest that considering suitable normalizations and phylogenetic information is essential when designing distance functions for estimating the differences between metagenomes. We conclude that incorporating rank-based normalization with phylogenetic information into the distance functions helps achieve reliable clustering results.
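To illustrate the general idea of rank-based normalization mentioned in the abstract, the following is a minimal sketch only. The abstract does not specify the three distance functions, the exact normalization, or how phylogenetic information is incorporated, so everything here is an assumption: the function names `rank_normalize` and `rank_distance` are hypothetical, Euclidean distance is used purely as a stand-in, and the phylogenetic weighting is omitted entirely.

```python
def rank_normalize(profile):
    """Replace each abundance with its rank (1 = smallest).

    Tied values share the mean of the ranks they span.
    This is an illustrative choice, not the paper's stated procedure.
    """
    order = sorted(range(len(profile)), key=lambda i: profile[i])
    ranks = [0.0] * len(profile)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and profile[order[j + 1]] == profile[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_distance(a, b):
    """Euclidean distance between two rank-normalized abundance profiles.

    Euclidean distance is an assumed placeholder for the distance
    functions studied in the paper.
    """
    ra, rb = rank_normalize(a), rank_normalize(b)
    return sum((x - y) ** 2 for x, y in zip(ra, rb)) ** 0.5
```

Under this sketch, two samples whose taxa appear in the same abundance order have distance zero even when their raw counts differ greatly in scale, which is the kind of robustness a rank-based normalization is meant to provide.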
Phylogeny, Accuracy, Communities, Bioinformatics, Correlation, Reliability, Computational biology
Chien-Hao Su et al., "The Impact of Normalization and Phylogenetic Information on Estimating the Distance for Metagenomes," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 619-628, 2012.