Issue No.02 - March/April (2012 vol.9)
pp: 619-628
Chien-Hao Su, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
Tse-Yi Wang, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
Ming-Tsung Hsu, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
F. C-H Weng, Biodiversity Res. Center, Acad. Sinica, Taipei, Taiwan
Cheng-Yan Kao, Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Univ., Taipei, Taiwan
Daryi Wang, Biodiversity Res. Center, Acad. Sinica, Taipei, Taiwan
Huai-Kuang Tsai, Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
ABSTRACT
Metagenomics enables the direct study of unculturable microorganisms in different environments. Quantifying the compositional differences between metagenomes is an important and challenging problem. Several distance functions have been proposed to estimate these differences based on functional profiles or taxonomic distributions; however, the strengths and limitations of such functions are still unclear. We first analyzed three well-known distance functions and found very little difference between them in the clustering of samples. This motivated us to incorporate suitable normalizations and phylogenetic information into the functions and to evaluate them by clustering samples from both real and synthetic data sets. The results indicate that rank-based normalization with phylogenetic information significantly improves sample clustering, regardless of whether the samples are from real or synthetic microbiomes. Furthermore, our findings suggest that considering suitable normalizations and phylogenetic information is essential when designing distance functions for estimating the differences between metagenomes. We conclude that incorporating rank-based normalization with phylogenetic information into the distance functions helps achieve reliable clustering results.
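As a rough illustration of what a rank-based normalization of taxonomic profiles might look like, the following minimal sketch replaces raw abundances with ranks and compares two samples with one minus the rank correlation. It is an assumption-laden example, not the distance functions evaluated in the paper: the function names, the toy profiles, the choice of a Spearman-style distance, and the treatment of missing taxa as zero abundance are all hypothetical, and no phylogenetic information is used here.

```python
from math import sqrt
from typing import Dict


def rank_normalize(profile: Dict[str, float]) -> Dict[str, float]:
    """Map each taxon's abundance to its rank (1 = least abundant).

    Tied abundances share the mean of the ranks they occupy.
    """
    items = sorted(profile.items(), key=lambda kv: kv[1])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(items):
        j = i
        # Extend j over the run of taxa tied with items[i].
        while j + 1 < len(items) and items[j + 1][1] == items[i][1]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[items[k][0]] = mean_rank
        i = j + 1
    return ranks


def rank_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Illustrative distance: 1 minus the correlation of rank-normalized profiles.

    Assumption: a taxon absent from a sample has abundance 0.
    Returns 1.0 when the correlation is undefined (constant ranks or no taxa).
    """
    taxa = sorted(set(a) | set(b))
    if not taxa:
        return 1.0
    ra = rank_normalize({t: a.get(t, 0.0) for t in taxa})
    rb = rank_normalize({t: b.get(t, 0.0) for t in taxa})
    n = len(taxa)
    mean_a = sum(ra.values()) / n
    mean_b = sum(rb.values()) / n
    cov = sum((ra[t] - mean_a) * (rb[t] - mean_b) for t in taxa)
    var_a = sum((ra[t] - mean_a) ** 2 for t in taxa)
    var_b = sum((rb[t] - mean_b) ** 2 for t in taxa)
    if var_a == 0 or var_b == 0:
        return 1.0
    return 1.0 - cov / sqrt(var_a * var_b)


# Toy read counts per taxon (made-up values for illustration only).
sample_1 = {"A": 100, "B": 10, "C": 1}
sample_2 = {"A": 1000, "B": 20, "C": 3}   # same ordering, different depth
sample_3 = {"A": 1, "B": 10, "C": 100}    # reversed ordering

d_same_order = rank_distance(sample_1, sample_2)
d_reversed = rank_distance(sample_1, sample_3)
```

Because only the ordering of taxa matters after ranking, `sample_1` and `sample_2` are at distance zero despite very different sequencing depths, which is the kind of robustness a rank-based normalization is meant to provide.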
INDEX TERMS
Phylogeny, Accuracy, Communities, Bioinformatics, Correlation, Reliability, Computational biology, clustering, Metagenomics, normalization, phylogenetic information, distance functions
CITATION
Chien-Hao Su, Tse-Yi Wang, Ming-Tsung Hsu, F. C-H Weng, Cheng-Yan Kao, Daryi Wang, Huai-Kuang Tsai, "The Impact of Normalization and Phylogenetic Information on Estimating the Distance for Metagenomes", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 2, pp. 619-628, March/April 2012, doi:10.1109/TCBB.2011.111