From the July/August 2012 Issue
Exploiting the Functional and Taxonomic Structure of Genomic Data by Probabilistic Topic Modeling
By Xin Chen, Xiaohua Hu, Tze Y. Lim, Xiajiong Shen, E.K. Park, & Gail L. Rosen
In this paper, we present a method that enables both the homology-based approach and the composition-based approach to further study the functional core (i.e., the microbial core and the gene core, respectively). In the proposed method, major functional groups are identified by generative topic modeling, which can extract useful information from unlabeled data. We first show that a generative topic model can be used to model the taxon abundance information obtained by the homology-based approach and to study the microbial core. The model treats each sample as a "document" that has a mixture of functional groups, while each functional group (also known as a "latent topic") is a weighted mixture of species.
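The sample-as-document analogy can be made concrete with a small sketch. The code below assumes a latent Dirichlet allocation (LDA) style model fitted by collapsed Gibbs sampling, which the abstract does not name; the function `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy abundance matrix are all hypothetical illustrations, not the authors' implementation or data.

```python
import random


def fit_lda(counts, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for an LDA-style model (illustrative only).

    counts[d][w] is the integer abundance of species w in sample d.
    Returns (theta, phi): per-sample mixtures over functional groups and
    per-group weights over species.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Expand each sample into tokens: one token per observed species count.
    docs = [[w for w, c in enumerate(row) for _ in range(c)] for row in counts]

    doc_topic = [[0] * n_topics for _ in range(n_docs)]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initial assignment of every token to a functional group.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each group for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimates; each row sums to 1.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(docs[d]) + n_topics * alpha)
         for k in range(n_topics)]
        for d in range(n_docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
         for w in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi


# Hypothetical toy data: 4 samples (rows) by 4 species (columns).
abundance = [
    [9, 8, 0, 1],
    [7, 9, 1, 0],
    [0, 1, 8, 9],
    [1, 0, 9, 7],
]
theta, phi = fit_lda(abundance, n_topics=2)
```

Here each row of `theta` is one sample's mixture over functional groups, and each row of `phi` is one functional group's weighted mixture of species, mirroring the document and latent-topic roles described above.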