2009 Ninth IEEE International Conference on Data Mining
Modeling Syntactic Structures of Topics with a Nested HMM-LDA
Miami, Florida
December 6-9, 2009
ISBN: 978-0-7695-3895-2
Latent Dirichlet Allocation (LDA) is a commonly used topic modeling method for text analysis and mining. Standard LDA treats documents as bags of words, ignoring the syntactic structures of sentences. In this paper, we propose a hybrid model that embeds hidden Markov models (HMMs) within LDA topics to jointly model the topics and the syntactic structures within each topic. Our model is general and subsumes standard LDA and HMM as special cases. Unlike standard LDA or an HMM alone, our model can simultaneously discover topic-specific content words and background functional words shared among topics. It can also automatically separate content words that play different roles within a topic. Using perplexity as the evaluation metric, our model achieves lower perplexity than standard LDA on unseen test documents, indicating that it generalizes better.
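For readers unfamiliar with the evaluation metric, the following minimal Python sketch shows the standard definition of perplexity: the exponential of the negative average log-probability per held-out token. It is illustrative only and not taken from the paper. The function name `perplexity` and the toy probabilities are assumptions, and it presumes some model has already assigned a natural-log probability to each test token.

```python
import math


def perplexity(log_probs):
    """Perplexity of held-out text from per-token natural-log probabilities.

    `log_probs` is a hypothetical input: one log p(token) value per test
    token, produced by whatever model is being evaluated.
    """
    n = len(log_probs)
    if n == 0:
        raise ValueError("need at least one token")
    # exp of the negative mean log-likelihood per token
    return math.exp(-sum(log_probs) / n)


# Toy comparison: a model that assigns each token probability 0.25
# versus one that assigns each token probability 0.5.
uniform = [math.log(0.25)] * 10
sharper = [math.log(0.5)] * 10

print(perplexity(uniform))
print(perplexity(sharper))
```

Lower values mean the model assigns higher probability to the unseen tokens, which is the sense in which the abstract reads lower perplexity as better generalization.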
Citation:
Jing Jiang, "Modeling Syntactic Structures of Topics with a Nested HMM-LDA," 2009 Ninth IEEE International Conference on Data Mining (ICDM), pp. 824-829, 2009.