Issue No. 06, June 2010 (vol. 32), pp. 996-1011
Iulian Pruteanu-Malinici , Duke University, Durham
Lu Ren , Duke University, Durham
John Paisley , Duke University, Durham
Eric Wang , Duke University, Durham
Lawrence Carin , Duke University, Durham
We consider the problem of inferring and modeling topics in a sequence of documents with known publication dates. Each document at a given time is characterized by a topic, and the topics are drawn from a mixture model. The proposed model infers how the topic mixture weights change as a function of time. The details of this general framework may take different forms, depending on the specifics of the model. For the examples considered here, we use base measures composed of independent multinomial-Dirichlet distributions to represent topic-dependent word counts. The form of the hierarchical model permits efficient variational Bayesian inference, which is of interest for large-scale problems. We present results and comparisons against the same model with the dynamic character removed, as well as against latent Dirichlet allocation (LDA) and Topics over Time (TOT). We consider a database of Neural Information Processing Systems (NIPS) papers as well as the US Presidential State of the Union addresses from 1790 to 2008.
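The generative setup the abstract describes — topic-dependent word counts drawn from multinomial-Dirichlet distributions, with topic mixture weights that evolve over time — can be illustrated with a minimal sketch. This is not the paper's hierarchical prior or its variational inference; the sizes (`V`, `K`, `T`, `N_WORDS`) are arbitrary, and a simple Gaussian random walk on logits stands in for the paper's time-evolving mixture weights.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 50         # vocabulary size (illustrative)
K = 5          # number of topics (illustrative)
T = 10         # number of time steps (illustrative)
N_WORDS = 100  # words per document

# Topic-dependent word distributions: each topic's word probabilities
# are drawn from a symmetric Dirichlet (the multinomial-Dirichlet base measure).
topics = rng.dirichlet(np.full(V, 0.1), size=K)  # shape (K, V)

# Time-varying topic mixture weights: a random walk on unnormalized
# log-weights stands in for the paper's hierarchical time-series prior.
logits = np.zeros(K)
docs = []
for t in range(T):
    logits = logits + rng.normal(0.0, 0.5, size=K)   # weights drift smoothly over time
    weights = np.exp(logits) / np.exp(logits).sum()  # mixture weights at time t
    z = rng.choice(K, p=weights)                     # topic for the document at time t
    counts = rng.multinomial(N_WORDS, topics[z])     # topic-dependent word counts
    docs.append(counts)

docs = np.array(docs)  # shape (T, V): one word-count vector per time step
```

Each row of `docs` is a bag-of-words count vector whose distribution depends on the topic drawn at that time step; inference in the paper runs in the reverse direction, recovering the topics and the evolving mixture weights from such counts.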
Hierarchical models, variational Bayes, Dirichlet process, text modeling.
Iulian Pruteanu-Malinici, Lu Ren, John Paisley, Eric Wang, Lawrence Carin, "Hierarchical Bayesian Modeling of Topics in Time-Stamped Documents", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.32, no. 6, pp. 996-1011, June 2010, doi:10.1109/TPAMI.2009.125
[1] Q. An, E. Wang, I. Shterev, L. Carin, and D.B. Dunson, "Hierarchical Kernel Stick-Breaking Process for Multi-Task Image Analysis," Proc. 25th Int'l Conf. Machine Learning, 2008.
[2] M.J. Beal, "Variational Algorithms for Approximate Bayesian Inference," PhD thesis, Gatsby Computational Neuroscience Unit, Univ. College London, 2003.
[3] D.M. Blei and M.I. Jordan, "Variational Methods for the Dirichlet Process," Proc. 21st Int'l Conf. Machine Learning, 2004.
[4] D.M. Blei and J.D. Lafferty, "Dynamic Topic Models," Proc. 23rd Int'l Conf. Machine Learning, pp. 113-120, 2006.
[5] D.M. Blei and J.D. Lafferty, "A Correlated Topic Model of Science," The Annals of Applied Statistics, vol. 1, no. 1, pp. 17-35, 2007.
[6] D.M. Blei, A.Y. Ng, and M.I. Jordan, "Latent Dirichlet Allocation," J. Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[7] J.F. Canny and T.L. Rattenbury, "A Dynamic Topic Model for Document Segmentation," technical report, Dept. of Electrical Eng. and Computer Sciences, Univ. of California at Berkeley, 2006.
[8] D.B. Dunson, "Bayesian Dynamic Modeling of Latent Trait Distributions," Biostatistics, vol. 7, pp. 551-568, 2006.
[9] D.B. Dunson and J.-H. Park, "Kernel Stick-Breaking Processes," Biometrika, vol. 95, pp. 307-323, 2008.
[10] A. Gruber, M. Rosen-Zvi, and Y. Weiss, "Hidden Topic Markov Models," Proc. Int'l Conf. Artificial Intelligence and Statistics, 2007.
[11] T. Hofmann, "Probabilistic Latent Semantic Analysis," Proc. Conf. Uncertainty in Artificial Intelligence, 1999.
[12] H. Ishwaran and L. James, "Gibbs Sampling Methods for Stick-Breaking Priors," J. Am. Statistical Assoc., vol. 96, pp. 161-173, 2001.
[13] J.-H. Park and D.B. Dunson, "Bayesian Generalized Product Partition Model," Statistica Sinica, 2009.
[14] M.L. Pennell and D.B. Dunson, "Bayesian Semiparametric Dynamic Frailty Models for Multiple Event Time Data," Biometrics, vol. 62, pp. 1044-1052, 2006.
[15] L. Ren, D.B. Dunson, and L. Carin, "The Dynamic Hierarchical Dirichlet Process," Proc. Int'l Conf. Machine Learning, 2008.
[16] J. Sethuraman, "A Constructive Definition of Dirichlet Priors," Statistica Sinica, vol. 4, pp. 639-650, 1994.
[17] N. Srebro and S. Roweis, "Time-Varying Topic Models Using Dependent Dirichlet Processes," technical report, Dept. of Computer Science, Univ. of Toronto, 2005.
[18] Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei, "Hierarchical Dirichlet Processes," J. Am. Statistical Assoc., vol. 101, pp. 1566-1581, 2006.
[19] C.J. van Rijsbergen, S.E. Robertson, and M.F. Porter, Information Retrieval, second ed., vol. 6, pp. 111-143. Butterworths, 1979.
[20] X. Wang and A. McCallum, "Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends," Proc. 12th ACM SIGKDD, pp. 424-433, 2006.
[21] M. Welling, I. Porteous, and E. Bart, "Infinite State Bayes-Nets for Structured Domains," Proc. Int'l Conf. Neural Information Processing Systems, 2007.
[22] J. Winn and C.M. Bishop, "Variational Message Passing," J. Machine Learning Research, vol. 6, pp. 661-694, 2005.
[23] J. Zhang, Z. Ghahramani, and Y. Yang, "A Probabilistic Model for Online Document Clustering with Application to Novelty Detection," Proc. Neural Information Processing Systems, 2004.