Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007)
Parallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability
Omaha, Nebraska, USA
October 28-31, 2007
ISBN: 0-7695-3033-8
Statistical topic models such as Latent Dirichlet Allocation (LDA) have emerged as an attractive framework to model, visualize and summarize large document collections in a completely unsupervised fashion. Considering the enormous sizes of modern electronic document collections, it is very important that these models are fast and scalable. In this work, we build parallel implementations of the variational EM algorithm for LDA in a multiprocessor architecture as well as a distributed setting. Our experiments on document collections of various sizes indicate that while both implementations achieve speed-ups, the distributed version achieves dramatic improvements in both speed and scalability. We also analyze the costs associated with various stages of the EM algorithm and suggest ways to further improve the performance.
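To make the idea of parallel variational EM concrete, here is a minimal sketch, not the authors' implementation. It assumes the usual data-parallel scheme, which the abstract does not confirm: documents are split into shards, each worker runs the variational E-step on its shard and returns partial topic-word statistics, and a single M-step sums those statistics and renormalizes the topics. All names (`e_step_shard`, `m_step`, `parallel_em`, `smooth`) are hypothetical. Threads stand in for the worker processes or cluster nodes; a process pool or a cluster's map would slot into the same place.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


def e_step_shard(docs, beta, alpha, n_iter=20):
    """Variational E-step on one shard; returns partial topic-word counts.

    Each document is a dict mapping word id -> count (assumed format).
    """
    K, V = len(beta), len(beta[0])
    stats = [[0.0] * V for _ in range(K)]
    for doc in docs:
        n_words = sum(doc.values())
        gamma = [alpha + n_words / K] * K
        phis = {}
        for _ in range(n_iter):
            new_gamma = [alpha] * K
            for w, c in doc.items():
                # phi_wk is proportional to beta_kw * exp(digamma(gamma_k))
                phi = [beta[k][w] * math.exp(digamma(gamma[k])) for k in range(K)]
                z = sum(phi)
                phi = [p / z for p in phi]
                phis[w] = phi
                for k in range(K):
                    new_gamma[k] += c * phi[k]
            gamma = new_gamma
        # Accumulate this document's expected topic-word counts.
        for w, c in doc.items():
            for k in range(K):
                stats[k][w] += c * phis[w][k]
    return stats


def m_step(partial_stats, K, V, smooth=1e-2):
    """Sum the workers' partial statistics and renormalize each topic."""
    total = [[smooth] * V for _ in range(K)]
    for stats in partial_stats:
        for k in range(K):
            for w in range(V):
                total[k][w] += stats[k][w]
    beta = []
    for row in total:
        s = sum(row)
        beta.append([x / s for x in row])
    return beta


def parallel_em(docs, K, V, alpha=0.1, n_workers=2, n_em_iter=10):
    """Run EM with the E-step mapped over document shards."""
    shards = [docs[i::n_workers] for i in range(n_workers)]
    # Deterministic, slightly asymmetric initialization so topics can separate.
    beta = [[1.0 + ((k + w) % K == 0) for w in range(V)] for k in range(K)]
    beta = [[x / sum(row) for x in row] for row in beta]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_em_iter):
            partial = list(pool.map(lambda s: e_step_shard(s, beta, alpha), shards))
            beta = m_step(partial, K, V)
    return beta


if __name__ == "__main__":
    toy_docs = [{0: 5, 1: 3}, {2: 4, 3: 6}, {0: 2, 1: 7}, {2: 5, 3: 1}]
    for topic in parallel_em(toy_docs, K=2, V=4):
        print([round(p, 3) for p in topic])
```

In this sketch the E-step is the only part that runs on the workers, and each EM iteration ends with a synchronization point where the partial statistics are gathered. That gather-and-sum step is where communication cost would appear in a distributed setting.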
Citation:
Ramesh Nallapati, William Cohen, John Lafferty, "Parallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability," Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), pp. 349-354, 2007