2011 IEEE 11th International Conference on Data Mining
Learning Dirichlet Processes from Partially Observed Groups
Vancouver, Canada
December 11-December 14
ISBN: 978-0-7695-4408-3
BibTeX:
@article{10.1109/ICDM.2011.85,
  author = {Avinava Dubey and Indrajit Bhattacharya and Mrinal Das and Tanveer Faruquie and Chiranjib Bhattacharyya},
  title = {Learning Dirichlet Processes from Partially Observed Groups},
  journal = {Data Mining, IEEE International Conference on},
  volume = {0},
  year = {2011},
  issn = {1550-4786},
  pages = {141-150},
  doi = {http://doi.ieeecomputersociety.org/10.1109/ICDM.2011.85},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2011.85
Motivated by the task of vernacular news analysis using known news topics from national newspapers, we study the task of topic analysis, where given source datasets with observed topics, data items from a target dataset need to be assigned either to observed source topics or to new ones. Using Hierarchical Dirichlet Processes for addressing this task imposes unnecessary and often inappropriate generative assumptions on the observed source topics. In this paper, we explore Dirichlet Processes with partially observed groups (POG-DP). POG-DP avoids modeling the given source topics. Instead, it directly models the conditional distribution of the target data as a mixture of a Dirichlet Process and the posterior distribution of a Hierarchical Dirichlet Process with known groups and topics. This introduces coupling between selection probabilities of all topics within a source, leading to effective identification of source topics. We further improve on this with a Combinatorial Dirichlet Process with partially observed groups (POG-CDP) that captures finer-grained coupling between related topics by choosing intersections between sources. We evaluate our models in three different real-world applications. Using extensive experimentation, we compare against several baselines to show that our model performs significantly better in all three applications.
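The abstract does not spell out the inference procedure, so the following is only a rough illustration of the kind of choice it describes: assigning an item either to an existing topic or to a new one. It shows the standard Chinese restaurant process predictive rule for a single Dirichlet Process, not POG-DP or POG-CDP. The function name `crp_assignment_probs` and the parameters `counts` and `alpha` are hypothetical names chosen for this sketch.

```python
def crp_assignment_probs(counts, alpha):
    """Predictive assignment probabilities under a Chinese restaurant process.

    This is the textbook single-DP rule, used here only as an illustration;
    it is NOT the POG-DP or POG-CDP model from the paper.

    counts: number of items already assigned to each existing topic.
    alpha:  DP concentration parameter (must be positive).

    Returns one probability per existing topic, followed by a final entry
    giving the probability of opening a new topic.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")

    total = sum(counts) + alpha
    # An existing topic k is chosen in proportion to its current size n_k.
    probs = [c / total for c in counts]
    # A new topic is chosen in proportion to the concentration parameter.
    probs.append(alpha / total)
    return probs


if __name__ == "__main__":
    # Two existing topics with 3 and 1 items, concentration 1.0.
    print(crp_assignment_probs([3, 1], 1.0))
```

In this simplified rule, larger topics attract new items and `alpha` controls how readily a new topic is created. The coupling between topics within a source that the abstract attributes to POG-DP is not captured by this sketch.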
Index Terms:
topic analysis, grouped data, partial observations, Dirichlet Process
Citation:
Avinava Dubey, Indrajit Bhattacharya, Mrinal Das, Tanveer Faruquie, Chiranjib Bhattacharyya, "Learning Dirichlet Processes from Partially Observed Groups," 2011 IEEE 11th International Conference on Data Mining (ICDM), pp. 141-150, 2011.
