Weakly Supervised Joint Sentiment-Topic Detection from Text
June 2012 (vol. 24 no. 6)
pp. 1134-1145
Chenghua Lin, University of Exeter, Exeter
Yulan He, The Open University, Milton Keynes
Richard Everson, University of Exeter, Exeter
Stefan Rüger, The Open University, Milton Keynes
Sentiment analysis, or opinion mining, aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework, the joint sentiment-topic (JST) model, which is based on latent Dirichlet allocation (LDA) and detects sentiment and topic simultaneously from text. A reparameterized version of the JST model called Reverse-JST, obtained by reversing the sequence of sentiment and topic generation in the modeling process, is also studied. Although JST is equivalent to Reverse-JST without a hierarchical prior, extensive experiments show that when sentiment priors are added, JST performs consistently better than Reverse-JST. In addition, unlike supervised approaches to sentiment classification, which often fail to produce satisfactory performance when shifted to other domains, JST is weakly supervised, which makes it highly portable to other domains. This is verified by experimental results on data sets from five different domains, where the JST model even outperforms existing semi-supervised approaches on some of the data sets despite using no labeled documents. Moreover, the topics and topic sentiment detected by JST are coherent and informative. We hypothesize that the JST model can readily meet the demand for large-scale sentiment analysis from the web in an open-ended fashion.
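
The difference in generation order between JST and Reverse-JST can be illustrated with a small sampling sketch. This is not the authors' implementation: the abstract only states that the two models reverse the sequence of sentiment and topic generation, so the symmetric Dirichlet priors, the hyperparameter values (alpha, beta, gamma), and the names `dirichlet`, `categorical`, and `generate_document` are all assumptions chosen for illustration, following the usual LDA-style construction.

```python
import random


def dirichlet(concentration, k, rng):
    """Sample a symmetric Dirichlet over k outcomes by normalizing gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]


def categorical(probs, rng):
    """Sample an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, n_sentiments, n_topics, vocab_size,
                      order="jst", seed=0, alpha=0.5, beta=0.5, gamma=1.0):
    """Return a list of (sentiment, topic, word_id) triples for one document.

    order="jst":     sentiment is drawn first, then a topic given that sentiment.
    order="reverse": topic is drawn first, then a sentiment given that topic.
    Hyperparameter values are illustrative assumptions, not taken from the paper.
    """
    if order not in ("jst", "reverse"):
        raise ValueError("order must be 'jst' or 'reverse'")
    rng = random.Random(seed)

    # One word distribution per (sentiment, topic) pair.
    phi = [[dirichlet(beta, vocab_size, rng) for _ in range(n_topics)]
           for _ in range(n_sentiments)]

    if order == "jst":
        pi = dirichlet(gamma, n_sentiments, rng)        # document sentiment mix
        theta = [dirichlet(alpha, n_topics, rng)        # topic mix per sentiment
                 for _ in range(n_sentiments)]
    else:
        theta = dirichlet(alpha, n_topics, rng)         # document topic mix
        pi = [dirichlet(gamma, n_sentiments, rng)       # sentiment mix per topic
              for _ in range(n_topics)]

    tokens = []
    for _ in range(n_words):
        if order == "jst":
            sentiment = categorical(pi, rng)
            topic = categorical(theta[sentiment], rng)
        else:
            topic = categorical(theta, rng)
            sentiment = categorical(pi[topic], rng)
        word = categorical(phi[sentiment][topic], rng)
        tokens.append((sentiment, topic, word))
    return tokens


if __name__ == "__main__":
    print(generate_document(5, n_sentiments=3, n_topics=4, vocab_size=20, order="jst"))
    print(generate_document(5, n_sentiments=3, n_topics=4, vocab_size=20, order="reverse"))
```

In both orderings each word ends up paired with a sentiment label and a topic; only the conditioning direction changes. The sketch covers forward sampling only and does not attempt inference or the incorporation of sentiment priors.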

Index Terms:
Sentiment analysis, opinion mining, latent Dirichlet allocation (LDA), joint sentiment-topic (JST) model.
Chenghua Lin, Yulan He, Richard Everson, Stefan Rüger, "Weakly Supervised Joint Sentiment-Topic Detection from Text," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1134-1145, June 2012, doi:10.1109/TKDE.2011.48