Issue No.04 - April (2012 vol.24)
pp: 720-734
Yang Liu, Shandong University, Jinan
Xiaohui Yu, Shandong University, Jinan and York University, Toronto
Aijun An, York University, Toronto
ABSTRACT
Posting reviews online has become an increasingly popular way for people to express opinions and sentiments toward the products bought or services received. Analyzing the large volume of online reviews available would produce useful actionable knowledge that could be of economic value to vendors and other interested parties. In this paper, we conduct a case study in the movie domain, and tackle the problem of mining reviews for predicting product sales performance. Our analysis shows that both the sentiments expressed in the reviews and the quality of the reviews have a significant impact on the future sales performance of the products in question. For the sentiment factor, we propose Sentiment PLSA (S-PLSA), in which a review is considered as a document generated by a number of hidden sentiment factors, in order to capture the complex nature of sentiments. Training an S-PLSA model enables us to obtain a succinct summary of the sentiment information embedded in the reviews. Based on S-PLSA, we propose ARSA, an Autoregressive Sentiment-Aware model for sales prediction. We then seek to further improve the accuracy of prediction by considering the quality factor, with a focus on predicting the quality of a review in the absence of user-supplied indicators, and present ARSQA, an Autoregressive Sentiment and Quality Aware model, to utilize sentiments and quality for predicting product sales performance. Extensive experiments conducted on a large movie data set confirm the effectiveness of the proposed approach.
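As a rough illustration of what an autoregressive, sentiment-aware predictor could look like, the sketch below regresses each period's sales on the previous p sales values and the previous q sentiment summary vectors, fitted by ordinary least squares. The abstract does not give the model's equations, so everything here is an assumption made for illustration: the function names, the lag orders p and q, the linear form, and the least-squares fitting. The sentiment vectors stand in for the per-period summaries an S-PLSA model would produce; the synthetic data is invented.

```python
import numpy as np

def build_design(sales, sentiment, p, q):
    """Build rows of [y_{t-1..t-p}, w_{t-1..t-q}] and the targets y_t.

    sales: 1-D array of per-period sales (hypothetical input).
    sentiment: 2-D array, one sentiment summary vector per period
               (assumed to come from some topic-style model).
    Requires p >= 1 and q >= 1.
    """
    sales = np.asarray(sales, dtype=float)
    sentiment = np.asarray(sentiment, dtype=float)
    start = max(p, q)
    rows = []
    for t in range(start, len(sales)):
        past_sales = sales[t - p:t][::-1]              # most recent first
        past_sent = sentiment[t - q:t][::-1].ravel()   # most recent first
        rows.append(np.concatenate([past_sales, past_sent]))
    return np.array(rows), sales[start:]

def fit(sales, sentiment, p=2, q=1):
    """Least-squares estimate of the lag coefficients (illustrative choice)."""
    X, y = build_design(sales, sentiment, p, q)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(sales, sentiment, coef, p=2, q=1):
    """One-step-ahead forecast from the latest p sales and q sentiment rows."""
    sales = np.asarray(sales, dtype=float)
    sentiment = np.asarray(sentiment, dtype=float)
    recent_sales = sales[len(sales) - p:][::-1]
    recent_sent = sentiment[len(sentiment) - q:][::-1].ravel()
    return float(np.concatenate([recent_sales, recent_sent]) @ coef)

# Synthetic, noise-free data generated from known coefficients (invented).
rng = np.random.default_rng(0)
sent = rng.random((40, 2))
y = [1.0, 1.0]
for t in range(2, 40):
    y.append(0.5 * y[t - 1] + 0.2 * y[t - 2]
             + 1.0 * sent[t - 1, 0] + 2.0 * sent[t - 1, 1])
y = np.array(y)

coef = fit(y, sent, p=2, q=1)
forecast = predict_next(y, sent, coef, p=2, q=1)
```

Because the synthetic series is noise-free, the fitted coefficients should match the generating ones closely; on real review data one would add a noise term, validate the lag orders, and preprocess the sales series as appropriate.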
INDEX TERMS
Review mining, sentiment analysis, prediction.
CITATION
Yang Liu, Xiaohui Yu, Aijun An, "Mining Online Reviews for Predicting Sales Performance: A Case Study in the Movie Domain," IEEE Transactions on Knowledge & Data Engineering, vol. 24, no. 4, pp. 720-734, April 2012, doi:10.1109/TKDE.2010.269