Issue No.10 - October (2011 vol.23)
pp: 1498-1512
Anindya Ghose, New York University, New York
Panagiotis G. Ipeirotis, New York University, New York
ABSTRACT
With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews that are typically published for a single product makes it harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we reexamine the impact of reviews on economic outcomes like product sales and see how different factors affect social outcomes such as their perceived usefulness. Our approach explores multiple aspects of review text, such as subjectivity levels, various measures of readability, and the extent of spelling errors, to identify important text-based features. In addition, we examine multiple reviewer-level features, such as the average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that have a mixture of objective and highly subjective sentences are negatively associated with product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are rated more informative (or helpful) by other users. By using Random Forest-based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. We examine the relative importance of the three broad feature categories: “reviewer-related” features, “review subjectivity” features, and “review readability” features, and find that using any one of the three feature sets results in performance that is statistically equivalent to that obtained using all available features.
This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their helpfulness and economic impact.
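As a rough illustration of the kind of text-based readability feature the abstract refers to, the sketch below computes sentence and word counts and the Automated Readability Index (ARI) for a single review. The abstract does not say which readability measures were used, so the choice of ARI, the function name `readability_features`, and the simple regular-expression tokenization are all assumptions made for this example, not the authors' method.

```python
import re


def readability_features(text: str) -> dict:
    """Compute a few simple readability features for one review (illustrative only)."""
    # Naive sentence split on runs of terminal punctuation (an assumption,
    # not a full sentence segmenter).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        return {"sentences": 0, "words": 0, "ari": 0.0}
    chars = sum(len(w) for w in words)
    # Automated Readability Index:
    # 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    ari = 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43
    return {"sentences": len(sentences), "words": len(words), "ari": ari}


if __name__ == "__main__":
    review = "Great camera. The battery lasts two days!"
    print(readability_features(review))
```

Features like these could then be combined with reviewer-level and subjectivity features as inputs to a classifier; that step is not shown here.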
INDEX TERMS
Internet commerce, social media, user-generated content, text mining, word-of-mouth, product reviews, economics, sentiment analysis, online communities.
CITATION
Anindya Ghose, Panagiotis G. Ipeirotis, "Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 10, pp. 1498-1512, October 2011, doi:10.1109/TKDE.2010.188